[Patent] Speech synthesis apparatus

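IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC 7th edition)	G10L-013/06 G10L-013/00 G10L-021/00
Application number	UP-0226331 (2005-09-15)
Registration number	US-7526430 (2009-07-01)
Priority information	JP-2004-167666 (2004-06-04)
Inventor / Address	Kato, Yumiko; Kamai, Takahiro
Applicant / Address	Panasonic Corporation
Agent / Address	Wenderoth, Lind & Ponack, L.L.P.
Citation information	Times cited: 1; Patents cited: 7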

Abstract

A speech synthesis apparatus, which can embed unchangeable additional information into synthesized speech without causing a deterioration of speech quality and without restriction by bands, includes a language processing unit which generates synthesized speech generation information necessary for generating synthesized speech in accordance with a language string, a prosody generating unit which generates prosody information of speech based on the synthesized speech generation information, and a waveform generating unit which synthesizes speech based on the prosody information, in which the prosody generating unit embeds code information as watermark information in the prosody information of a segment having a predetermined time duration within a phoneme length including a phoneme boundary.
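The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the 5 ms frame period, the representation of prosody as a per-frame fundamental-frequency (F0) list in Hz, the additive offsets, and the names `Phoneme`, `PATTERNS`, and `embed` are all assumptions made for illustration. Each hypothetical pattern spans 6 frames (30 ms), which falls inside the 10 to 50 millisecond range named in claim 2.

```python
from dataclasses import dataclass
from typing import Dict, List

FRAME_MS = 5  # assumed frame period of the F0 contour (not from the patent)


@dataclass
class Phoneme:
    symbol: str
    start_ms: int
    end_ms: int


# Hypothetical pattern store: code symbol -> per-frame F0 offsets in Hz.
# 6 frames x 5 ms = 30 ms per pattern.
PATTERNS: Dict[str, List[float]] = {
    "A": [0.0, 4.0, 8.0, 4.0, 0.0, -4.0],
    "B": [0.0, -4.0, -8.0, -4.0, 0.0, 4.0],
}


def embed(f0: List[float], phonemes: List[Phoneme], code: str) -> List[float]:
    """Return a copy of f0 with one pattern added around each phoneme boundary.

    One code symbol is consumed per boundary; surplus symbols or
    boundaries are ignored.
    """
    out = list(f0)
    # A boundary is taken to be the end time of every phoneme except the last.
    boundaries_ms = [p.end_ms for p in phonemes[:-1]]
    for symbol, boundary_ms in zip(code, boundaries_ms):
        pattern = PATTERNS[symbol]
        # Center the pattern on the boundary so the segment includes it.
        start = boundary_ms // FRAME_MS - len(pattern) // 2
        for i, delta in enumerate(pattern):
            idx = start + i
            # Frames with F0 == 0 are treated as unvoiced and left alone.
            if 0 <= idx < len(out) and out[idx] > 0:
                out[idx] += delta
    return out
```

For a flat 120 Hz contour of 40 frames with a single boundary at 100 ms, `embed(f0, phonemes, "A")` changes only frames 17 to 22; the rest of the contour is untouched, which is the sense in which the watermark is confined to a short segment around the boundary.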

Representative claims

What is claimed is:
1. A speech synthesis apparatus which synthesizes speech, said apparatus comprising: a prosody generating unit for generating prosody information of the speech based on synthesized speech generation information; and a synthesis unit for synthesizing the speech based on the prosody information, wherein said prosody generation unit is for: specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information; extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
2. The speech synthesis apparatus according to claim 1, wherein a duration for embedding the extracted micro-prosody pattern is a duration in a range from 10 milliseconds to 50 milliseconds.
3. The speech synthesis apparatus according to claim 1, further comprising an encoding unit for encoding additional information, wherein said encoding unit is for encoding information for associating the micro-prosody pattern stored in said storage unit with the additional information, and wherein said prosody generation unit is for selecting from the storage unit, based on the encoded information, the micro-prosody pattern associated with the additional information, and embedding the selected micro-prosody pattern into the specified time position including the phoneme boundary.
4. The speech synthesis apparatus according to claim 3, wherein said encoding unit is further for generating key information which corresponds to the encoded information for decoding the additional information.
5. A synthesis speech identifying apparatus which identifies whether or not inputted speech is synthesized speech, said apparatus comprising: a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration; a storage unit in which a micro-prosody pattern is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and being used to identify the inputted speech as synthesized speech; and an identifying unit for: extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, the fundamental frequency of the speech calculated by said fundamental frequency calculation unit; matching a pattern of the extracted fundamental frequency with the micro-prosody pattern stored in said storage unit; and identifying whether or not the inputted speech is synthesized speech.
6. An additional information reading apparatus which decodes additional information embedded in inputted speech, said apparatus comprising: a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration; a storage unit in which a micro-prosody pattern associated with the additional information is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary; and an additional information extracting unit for: extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, a micro-prosody pattern from the speech fundamental frequency calculated by said fundamental frequency calculating unit; comparing the extracted micro-prosody pattern with the micro-prosody pattern associated with the additional information; and extracting predetermined additional information included in the extracted micro-prosody pattern.
7. The additional information reading apparatus according to claim 6, wherein the additional information is encoded, and said additional information reading apparatus further comprises a decoding unit for decoding the encoded additional information using key information for decoding.
8. A speech synthesis method of synthesizing speech, comprising generating prosody information of the speech based on synthesized speech generation information, wherein said generating includes: specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information; extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
9. The speech synthesis method according to claim 8, wherein a duration for embedding the extracted micro-prosody pattern is a duration in a range from 10 milliseconds to 50 milliseconds.
10. A program embodied on a computer readable recording medium, for making a computer function as a speech synthesis apparatus, said program making the computer function as the following: a prosody generating unit for generating prosody information of speech based on synthesized speech generation information; and a synthesis unit for synthesizing the speech based on the prosody information, wherein the prosody generating unit is for: specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information; extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
11. The program embodied on a computer readable recording medium, according to claim 10, wherein a duration for embedding the extracted micro-prosody pattern is a duration in a range from 10 milliseconds to 50 milliseconds.
12. A computer readable recording medium on which a program for making a computer function as a speech synthesis apparatus is recorded, wherein said program makes a computer function as the following: a prosody generating unit for generating prosody information of speech based on synthesized speech generation information; and a synthesis unit for synthesizing the speech based on the prosody information, wherein the prosody generating unit is for: specifying a time position including a phoneme boundary in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information; extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including the phoneme boundary; and embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech.
13. The computer readable recording medium according to claim 12, wherein a duration for embedding the extracted micro-prosody pattern is a duration in a range from 10 milliseconds to 50 milliseconds.
14. The speech synthesis apparatus according to claim 1, wherein said prosody generating unit is for identifying, as the time position including the phoneme boundary in the speech to be synthesized, a portion of at least one vowel of: a vowel which follows immediately after silence; a vowel which follows immediately after a consonant other than a semivowel; a vowel which immediately precedes silence; and a vowel which immediately precedes a consonant other than a semivowel.
15. The speech synthesis apparatus according to claim 1, wherein said prosody generating unit is for identifying, as the time position including the phoneme boundary in the speech to be synthesized, at least one of: a portion, including a starting point of a phoneme, of a vowel which follows immediately after silence; a portion, including the starting point of the phoneme, of a vowel which follows immediately after a consonant other than a semivowel; a portion, including an ending point of the phoneme, of a vowel which immediately precedes silence; and a portion, including the ending point of the phoneme, of a vowel which immediately precedes a consonant other than a semivowel.
16. A speech synthesis apparatus which synthesizes speech, said apparatus comprising: a prosody generating unit for generating prosody information of the speech based on synthesized speech generation information; and a synthesis unit for synthesizing the speech based on the prosody information, wherein said prosody generation unit is for: specifying a time position in the speech to be synthesized into which a micro-prosody pattern is to be embedded, based on the synthesized speech generation information; extracting a micro-prosody pattern from a storage unit, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary; and embedding the extracted micro-prosody pattern into the specified time position as watermark information, the embedded micro-prosody pattern indicating that the speech is synthesized speech, and the embedded micro-prosody pattern being used to identify a manufacturer of said speech synthesis apparatus.
17. A synthesis speech identifying apparatus which identifies whether or not inputted speech is synthesized speech, said apparatus comprising: a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration; a storage unit in which a micro-prosody pattern is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and the micro-prosody pattern being used to identify the inputted speech as synthesized speech and to identify a manufacturer of said speech synthesis apparatus that has generated the synthesized speech; and an identifying unit for: extracting, in a segment having a duration within which a micro-prosody pattern of the inputted speech exists as watermark information, the fundamental frequency of the speech calculated by said fundamental frequency calculation unit; matching a pattern of the extracted fundamental frequency with the micro-prosody pattern stored in said storage unit; and identifying whether or not the inputted speech is synthesized speech and, in the case where the inputted speech is synthesized speech, identify a manufacturer of said speech synthesis apparatus that has generated the synthesized speech.
18. An additional information reading apparatus which decodes additional information embedded in inputted speech, said apparatus comprising: a fundamental frequency calculating unit for calculating a speech fundamental frequency of the inputted speech on a per frame basis, each frame having a predetermined duration; a storage unit in which a micro-prosody pattern associated with the additional information is stored, the micro-prosody pattern being a pattern of a fine time structure of prosody including a phoneme boundary, and the micro-prosody pattern being used to identify a manufacturer of said speech synthesis apparatus; and an additional information extracting unit for: extracting, in a segment having a duration including a phoneme boundary within which a micro-prosody pattern of the inputted speech exists as watermark information, a micro-prosody pattern from the speech fundamental frequency calculated by said fundamental frequency calculating unit; comparing the extracted micro-prosody pattern with the micro-prosody pattern associated with the additional information; extracting predetermined additional information included in the extracted micro-prosody pattern; and identifying a manufacturer of said speech synthesis apparatus that has generated the synthesized speech.
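The matching step recited for the identifying and reading apparatuses (extract the F0 pattern in a segment around a phoneme boundary, then compare it with stored micro-prosody patterns) can be sketched as follows. The claims do not state a similarity measure, so the mean-removed normalized correlation, the 0.9 threshold, and the names `normalized_correlation` and `match_pattern` are assumptions for illustration only. The per-frame F0 values and the boundary frame index are assumed to come from earlier analysis that is not shown.

```python
import math
from typing import Dict, List, Optional


def _centered(xs: List[float]) -> List[float]:
    """Subtract the mean so only the shape of the contour is compared."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def normalized_correlation(a: List[float], b: List[float]) -> float:
    """Correlation in [-1, 1] of two equal-length sequences; 0.0 if either is flat."""
    a, b = _centered(a), _centered(b)
    denom = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom


def match_pattern(
    f0: List[float],
    boundary_frame: int,
    patterns: Dict[str, List[float]],
    threshold: float = 0.9,  # assumed decision threshold
) -> Optional[str]:
    """Return the label of the best-matching stored pattern, or None.

    The segment is centered on the boundary frame and has the same
    length as the pattern it is compared against.
    """
    best_label, best_score = None, threshold
    for label, pattern in patterns.items():
        start = boundary_frame - len(pattern) // 2
        if start < 0 or start + len(pattern) > len(f0):
            continue  # segment would fall outside the contour
        segment = f0[start:start + len(pattern)]
        score = normalized_correlation(segment, pattern)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label
```

Under these assumptions, a return value of `None` at every examined boundary would suggest that no stored pattern is present, while a label could be mapped to additional information such as a manufacturer identifier. A real detector would also need to handle unvoiced frames and F0 estimation error, which this sketch ignores.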

Patents cited by this patent (7)

Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Adaptive pattern recognition based control system and method.
Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Ergonomic man-machine interface incorporating adaptive pattern recognition based control system.
Hoffberg, Steven M., Intelligent electronic appliance system and method.
Mizuno, Osamu (JP); Nakajima, Shinya (JP), Method and apparatus for editing/creating synthetic speech message and recording medium with the method recorded thereon.
Erdem, Caglayan; Holzapfel, Martin, Method for detecting the time sequences of a fundamental frequency of an audio response unit to be synthesized.
Coorman, Geert; Deprez, Filip; De Bock, Mario; Fackrell, Justin; Leys, Steven; Rutten, Peter; DeMoortel, Jan; Schenk, Andre; Coile, Bert Van, Speech synthesis using concatenation of speech waveforms.
Kirovski, Darko; Malvar, Henrique, Watermark detection via cardinality-scaled correlation.

Patents citing this patent (1)

Nakamura, Masanobu; Morita, Masahiro, Digital watermark embedding device, digital watermark embedding method, and computer-readable recording medium.

