| Country / Type | United States (US), Granted Patent |
|---|---|
| IPC (7th edition) | |
| Application No. | US-0240458 (2008-09-29) |
| Registration No. | US-8712776 (2014-04-29) |
| Inventor / Address | |
| Applicant / Address | |
| Agent / Address | |
| Citation info | Cited by: 18 / Patents cited: 599 |
Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
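The normalize-then-map pipeline described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the function names (`normalize`, `to_target_phonemes`) and the tiny phoneme tables are assumptions made for demonstration only.

```python
# Sketch of the selective-synthesis flow from the abstract: normalize a
# metadata text string, look up its phonemes in the native language, then
# map each native phoneme to its closest target-language phoneme before
# speech rendering. All tables and names here are illustrative.
import re
import unicodedata

def normalize(text: str) -> str:
    """Fold Unicode forms and collapse whitespace in a metadata string."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()

# Toy native-phoneme table keyed by detected native language.
NATIVE_PHONEMES = {
    "en": {"hello": ["HH", "AH", "L", "OW"]},
}

# Toy mapping from a native phoneme to a target-language phoneme;
# an empty string marks a phoneme with no target equivalent.
PHONEME_MAP = {("en", "fr"): {"HH": "", "AH": "A", "L": "L", "OW": "O"}}

def to_target_phonemes(word: str, native_lang: str, target_lang: str):
    """Convert a word's native phonemes into target-language phonemes."""
    native = NATIVE_PHONEMES[native_lang][word]
    table = PHONEME_MAP[(native_lang, target_lang)]
    return [table[p] for p in native if table[p]]

word = normalize("  Hello ").lower()
print(to_target_phonemes(word, "en", "fr"))  # ['A', 'L', 'O']
```

In practice the per-language phoneme inventories and mappings would come from pronunciation dictionaries rather than hand-written tables; the point here is only the two-stage normalize-then-convert shape.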
1. A method for selectively synthesizing speech based on a text string, comprising: at a device having one or more processors and memory: generating the text string from metadata associated with a media asset; parsing the text string and identifying one or more portions of the text string each providing information of a respective attribute associated with or identifying the media asset; substituting at least a first portion of the text string that provides respective information of a first attribute of the media asset with text providing respective information of a second attribute of the media asset different from the first attribute of the media asset, where the first attribute of the media asset and the second attribute of the media asset have been selected according to a genre-dependent rule and a respective genre associated with the media asset; and synthesizing speech for provision with the media asset based on the text string after the substitution.

2. The method of claim 1, wherein synthesizing speech for provision with the media asset further comprises: determining a first set of phonemes in a native language of the text string; converting the first set of phonemes to a second set of phonemes in a target language; and generating speech data for provision with the media asset based on the second set of phonemes.

3. The method of claim 1, wherein respective information of different properties associated with or identifying the media asset include composer information and artist information.

4. The method of claim 1, further comprising: selecting from the text string a first subset of text for which to synthesize speech and a second subset of text for which not to synthesize speech based on one or more predefined rules specifying a predetermined set of information types for which to synthesize speech.

5. The method of claim 1, wherein the genre-dependent rule requires substitution of text providing artist information associated with the media asset with text providing composer information associated with the media asset when the respective genre associated with the media asset is classical music.

6. The method of claim 1, further comprising: adding text providing respective information of a third attribute associated with the media asset to the text string before synthesizing speech based on the text string.

7. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors, cause the one or more processors to: generate a text string from metadata associated with a media asset; parse the text string and identify one or more portions of the text string each providing information of a respective attribute associated with or identifying the media asset; substitute at least a first portion of the text string that provides respective information of a first attribute of the media asset with text providing respective information of a second attribute of the media asset different from the first attribute of the media asset, where the first attribute of the media asset and the second attribute of the media asset have been selected according to a genre-dependent rule and a respective genre associated with the media asset; and synthesize speech for provision with the media asset based on the text string after the substitution.

8. The computer-readable storage medium of claim 7, wherein synthesizing speech for provision with the media asset further comprises: determining a first set of phonemes in a native language of the text string; converting the first set of phonemes to a second set of phonemes in a target language; and generating speech data for provision with the media asset based on the second set of phonemes.

9. The computer-readable storage medium of claim 7, wherein respective information of different properties associated with or identifying the media asset include composer information and artist information.

10. The computer-readable storage medium of claim 7, wherein the instructions further cause the processors to: select from the text string a first subset of text for which to synthesize speech and a second subset of text for which not to synthesize speech based on one or more predefined rules specifying a predetermined set of information types for which to synthesize speech.

11. The computer-readable storage medium of claim 7, wherein the genre-dependent rule requires substitution of text providing artist information associated with the media asset with text providing composer information associated with the media asset when the respective genre associated with the media asset is classical music.

12. The computer-readable storage medium of claim 7, wherein the instructions further cause the processors to: add text providing respective information of a third attribute associated with the media asset to the text string before synthesizing speech based on the text string.

13. A system, comprising: one or more processors; and memory, the memory storing one or more programs, the one or more programs comprising instructions, which when executed by the one or more processors, cause the one or more processors to: generate a text string from metadata associated with a media asset; parse the text string and identify one or more portions of the text string each providing information of a respective attribute associated with or identifying the media asset; substitute at least a first portion of the text string that provides respective information of a first attribute of the media asset with text providing respective information of a second attribute of the media asset different from the first attribute of the media asset, where the first attribute of the media asset and the second attribute of the media asset have been selected according to a genre-dependent rule and a respective genre associated with the media asset; and synthesize speech for provision with the media asset based on the text string after the substitution.

14. The system of claim 13, wherein synthesizing speech for provision with the media asset based on the text string further comprises: determining a first set of phonemes in a native language of the text string; converting the first set of phonemes to a second set of phonemes in a target language; and generating speech data for provision with the media asset based on the second set of phonemes.

15. The system of claim 13, wherein respective information of different properties associated with or identifying the media asset include composer information and artist information.

16. The system of claim 13, wherein the instructions further cause the processors to: select from the text string a first subset of text for which to synthesize speech and a second subset of text for which not to synthesize speech based on one or more predefined rules specifying a predetermined set of information types for which to synthesize speech.

17. The system of claim 13, wherein the genre-dependent rule requires substitution of text providing artist information associated with the media asset with text providing composer information associated with the media asset when the respective genre associated with the media asset is classical music.

18. The system of claim 13, further comprising: adding text providing respective information of a third attribute associated with the media asset to the text string before synthesizing speech based on the text string.
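The genre-dependent substitution of claims 1 and 5 can be sketched as a small rule table: when the asset's genre is classical music, the artist portion of the announcement string is replaced with the composer. The field names, rule table, and `build_announcement` helper below are assumptions for illustration, not the patent's actual implementation.

```python
# Sketch of claim 1's genre-dependent attribute substitution.
# Rule table: genre -> (attribute to replace, attribute to substitute in).
GENRE_RULES = {
    "classical": ("artist", "composer"),  # claim 5's classical-music rule
}

def build_announcement(metadata: dict) -> str:
    """Generate the text string to synthesize from asset metadata,
    applying any genre-dependent substitution before synthesis."""
    fields = {"title": metadata["title"], "artist": metadata["artist"]}
    rule = GENRE_RULES.get(metadata.get("genre"))
    if rule:
        old_attr, new_attr = rule
        # Substitute the first attribute's text with the second's.
        fields[old_attr] = metadata[new_attr]
    return f'{fields["title"]} by {fields["artist"]}'

meta = {"title": "Symphony No. 5", "artist": "Berlin Philharmonic",
        "genre": "classical", "composer": "Beethoven"}
print(build_announcement(meta))  # Symphony No. 5 by Beethoven
```

For a non-classical asset the rule table has no entry, so the artist text passes through unchanged; the substitution string, not the synthesized audio, is what changes per genre.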
Copyright KISTI. All Rights Reserved.