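Country / Type | United States (US) Patent, Granted |
---|---|
International Patent Classification (IPC, 7th edition) | |
Application number | US-0986515 (2007-11-20) |
Registration number | US-8620662 (2013-12-31) |
Inventor / Address | |
Applicant / Address | |
Agent / Address | |
Citation information | Times cited: 39; Patents cited: 396 |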
Methods and apparatuses to perform context-aware unit selection for natural language processing are described. Streams of information associated with input units are received. The streams of information are analyzed in a context associated with first candidate units to determine a first set of weights of the streams of information. A first candidate unit is selected from the first candidate units based on the first set of weights of the streams of information. The streams of information are analyzed in the context associated with second candidate units to determine a second set of weights of the streams of information. A second candidate unit is selected from second candidate units to concatenate with the first candidate unit based on the second set of weights of the streams of information.
1. A machine-implemented method of text-to-speech generation, comprising: at a device comprising one or more processors and memory: receiving a text input to be converted to speech, the text input including a sequence of text input units; and for each text input unit of the sequence of text input units: selecting, from a pool of pre-recorded segments of speech, a respective plurality of candidate speech units for the text input unit, wherein the respective plurality of candidate speech units differ from one another in regard to one or more of a plurality of characteristics; for each of the plurality of characteristics, determining a respective degree of variation present among the respective plurality of candidate speech units selected from the pool of pre-recorded segments of speech; determining a respective weight set for the text input unit, the respective weight set including a respective weight for each of the plurality of characteristics based on relative magnitudes of the respective degrees of variations that are present among the candidate speech units for the plurality of characteristics; and based on the respective weight set for the text input unit, selecting a respective one of the respective plurality of candidate speech units to synthesize a respective speech output corresponding to the text input unit.
2. The machine-implemented method of claim 1, further comprising: concatenating the respective speech outputs selected for the sequence of text input units as a respective speech output corresponding to the text input.
3. The machine-implemented method of claim 1, wherein determining the respective weight set for the input text unit further comprises: weighting a first characteristic higher than a second characteristic in the respective weight set for the plurality of characteristics if the first characteristic provides a higher discrimination between the plurality of candidate speech units for the first text input unit.
4. The machine-implemented method of claim 1, wherein determining the respective weight set for the input text unit further comprises: performing a constrained quadratic optimization to find the respective weight set for the first input text unit, wherein the constrained quadratic optimization maximizes a respective conversion cost associated with each of the respective plurality of candidate speech units for the text input unit.
5. The machine-implemented method of claim 4, wherein the selected one of the respective plurality of candidate speech units is a speech unit associated a minimum conversion cost among the maximized respective conversion costs of the plurality of candidate speech units.
6. The machine-implemented method of claim 1, wherein the plurality of characteristics include two or more of pitch, duration, position, accent, spectral quality, and part-of-speech.
7. The machine-implemented method of claim 1, wherein selecting one of the plurality of candidate speech units as a speech output is further based on respective values of the plurality of characteristics belonging to each of the respective plurality of candidate speech units.
8. A non-transitory computer-readable medium having instructions stored thereon, the instruction, when executed by one or more processors, cause the processors to perform operations comprising: receiving a text input to be converted to speech, the text input including a sequence of text input units; and for each text input unit of the sequence of text input units: selecting, from a pool of pre-recorded segments of speech, a respective plurality of candidate speech units for the text input unit, wherein the respective plurality of candidate speech units differ from one another in regard to one or more of a plurality of characteristics; for each of the plurality of characteristics, determining a respective degree of variation present among the respective plurality of candidate speech units selected from the pool of pre-recorded segments of speech; determining a respective weight set for the text input unit, the respective weight set including a respective weight for each of the plurality of characteristics based on relative magnitudes of the respective degrees of variations that are present among the candidate speech units for the plurality of characteristics; and based on the respective weight set for the text input unit, selecting a respective one of the respective plurality of candidate speech units to synthesize a respective speech output corresponding to the text input unit.
9. The computer-readable medium of claim 8, wherein the operations further comprise: concatenating the respective speech outputs selected for the sequence of text input units as a respective speech output corresponding to the text input.
10. The computer-readable medium of claim 8, wherein determining the respective weight set for the input text unit further comprises: weighting a first characteristic higher than a second characteristic in the respective weight set for the plurality of characteristics if the first characteristic provides a higher discrimination between the plurality of candidate speech units for the text input unit.
11. The computer-readable medium of claim 8, wherein determining the respective weight set for the input text unit further comprises: performing a constrained quadratic optimization to find the respective weight set for the input text unit, wherein the constrained quadratic optimization maximizes a respective final conversion cost associated with each of the respective plurality of candidate speech units for the text input unit.
12. The computer-readable medium of claim 11, wherein the selected one of the respective plurality of candidate speech units is a speech unit associated a minimum conversion cost among the maximized respective conversion costs of the plurality of candidate speech units.
13. The computer-readable medium of claim 8, wherein the plurality of characteristics include two or more of pitch, duration, position, accent, spectral quality, and part-of-speech.
14. The computer-readable medium of claim 8, selecting one of the plurality of candidate speech units as a speech output is further based on respective values of the plurality of characteristics belonging to each of the respective plurality of candidate speech units.
15. A system, comprising: one or more processors; and memory having instructions stored thereon, the instructions, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving a text input to be converted to speech, the text input including a sequence of text input units; and for each text input unit of the sequence of text input units: selecting, from a pool of pre-recorded segments of speech, a respective plurality of candidate speech units for the text input unit, wherein the respective plurality of candidate speech units differ from one another in regard to one or more of a plurality of characteristics; for each of the plurality of characteristics, determining a respective degree of variation present among the respective plurality of candidate speech units selected from the pool of pre-recorded segments of speech; determining a respective weight set for the text input unit, the respective weight set including a respective weight for each of the plurality of characteristics based on relative magnitudes of the respective degrees of variations that are present among the candidate speech units for the plurality of characteristics; and based on the respective weight set for the text input unit, selecting a respective one of the respective plurality of candidate speech units to synthesize a respective speech output corresponding to the text input unit.
16. The system of claim 15, wherein the operations further comprise: concatenating the respective speech outputs selected for the sequence of text input units as a respective speech output corresponding to the text input.
17. The system of claim 15, wherein determining the respective weight set for the input text unit further comprises: weighting a first characteristic higher than a second characteristic in the respective weight set for the plurality of characteristics if the first characteristic provides a higher discrimination between the plurality of candidate speech units for the first text input unit.
18. The system of claim 15, wherein determining the respective weight set for the input text unit further comprises: performing a constrained quadratic optimization to find the respective weight set for the first input text unit, wherein the constrained quadratic optimization maximizes a respective conversion cost associated with each of the respective plurality of candidate speech units for the first text input unit.
19. The system of claim 18, wherein the selected one of the respective plurality of candidate speech units is a speech unit associated a minimum conversion cost among the maximized respective conversion costs of the plurality of candidate speech units.
20. The system of claim 15, wherein the plurality of characteristics include two or more of pitch, duration, position, accent, spectral quality, and part-of-speech.
21. The system of claim 15, wherein selecting one of the plurality of candidate speech units as a speech output is further based on respective values of the plurality of characteristic belonging to each of the respective plurality of candidate speech units.
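The per-unit flow recited in claims 1 to 3 (measure how much the candidates vary in each characteristic, turn those variations into a weight set, select one candidate, then concatenate) can be illustrated with a minimal Python sketch. It is not the patented implementation. The following are assumptions made only for illustration: population variance stands in for the "degree of variation", each weight is that characteristic's share of the total variation, the selection cost is a weighted squared distance to a hypothetical target value per characteristic, characteristic values are pre-normalized to a comparable scale, and audio segments are represented by placeholder strings. The constrained quadratic optimization of claims 4 and 5 is not modeled.

```python
from statistics import pvariance

# Two of the characteristics named in claims 6, 13 and 20; the choice is illustrative.
CHARACTERISTICS = ("pitch", "duration")


def weight_set(candidates):
    """Return one weight per characteristic for a single text input unit.

    Assumption: "degree of variation" is taken to be the population variance
    of the characteristic across the candidates, and each weight is that
    variance divided by the total, so a more discriminating characteristic
    gets a higher weight.
    """
    variation = {c: pvariance([u[c] for u in candidates]) for c in CHARACTERISTICS}
    total = sum(variation.values())
    if total == 0:
        # Candidates do not differ at all: fall back to uniform weights.
        return {c: 1.0 / len(CHARACTERISTICS) for c in CHARACTERISTICS}
    return {c: v / total for c, v in variation.items()}


def select_unit(target, candidates):
    """Pick the candidate with the lowest weighted squared distance to target.

    `target` is a hypothetical dict of desired characteristic values; the
    claims do not specify this cost function.
    """
    weights = weight_set(candidates)

    def cost(unit):
        return sum(weights[c] * (unit[c] - target[c]) ** 2 for c in CHARACTERISTICS)

    return min(candidates, key=cost), weights


def synthesize(targets, pool):
    """Select one segment per text input unit and return them in order.

    `targets` is a list of (text_unit, target_values) pairs and `pool` maps a
    text unit to its candidate segments. The returned list stands in for the
    concatenated speech output.
    """
    return [select_unit(target, pool[unit])[0]["audio"] for unit, target in targets]


if __name__ == "__main__":
    pool = {
        "a": [
            {"audio": "a1", "pitch": 0.2, "duration": 0.5},
            {"audio": "a2", "pitch": 0.8, "duration": 0.5},
        ],
        "b": [
            {"audio": "b1", "pitch": 0.5, "duration": 0.2},
            {"audio": "b2", "pitch": 0.5, "duration": 0.9},
        ],
    }
    targets = [
        ("a", {"pitch": 0.7, "duration": 0.1}),
        ("b", {"pitch": 0.9, "duration": 0.3}),
    ]
    print(synthesize(targets, pool))
```

In this toy pool the candidates for "a" differ only in pitch and those for "b" only in duration, so each unit gets a different weight set and the characteristic on which the candidates agree has no influence on the choice.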
Copyright KISTI. All Rights Reserved.