| Field | Value |
|---|---|
| Country / Type | United States (US) Patent, Granted |
| IPC (7th edition) | |
| Application No. | US-0497511 (2009-07-02) |
| Registration No. | US-9431006 (2016-08-30) |
| Inventor / Address | |
| Applicant / Address | |
| Agent / Address | |
| Citation info | Cited by: 1 / Cites: 584 patents |
Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.
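The abstract's core idea can be illustrated with a minimal toy sketch (this is an invented illustration, not the patent's actual implementation): each feature frame is vector-quantized to its nearest cluster centroid, yielding a discrete parameter (the cluster label), while the frame's residual from that centroid is scored under an isotropic Gaussian, the continuous parameter representation of residuals. The centroids, dimensions, and `sigma` below are assumed toy values.

```python
import math

# Hypothetical 2-D cluster centroids (the discrete codebook).
CENTROIDS = {
    0: [0.0, 0.0],
    1: [1.0, 1.0],
    2: [-1.0, 1.0],
}

def discretize(frame):
    """Nearest-centroid cluster label: the discrete parameter."""
    return min(CENTROIDS,
               key=lambda k: sum((f - c) ** 2
                                 for f, c in zip(frame, CENTROIDS[k])))

def residual(frame, label):
    """Continuous residual of the frame w.r.t. its cluster centroid."""
    return [f - c for f, c in zip(frame, CENTROIDS[label])]

def score(frame, sigma=1.0):
    """Joint score: cluster label plus Gaussian log-density of the residual."""
    label = discretize(frame)
    r = residual(frame, label)
    sq = sum(x * x for x in r)
    d = len(r)
    ll = -0.5 * sq / sigma ** 2 - 0.5 * d * math.log(2 * math.pi * sigma ** 2)
    return label, ll

label, ll = score([0.9, 1.2])  # label 1; small residual [-0.1, 0.2]
```

The point of the split is that the coarse cluster label carries most of the information discretely, while the Gaussian over residuals keeps the fine-grained continuous detail that quantization discards.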
1. A machine implemented method to perform speech recognition, comprising: receiving first portions of an acoustic signal; determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal having a coarser granularity than the first portions; determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and outputting the recovered word sequence.

2. A machine implemented method as in claim 1, further comprising determining a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

3. A machine implemented method as in claim 1, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

4. A machine implemented method as in claim 1, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.

5. A machine implemented method as in claim 1, further comprising: representing the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; computing residuals of the first portions based on the cluster labels; and representing the residuals of the first portions by one or more continuous parameters.

6. A machine implemented method as in claim 1, further comprising determining a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.

7. A machine implemented method as in claim 1, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.

8. A machine implemented method as in claim 1, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.

9. A machine implemented method as in claim 1, wherein the determining the likelihood of the recovered first parameter sequence includes: matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.

10. A machine implemented method as in claim 1, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.

11. A non-transitory machine-readable medium storing executable program instructions which, when executed by a data processing system, cause the system to perform operations to recognize speech, comprising: receiving first portions of an acoustic signal; determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and outputting the recovered word sequence.

12. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising determining a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

13. A non-transitory machine-readable medium as in claim 11, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

14. A non-transitory machine-readable medium as in claim 11, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.

15. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising: representing the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; computing residuals of the first portions based on the cluster labels; and representing the residuals of the first portions by one or more continuous parameters.

16. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising determining a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.

17. A non-transitory machine-readable medium as in claim 11, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.

18. A non-transitory machine-readable medium as in claim 11, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.

19. A non-transitory machine-readable medium as in claim 11, wherein the determining the likelihood of the recovered first parameter sequence includes: matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.

20. A non-transitory machine-readable medium as in claim 11, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.

21. A data processing system to perform speech recognition, comprising: a memory; and a processor coupled to the memory, the processor configured to: receive first portions of an acoustic signal; determine a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determine a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; determine a likelihood of a recovered word sequence based on the recovered second parameter sequence; and output the recovered word sequence.

22. A data processing system as in claim 21, wherein the processor is further configured to determine a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

23. A data processing system as in claim 21, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

24. A data processing system as in claim 21, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.

25. A data processing system as in claim 21, wherein the processor is further configured to represent the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; to compute residuals of the first portions based on the cluster labels; and to represent the residuals of the first portions by one or more continuous parameters.

26. A data processing system as in claim 21, wherein the processor is further configured to determine a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.

27. A data processing system as in claim 21, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.

28. A data processing system as in claim 21, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.

29. A data processing system as in claim 21, wherein the determining the likelihood of the recovered first parameter sequence includes: matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.

30. A data processing system as in claim 21, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.

31. A data processing system to perform speech recognition, comprising: means for receiving first portions of an acoustic signal; means for determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; means for determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; means for determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and means for outputting the recovered word sequence.

32. A non-transitory machine-readable storage medium containing executable instructions which, when executed, cause a data processing system to perform a speech recognition method, the method comprising: receiving an acoustic signal; extracting features from a digitized representation of the acoustic signal; comparing at least some of the features to a first component of an acoustic model, the first component having a discrete parameter representation; comparing at least some of the features to a second component of the acoustic model, the second component having a continuous parameter representation which models residuals of speech signals; and determining a recognized word from the comparing of at least some of the features to the first and the second components, wherein the discrete parameter representation and the continuous parameter representation are both used to map the features to at least one cluster label which is used to determine at least one phoneme.

33. The non-transitory machine-readable medium as in claim 32, wherein the discrete parameter representation and the continuous parameter representation are separately and independently trained and are coupled after training.
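The fine-to-coarse hierarchy that claims 1 and 32 describe (frame-level labels mapped to coarser phoneme-level units, which in turn yield a word) can be sketched with toy lookup tables. Everything below, including `FRAME_TO_PHONEME` and `LEXICON`, is an invented illustration of the granularity hierarchy, not the patent's decoding algorithm, which scores these sequences probabilistically rather than by exact lookup.

```python
# Hypothetical mapping from frame-level cluster labels to phonemes,
# and from a phoneme sequence to a word.
FRAME_TO_PHONEME = {"c1": "k", "c2": "ae", "c3": "t"}
LEXICON = {("k", "ae", "t"): "cat"}

def collapse(labels):
    """Merge consecutive duplicate frame labels into single units,
    moving from frame granularity toward phoneme granularity."""
    out = []
    for lab in labels:
        if not out or out[-1] != lab:
            out.append(lab)
    return out

def decode(frame_labels):
    """Frame labels -> phoneme sequence -> recovered word (or None)."""
    phonemes = [FRAME_TO_PHONEME[l] for l in collapse(frame_labels)]
    return phonemes, LEXICON.get(tuple(phonemes))

# Six frames collapse to three phonemes, which look up one word.
phonemes, word = decode(["c1", "c1", "c2", "c2", "c2", "c3"])
```

The collapse step is the granularity change the claims turn on: many fine-grained first portions (frames) map onto fewer, coarser second portions (phonemes), and the word likelihood is computed over the coarser sequence.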
Copyright KISTI. All Rights Reserved.