IPC Classification Information
Country/Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.):
Application No.: US-0888593 (2004-07-12)
Registration No.: US-8589156 (2013-11-19)
Inventors: Burke, Paul M.; Yacoub, Sherif
Applicant: Hewlett-Packard Development Company, L.P.
Citation information: cited by 4 patents; cites 24 patents
Abstract
A system, method, computer-readable medium, and computer-implemented system for optimizing allocation of speech recognition tasks among multiple speech recognizers and combining recognizer results is described. An allocation determination is performed to allocate speech recognition among multiple speech recognizers using at least one of an accuracy-based allocation mechanism, a complexity-based allocation mechanism, and an availability-based allocation mechanism. The speech recognition is allocated among the speech recognizers based on the determined allocation. Recognizer results received from multiple speech recognizers in accordance with the speech recognition task allocation are combined.
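The patent itself publishes no code, but the allocation-and-combination scheme the abstract describes can be sketched as follows. Everything here is illustrative: the function names, the vocabulary-size threshold of 5,000, and the two-recognizer setup (mobile vs. server) are assumptions, not details from the patent.

```python
# Hypothetical sketch of complexity-based allocation plus word-to-word
# combination; names and threshold values are illustrative only.

VOCAB_THRESHOLD = 5_000  # assumed complexity threshold on vocabulary size


def allocate(vocab_size: int, server_available: bool = True) -> list[str]:
    """Complexity-based allocation: simple tasks stay on the device;
    complex tasks are also sent to the server-based recognizer."""
    targets = ["mobile"]
    if vocab_size >= VOCAB_THRESHOLD and server_available:
        targets.append("server")
    return targets


def combine(mobile: list[tuple[str, float]],
            server: list[tuple[str, float]]) -> list[str]:
    """Word-to-word combination: compare the two result streams position
    by position and keep the word with the higher confidence score."""
    return [m_word if m_conf >= s_conf else s_word
            for (m_word, m_conf), (s_word, s_conf) in zip(mobile, server)]


print(allocate(200))      # small vocabulary -> ['mobile']
print(allocate(20_000))   # large vocabulary -> ['mobile', 'server']
print(combine([("call", 0.9), ("bob", 0.4)],
              [("tall", 0.5), ("rob", 0.8)]))  # -> ['call', 'rob']
```

A real system would also consult the accuracy-based and availability-based mechanisms the abstract names; this sketch shows only the complexity path.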
Representative Claims
1. A system for using multiple speech recognizers, the system comprising: an allocation determination mechanism to determine an allocation of speech recognition tasks among multiple speech recognizers based on a complexity of a speech, wherein the multiple speech recognizers include a mobile-based speech recognizer on a mobile device and a server-based speech recognizer on a server, wherein said allocation determination mechanism is to use a threshold set on a vocabulary size to determine the complexity level of the speech; a task allocation mechanism to allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on a determination by the allocation determination mechanism; and a combination mechanism to receive results from the multiple speech recognizers and combine the results into a single result, wherein the results from each of the multiple speech recognizers include recognized words and a confidence score for each of the recognized words, and wherein, to combine the results, the combination mechanism is to compare the results from the multiple speech recognizers on a word-to-word basis and select a word from one of the multiple speech recognizers as a recognized word for the single result based on the confidence score of that word.

2. The system of claim 1, wherein the allocation determination mechanism is further to determine the allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.

3. The system of claim 1, wherein the combination mechanism is further to use multiple confusion matrices, each corresponding to an audio environment type at the mobile device, to combine the results received from the multiple speech recognizers.

4. The system of claim 3, further comprising: an audio environment determination mechanism to (i) determine an environment condition of the mobile device, and (ii) based on the determined environment condition, select one of multiple confusion matrices for the mobile-device-based speech recognizer for use by the combination mechanism in combining the results.

5. The system of claim 4, wherein said audio environment determination mechanism is to determine a signal-to-noise ratio of the speech.

6. The system of claim 1, wherein the threshold for complexity is further based on a number of times a user of the mobile device has to repeat what was spoken.

7. The system of claim 1, wherein the allocation determination mechanism is further to determine the allocation of the speech recognition tasks based on an accuracy requirement of a transaction attempted, and a noise level of the speech.

8. The system of claim 1, wherein each of the recognized words in the results from the multiple speech recognizers further includes a weighting factor for the word, and wherein the combination mechanism is further to select a word from one of the multiple speech recognizers as a recognized word for the single result based on the weighting factor of that word.

9. The system of claim 8, wherein, if a word from the mobile-device-based speech recognizer matches a word from the server-based speech recognizer, the combination mechanism is to select that word as a recognized word for the single result, and if a word from the mobile-device-based speech recognizer does not match a corresponding word from the server-based speech recognizer, the combination mechanism is to combine the confidence score and weighting factor of that word to generate a comparison value, and select one of the words based on the comparison values of the words.

10. A method of using multiple speech recognizers, said method comprising: determining an allocation of speech recognition tasks among the multiple speech recognizers based on a complexity level of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size, and wherein the multiple speech recognizers include a mobile-device-based speech recognizer on a mobile device and a server-based speech recognizer on a server; allocating the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation; receiving results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words; and combining the results to generate a single result, including comparing the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and selecting a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.

11. The method of claim 10, wherein determining the allocation of the speech recognition tasks is further based on at least one of a required accuracy of speech recognition output and an availability of the multiple speech recognizers.

12. The method of claim 10, further comprising: generating multiple confusion matrices based on different predetermined audio environment types for the mobile-device-based speech recognizer; determining an audio environment type at the mobile device; and selecting an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.

13. The method of claim 10, further comprising: if the complexity of the speech is below the threshold, allocating the speech recognition tasks to the mobile-device-based speech recognizer, and if the results provided by the mobile-device-based speech recognizer are below a predetermined threshold, allocating the speech recognition tasks to the server-based speech recognizer for re-processing.

14. A non-transitory computer-readable medium, on which is stored machine-executable instructions which, when executed by a processor, cause the processor to: determine an allocation of speech recognition tasks among multiple speech recognizers based on a complexity of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size and wherein the multiple speech recognizers include a mobile-device-based speech recognizer on a mobile device and a server-based speech recognizer on a server; allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation; receive results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words; and combine the results to generate a single result, including compare the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and select a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.

15. The non-transitory computer-readable medium of claim 14, wherein the machine-readable instructions, when executed by the processor, are further to cause the processor to determine the allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.

16. The non-transitory computer-readable medium of claim 14, further comprising instructions which, when executed by the processor, cause the processor to: generate, for the mobile-device-based speech recognizer, multiple confusion matrices based on different predetermined audio environment types; and determine an audio environment type at the mobile device and select an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.

17. A computer-implemented system for allocating speech recognition tasks among multiple speech recognizers, the system comprising: a processor; and a memory coupled to the processor, the memory having stored therein instructions causing the processor to: determine an allocation of the speech recognition tasks among multiple speech recognizers based on a complexity of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size, and wherein the multiple speech recognizers include a mobile-based speech recognizer on a mobile device and a server-based speech recognizer on a server; allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation; and receive results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words; combine the results to generate a single result, including compare the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and select a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.

18. The system of claim 17, wherein the instructions, when executed, are further to cause the processor to determine an allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.

19. The system of claim 17, further comprising instructions which, when executed by the processor, cause the processor to: generate, for the mobile-device-based speech recognizer, multiple confusion matrices based on different predetermined audio environment types; and determine an audio environment type at the mobile device and select an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.
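Claim 9 adds a tie-breaking rule on top of the confidence-based combination: matching words are accepted directly, while mismatched words are compared via a value that combines each word's confidence score and weighting factor. The claim does not say how the two quantities are combined, so the sketch below assumes a simple product; the function name and tuple layout are likewise hypothetical.

```python
# Illustrative sketch of the claim 9 combination rule. The patent does not
# specify how confidence and weighting factor form the comparison value;
# a product is assumed here purely for illustration.

def select_word(mobile: tuple[str, float, float],
                server: tuple[str, float, float]) -> str:
    """Each argument is (word, confidence_score, weighting_factor)."""
    m_word, m_conf, m_weight = mobile
    s_word, s_conf, s_weight = server
    if m_word == s_word:
        # Matching words are selected directly for the single result.
        return m_word
    # Mismatch: combine confidence and weight into a comparison value
    # and keep the higher-scoring word.
    m_value = m_conf * m_weight
    s_value = s_conf * s_weight
    return m_word if m_value >= s_value else s_word


print(select_word(("seven", 0.9, 1.0), ("seven", 0.3, 1.0)))  # -> seven
print(select_word(("pause", 0.6, 1.2), ("paws", 0.8, 0.5)))   # -> pause
```

In the second call the mobile word wins despite its lower raw confidence (0.6 vs. 0.8) because its larger weighting factor yields the higher comparison value (0.72 vs. 0.40), which is the behavior the weighting factor of claim 8 exists to produce.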