IPC Classification Information
Country/Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.):
Application No.: US-0888593 (2004-07-12)
Registration No.: US-8589156 (2013-11-19)
Inventors: Burke, Paul M.; Yacoub, Sherif
Applicant: Hewlett-Packard Development Company, L.P.
Citation information: cited by 4 patents; cites 24 patents
Abstract
A system, method, computer-readable medium, and computer-implemented system for optimizing allocation of speech recognition tasks among multiple speech recognizers and combining recognizer results is described. An allocation determination is performed to allocate speech recognition among multiple speech recognizers using at least one of an accuracy-based allocation mechanism, a complexity-based allocation mechanism, and an availability-based allocation mechanism. The speech recognition is allocated among the speech recognizers based on the determined allocation. Recognizer results received from multiple speech recognizers in accordance with the speech recognition task allocation are combined.
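The patent itself publishes no code, but the allocation-and-combination scheme the abstract describes can be sketched as follows. Everything here is illustrative: the function names, the vocabulary-size threshold of 5,000, and the two-recognizer setup (mobile vs. server) are assumptions, not details from the patent.

```python
# Hypothetical sketch of complexity-based allocation plus word-to-word
# combination; names and threshold values are illustrative only.

VOCAB_THRESHOLD = 5_000  # assumed complexity threshold on vocabulary size


def allocate(vocab_size: int, server_available: bool = True) -> list[str]:
    """Complexity-based allocation: simple tasks stay on the device;
    complex tasks are also sent to the server-based recognizer."""
    targets = ["mobile"]
    if vocab_size >= VOCAB_THRESHOLD and server_available:
        targets.append("server")
    return targets


def combine(mobile: list[tuple[str, float]],
            server: list[tuple[str, float]]) -> list[str]:
    """Word-to-word combination: compare the two result streams position
    by position and keep the word with the higher confidence score."""
    return [m_word if m_conf >= s_conf else s_word
            for (m_word, m_conf), (s_word, s_conf) in zip(mobile, server)]


print(allocate(200))      # small vocabulary -> ['mobile']
print(allocate(20_000))   # large vocabulary -> ['mobile', 'server']
print(combine([("call", 0.9), ("bob", 0.4)],
              [("tall", 0.5), ("rob", 0.8)]))  # -> ['call', 'rob']
```

A real system would also consult the accuracy-based and availability-based mechanisms the abstract names; this sketch shows only the complexity path.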
Representative Claims
1. A system for using multiple speech recognizers, the system comprising: an allocation determination mechanism to determine an allocation of speech recognition tasks among multiple speech recognizers based on a complexity of a speech, wherein the multiple speech recognizers include a mobile-based speech recognizer on a mobile device and a server-based speech recognizer on a server, wherein said allocation determination mechanism is to use a threshold set on a vocabulary size to determine the complexity level of the speech; a task allocation mechanism to allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on a determination by the allocation determination mechanism; and a combination mechanism to receive results from the multiple speech recognizers and combine the results into a single result, wherein the results from each of the multiple speech recognizers include recognized words and a confidence score for each of the recognized words, and wherein, to combine the results, the combination mechanism is to compare the results from the multiple speech recognizers on a word-to-word basis and select a word from one of the multiple speech recognizers as a recognized word for the single result based on the confidence score of that word.

2. The system of claim 1, wherein the allocation determination mechanism is further to determine the allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.

3. The system of claim 1, wherein the combination mechanism is further to use multiple confusion matrices, each corresponding to an audio environment type at the mobile device, to combine the results received from the multiple speech recognizers.

4. The system of claim 3, further comprising: an audio environment determination mechanism to (i) determine an environment condition of the mobile device, and (ii) based on the determined environment condition, select one of multiple confusion matrices for the mobile-device-based speech recognizer for use by the combination mechanism in combining the results.

5. The system of claim 4, wherein said audio environment determination mechanism is to determine a signal-to-noise ratio of the speech.

6. The system of claim 1, wherein the threshold for complexity is further based on a number of times a user of the mobile device has to repeat what was spoken.

7. The system of claim 1, wherein the allocation determination mechanism is further to determine the allocation of the speech recognition tasks based on an accuracy requirement of a transaction attempted, and a noise level of the speech.

8. The system of claim 1, wherein each of the recognized words in the results from the multiple speech recognizers further includes a weighting factor for the word, and wherein the combination mechanism is further to select a word from one of the multiple speech recognizers as a recognized word for the single result based on the weighting factor of that word.

9. The system of claim 8, wherein, if a word from the mobile-device-based speech recognizer matches a word from the server-based speech recognizer, the combination mechanism is to select that word as a recognized word for the single result, and if a word from the mobile-device-based speech recognizer does not match a corresponding word from the server-based speech recognizer, the combination mechanism is to combine the confidence score and weighting factor of that word to generate a comparison value, and select one of the words based on the comparison values of the words.

10. A method of using multiple speech recognizers, said method comprising: determining an allocation of speech recognition tasks among the multiple speech recognizers based on a complexity level of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size, and wherein the multiple speech recognizers include a mobile-device-based speech recognizer on a mobile device and a server-based speech recognizer on a server; allocating the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation; receiving results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words; and combining the results to generate a single result, including comparing the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and selecting a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.

11. The method of claim 10, wherein determining the allocation of the speech recognition tasks is further based on at least one of a required accuracy of speech recognition output and an availability of the multiple speech recognizers.

12. The method of claim 10, further comprising: generating multiple confusion matrices based on different predetermined audio environment types for the mobile-device-based speech recognizer; determining an audio environment type at the mobile device; and selecting an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.

13. The method of claim 10, further comprising: if the complexity of the speech is below the threshold, allocating the speech recognition tasks to the mobile-device-based speech recognizer, and if the results provided by the mobile-device-based speech recognizer are below a predetermined threshold, allocating the speech recognition tasks to the server-based speech recognizer for re-processing.

14. A non-transitory computer-readable medium, on which is stored machine-executable instructions which, when executed by a processor, cause the processor to: determine an allocation of speech recognition tasks among multiple speech recognizers based on a complexity of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size and wherein the multiple speech recognizers include a mobile-device-based speech recognizer on a mobile device and a server-based speech recognizer on a server; allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation; receive results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words; and combine the results to generate a single result, including compare the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and select a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.

15. The non-transitory computer-readable medium of claim 14, wherein the machine-readable instructions, when executed by the processor, are further to cause the processor to determine the allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.

16. The non-transitory computer-readable medium of claim 14, further comprising instructions which, when executed by the processor, cause the processor to: generate, for the mobile-device-based speech recognizer, multiple confusion matrices based on different predetermined audio environment types; and determine an audio environment type at the mobile device and select an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.

17. A computer-implemented system for allocating speech recognition tasks among multiple speech recognizers, the system comprising: a processor; and a memory coupled to the processor, the memory having stored therein instructions causing the processor to: determine an allocation of the speech recognition tasks among multiple speech recognizers based on a complexity of a speech with respect to a threshold, wherein the threshold is based on a vocabulary size, and wherein the multiple speech recognizers include a mobile-based speech recognizer on a mobile device and a server-based speech recognizer on a server; allocate the speech recognition tasks to both the mobile-device-based speech recognizer and the server-based speech recognizer based on the determined allocation; and receive results from the mobile-device-based speech recognizer and the server-based speech recognizer, wherein the results from each of the speech recognizers include recognized words and a confidence score for each of the recognized words; combine the results to generate a single result, including compare the results from the mobile-device-based speech recognizer and the results from the server-based speech recognizer on a word-to-word basis, and select a word from the mobile-device-based speech recognizer or a word from the server-based speech recognizer as a recognized word for the single result based on the confidence score of that word.

18. The system of claim 17, wherein the instructions, when executed, are further to cause the processor to determine an allocation of the speech recognition tasks based on a required accuracy of the results and an availability of the multiple speech recognizers.

19. The system of claim 17, further comprising instructions which, when executed by the processor, cause the processor to: generate, for the mobile-device-based speech recognizer, multiple confusion matrices based on different predetermined audio environment types; and determine an audio environment type at the mobile device and select an appropriate one among the multiple confusion matrices for use in combining the results, based on the determined audio environment type.
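Claim 9 adds a tie-breaking rule on top of the confidence-based combination: matching words are accepted directly, while mismatched words are compared via a value that combines each word's confidence score and weighting factor. The claim does not say how the two quantities are combined, so the sketch below assumes a simple product; the function name and tuple layout are likewise hypothetical.

```python
# Illustrative sketch of the claim 9 combination rule. The patent does not
# specify how confidence and weighting factor form the comparison value;
# a product is assumed here purely for illustration.

def select_word(mobile: tuple[str, float, float],
                server: tuple[str, float, float]) -> str:
    """Each argument is (word, confidence_score, weighting_factor)."""
    m_word, m_conf, m_weight = mobile
    s_word, s_conf, s_weight = server
    if m_word == s_word:
        # Matching words are selected directly for the single result.
        return m_word
    # Mismatch: combine confidence and weight into a comparison value
    # and keep the higher-scoring word.
    m_value = m_conf * m_weight
    s_value = s_conf * s_weight
    return m_word if m_value >= s_value else s_word


print(select_word(("seven", 0.9, 1.0), ("seven", 0.3, 1.0)))  # -> seven
print(select_word(("pause", 0.6, 1.2), ("paws", 0.8, 0.5)))   # -> pause
```

In the second call the mobile word wins despite its lower raw confidence (0.6 vs. 0.8) because its larger weighting factor yields the higher comparison value (0.72 vs. 0.40), which is the behavior the weighting factor of claim 8 exists to produce.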