| Country / Type | United States (US) Patent, Granted |
| --- | --- |
| IPC (7th edition) | |
| Application No. | US-0247912 (2011-09-28) |
| Registration No. | US-8762156 (2014-06-24) |
| Inventors / Address | |
| Applicant / Address | |
| Agent / Address | |
| Citation info | Cited by: 65 / Citations: 523 |
A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
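The abstract's repair-and-merge flow can be illustrated with a minimal sketch. This is not the patent's implementation: the interpreter class, the prefix-matching heuristic, the scoring, and all names here are hypothetical, and only a single contacts-style interpreter is shown.

```python
from dataclasses import dataclass

@dataclass
class Repair:
    start: int    # index of first token covered
    end: int      # index past the last token covered
    text: str     # proposed replacement text
    score: float  # interpreter's confidence in the repair

class ContactInterpreter:
    """Hypothetical interpreter: repairs name tokens against a contacts list."""
    def __init__(self, contacts):
        self.contacts = contacts

    def propose(self, tokens, context=None):
        repairs = []
        for i, tok in enumerate(tokens):
            for name in self.contacts:
                # naive heuristic: the recognized token is a prefix of a contact name
                if name.lower().startswith(tok.lower()) and name.lower() != tok.lower():
                    repairs.append(Repair(i, i + 1, name, 0.8))
        return repairs

def repair_transcription(text, interpreters, context=None):
    """Tokenize recognized text, collect repairs from all interpreters, merge."""
    tokens = text.split()          # parsed data structure of word tokens
    proposals = []
    for interp in interpreters:    # each interpreter targets one error type
        proposals.extend(interp.propose(tokens, context))
    # simplified merge: apply the highest-scoring proposal per token span,
    # skipping proposals that overlap an already-repaired span
    proposals.sort(key=lambda r: -r.score)
    covered = set()
    for r in proposals:
        span = set(range(r.start, r.end))
        if span & covered:
            continue
        covered |= span
        for i in range(r.start, r.end):
            tokens[i] = r.text if i == r.start else ""
    return " ".join(t for t in tokens if t)

interp = ContactInterpreter(["Maryam"])
print(repair_transcription("call mary at home", [interp]))  # → call Maryam at home
```

The overlap handling mirrors the idea in claim 7 below: non-overlapping proposals are all merged, while overlapping proposals are ranked by score and only the best one survives.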
1. A machine implemented method comprising: receiving a speech input from a user of a data processing system; determining a context, of the data processing system, when the speech input was received; recognizing text in the speech input through a speech recognition system that includes an acoustic model and a language model, the recognizing of text producing a first text output; storing the first text output as a parsed data structure having a plurality of tokens each of which represents a word in the first text output; processing each of the tokens with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and each of the tokens, each of the interpreters determining from any matches and from the context whether it can repair a token in the first text output, wherein each interpreter is designed to repair an error of a specific type in the first text output; merging selected results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output; providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.

2. The method as in claim 1 wherein the context includes a history of prior user inputs and wherein the one or more databases comprises a contacts database which stores at least one of names, addresses and phone numbers.

3. The method as in claim 2 wherein the context includes a conversation history and wherein the one or more databases comprises a media database which stores at least one of songs, titles, and artists and wherein an interpreter in the set of interpreters uses at least two consecutive words when evaluating a possible match.

4. The method as in claim 1 wherein a first interpreter, in the set of interpreters, uses a first algorithm to determine whether to repair a word and wherein a second interpreter, in the set of interpreters, uses a second algorithm to determine whether to repair a word, the first algorithm being different than the second algorithm.

5. The method as in claim 1 wherein a first interpreter, in the set of interpreters, uses a first algorithm to search the one or more databases and a second interpreter, in the set of interpreters, uses a second algorithm to search the one or more databases, and wherein the first algorithm and the second algorithm are different.

6. The method as in claim 1 wherein the interpreters in the set of interpreters do not attempt to repair the command.

7. The method as in claim 1 wherein the merging merges only non-overlapping results from the set of interpreters, and overlapping results from the set of interpreters are ranked in a ranked set and one result in the ranked set is selected and merged into the final interpreted speech transcription.

8. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising: receiving a speech input from a user of a data processing system; determining a context, of the data processing system, when the speech input was received; recognizing text in the speech input through a speech recognition system that includes an acoustic model and a language model, the recognizing of text producing a first text output; storing the first text output as a parsed data structure having a plurality of tokens each of which represents a word in the first text output; processing each of the tokens with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and at least one of the tokens, each of the interpreters determining from any matches and from the context whether it can repair a token in the first text output, wherein each interpreter is designed to repair an error of a specific type in the first text output; merging selected results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output; providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.

9. The medium as in claim 8 wherein the context includes a history of prior user inputs and wherein the one or more databases comprises a contacts database which stores at least one of names, addresses and phone numbers.

10. The medium as in claim 9 wherein the context includes a conversation history and wherein the one or more databases comprises a media database which stores at least one of songs, titles, and artists and wherein an interpreter in the set of interpreters uses at least two consecutive words when evaluating a possible match.

11. The medium as in claim 8 wherein a first interpreter, in the set of interpreters, uses a first algorithm to determine whether to repair a word and wherein a second interpreter, in the set of interpreters, uses a second algorithm to determine whether to repair a word, the first algorithm being different than the second algorithm.

12. The medium as in claim 8 wherein a first interpreter, in the set of interpreters, uses a first algorithm to search the one or more databases and a second interpreter, in the set of interpreters, uses a second algorithm to search the one or more databases, and wherein the first algorithm and the second algorithm are different.

13. The medium as in claim 8 wherein the interpreters in the set of interpreters do not attempt to repair the command.

14. The medium as in claim 8 wherein the merging merges only non-overlapping results from the set of interpreters, and overlapping results from the set of interpreters are ranked in a ranked set and one result in the ranked set is selected and merged into the final interpreted speech transcription.

15. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising: receiving a speech input from a user of a data processing system; recognizing text in the speech input through a speech recognition system that includes an acoustic model and an optional language model, the recognizing of text producing a first text output; storing the first text output as a parsed data structure having a plurality of words in the first text output; processing at least one of the words with a set of interpreters, each interpreter in the set being designed to search one or more databases to search for matches between one or more items in the databases and the at least one of the words, each of the interpreters determining from any matches whether it can repair a word in the first text output, wherein each interpreter is designed to repair an error of a specific field in the one or more databases; merging repaired results from the set of interpreters to produce a final interpreted speech transcription which represents a repaired version of the first text output; providing the final interpreted speech transcription to a selected application, in a set of applications, based on a command in the final interpreted speech transcription, the selected application to execute the command in the final interpreted speech transcription.

16. The medium as in claim 15, wherein the method further comprises: determining a context, of the data processing system, when the speech input was received, wherein the context includes a history of prior user inputs and wherein the one or more databases comprises a contacts database which stores at least one of names, addresses and phone numbers; and wherein different interpreters, in the set of interpreters, use different algorithms to determine whether to repair a word in the first text output, and wherein each interpreter determines, through a score, whether it can repair a word in the first text output.

17. A data processing system comprising: a processor capable of recognizing text in a speech input and producing a first text output; a context determining system which determines a context of the data processing system when the speech input is received; a microphone coupled to the processor to provide the speech input to the processor; a speech repair system coupled to the processor and coupled to the context determining system, the speech repair system including a set of interpreters, each of which is configured to repair an error of a certain type in recognized text, the certain type being determined by one or more fields in one or more databases which are searched by the set of interpreters.

18. The data processing system of claim 17 wherein the context includes a history of user inputs and wherein the set of interpreters use the context in a process of determining whether to repair one or more words in the first text output and wherein the processor is capable of recognizing text using an acoustic model and a language model.

19. The data processing system of claim 18 wherein the set of interpreters search the one or more databases to compare words in the first text output with one or more items in the one or more databases when determining whether to repair one or more words in the first text output.

20. A machine readable non-transitory storage medium storing executable program instructions which when executed cause a data processing system to perform a method comprising: executing a speech assistant application which is a first application in a set of applications; receiving a digitized speech input and recognizing text in the speech input through a speech recognition system which provides a first text output; determining a command from the first text output; selecting an application in the set of applications based on the command, wherein the selected application is different than the speech assistant application, the selected application being configured to execute the command with text from or derived from the first text output; repairing text in the first text output through a set of interpreters each of which is configured to repair an error of a specific type, based on one or more fields of one or more databases, in the first text output; and merging results from the set of interpreters to produce a final interpreted transcription to the selected application.

21. The medium as in claim 20 wherein the method further comprises: determining a context of the data processing system when the digitized speech input is received, and wherein the set of interpreters use the context when determining whether to repair one or more words in the first text output.

22. The medium as in claim 21 wherein a grammar parser determines the command from the first text output.

23. The medium as in claim 21 wherein the set of applications comprises at least two of: (a) a telephone dialer that uses the final interpreted transcription to dial a telephone number; (b) a media player for playing songs or other content; (c) a text messaging application; (d) an email application; (e) a calendar application; (f) a local search application; (g) a video conferencing application; or (h) a person or object locating application.
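Claim 20 describes routing the final transcription to an application selected by its command, distinct from the assistant itself. A minimal dispatch sketch follows; the grammar parser of claim 22 is reduced here to a first-word lookup, and every handler name is hypothetical.

```python
# Hypothetical handlers standing in for the applications of claim 23
# (telephone dialer, media player, text messaging, ...).

def dial(args):
    return f"dialing {' '.join(args)}"

def play(args):
    return f"playing {' '.join(args)}"

def send_text(args):
    return f"texting {' '.join(args)}"

# set of applications keyed by command word
APPLICATIONS = {
    "call": dial,
    "play": play,
    "text": send_text,
}

def dispatch(final_transcription):
    """Select an application based on the command word and execute it."""
    command, *rest = final_transcription.split()
    handler = APPLICATIONS.get(command)
    if handler is None:
        raise ValueError(f"no application registered for command {command!r}")
    return handler(rest)

print(dispatch("call Maryam at home"))  # → dialing Maryam at home
```

A real assistant would use a grammar parser rather than a keyword table, but the table keeps the claim's key property visible: the command selects the handler, and the interpreters never attempt to repair the command word itself (claim 6).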
Copyright KISTI. All Rights Reserved.