| Country / Type | United States (US) patent, granted |
|---|---|
| International Patent Classification (IPC, 7th ed.) | |
| Application number | US-0084197 (2011-04-11) |
| Registration number | US-8195468 (2012-06-05) |
| Inventor / Address | |
| Applicant / Address | |
| Attorney / Address | |
| Citation information | Times cited: 118; Cited patents: 312 |
Abstract

A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores, and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain-specific behavior and information into agents that are distributable or updateable over a wide area network.
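The abstract's key architectural idea is packaging domain-specific behavior into agents that can be distributed or updated over a wide area network. The sketch below illustrates that pattern; every class and method name is an assumption for illustration, not the patent's implementation.

```python
# Hypothetical sketch of the agent organization described in the abstract:
# domain-specific behavior lives in pluggable agents, and a registry can
# swap in agents delivered or updated over a network. All names invented.
from abc import ABC, abstractmethod


class DomainAgent(ABC):
    """Encapsulates behavior and information for one domain (e.g. weather)."""

    domain: str = ""

    @abstractmethod
    def process(self, request: str, profile: dict) -> str:
        """Handle a request, optionally drawing on user profile data."""


class AgentRegistry:
    """Holds installed agents; updated agents can arrive over the WAN."""

    def __init__(self) -> None:
        self._agents: dict[str, DomainAgent] = {}

    def install(self, agent: DomainAgent) -> None:
        # A newer version fetched over the network simply replaces the old one.
        self._agents[agent.domain] = agent

    def dispatch(self, domain: str, request: str, profile: dict) -> str:
        return self._agents[domain].process(request, profile)
```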
Claims

1. A mobile device for processing multi-modal natural language inputs, comprising:
a conversational voice user interface that receives a multi-modal natural language input from a user, the multi-modal natural language input including a natural language utterance and a non-speech input, the conversational voice user interface coupled to a transcription module that transcribes the non-speech input to create a non-speech-based transcription;
a conversational speech analysis engine that identifies the user that provided the multi-modal natural language input, the conversational speech analysis engine using a speech recognition engine and a semantic knowledge-based model to create a speech-based transcription of the natural language utterance, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the identified user and the mobile device, a general cognitive model derived from one or more prior interactions between a plurality of users and the mobile device, and an environmental model derived from an environment of the identified user and the mobile device;
a merging module that merges the speech-based transcription and the non-speech-based transcription to create a merged transcription;
a knowledge-enhanced speech recognition engine that identifies one or more entries in a context stack matching information contained in the merged transcription and determines a most likely context for the multi-modal natural language input based on the identified entries; and
a response generating module that identifies a domain agent associated with the most likely context for the multi-modal input, communicates a request to the identified domain agent, and generates a response to the user from content provided by the identified domain agent as a result of processing the request.

2. The mobile device of claim 1, wherein the response includes an aggregation of content gathered when the identified domain agent processes the request.

3. The mobile device of claim 1, wherein the conversational speech analysis engine supports interactions with the plurality of users during an overlapping session.

4. The mobile device of claim 1, wherein the conversational speech analysis engine supports interactions with the plurality of users during an interleaved session.

5. The mobile device of claim 4, wherein the mobile device processes queries in an order of receipt during the interleaved session.

6. The mobile device of claim 4, wherein the mobile device processes queries in an order determined by a length of the queries during the interleaved session.

7. The mobile device of claim 1, wherein the conversational speech analysis engine identifies the user based on at least one of voiceprint matching, password matching, or pass-phrase matching.

8. The mobile device of claim 1, wherein the conversational voice user interface subsequently receives one or more follow-up multi-modal inputs, the follow-up multi-modal inputs including at least one of a follow-up natural language utterance or a follow-up non-speech input.

9. The mobile device of claim 8, wherein the identified domain agent updates the context stack and the semantic knowledge-based model in response to processing the request.

10. The mobile device of claim 9, wherein the knowledge-enhanced speech recognition engine determines a most likely context for the follow-up multi-modal input using the updated context stack.

11. The mobile device of claim 9, wherein the conversational speech analysis engine creates a speech-based transcription of the follow-up natural language utterance using the updated semantic knowledge-based model.

12. The mobile device of claim 1, wherein the identified domain agent processes the request by querying one or more local or network information sources.

13. The mobile device of claim 12, wherein one or more of the local or network information sources includes an Internet browsing service.

14. The mobile device of claim 1, wherein the identified domain agent processes the request by directing a command to one or more local or remote devices.
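Claim 1 describes a pipeline: transcribe the utterance and the non-speech input, merge the two transcriptions, match the merged transcription against a context stack to pick a most likely context, then route a request to the domain agent for that context. A minimal sketch of that flow, with stubbed recognition and a deliberately naive keyword matcher (all names hypothetical):

```python
# Minimal sketch of the claim-1 pipeline; every name is hypothetical and
# the recognition, merging, and matching steps are far simpler than what
# the claim describes.
from dataclasses import dataclass, field


@dataclass
class ContextEntry:
    domain: str
    keywords: set[str]


@dataclass
class ContextStack:
    entries: list[ContextEntry] = field(default_factory=list)

    def most_likely_context(self, merged: str) -> str | None:
        # Stand-in for the "knowledge-enhanced" matching: score each
        # entry by keyword overlap with the merged transcription.
        words = set(merged.lower().split())
        best = max(self.entries, key=lambda e: len(e.keywords & words),
                   default=None)
        if best is None or not (best.keywords & words):
            return None
        return best.domain


def transcribe_speech(audio: bytes) -> str:
    """Stub standing in for the speech recognition engine."""
    return "what is the weather in seoul"


def handle_input(audio: bytes, non_speech: str, stack: ContextStack,
                 agents: dict) -> str:
    # Each agent is assumed to expose a process(text) -> str method.
    speech_text = transcribe_speech(audio)          # speech-based transcription
    merged = f"{speech_text} {non_speech}".strip()  # merging module
    domain = stack.most_likely_context(merged)      # most likely context
    if domain is None:
        return "No matching context found."
    return agents[domain].process(merged)           # domain agent dispatch
```

The keyword-overlap scorer is only a placeholder; the claim's context determination also draws on the three-part semantic knowledge-based model sketched at the end of the claims.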
15. A system for processing multi-modal natural language inputs, comprising:
a plurality of mobile devices that support multi-modal natural language interactions with a user;
a context manager communicatively coupled to the plurality of mobile devices, wherein the context manager synchronizes a semantic knowledge-based model and a context stack among the plurality of mobile devices, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the user and one or more of the mobile devices, a general cognitive model derived from one or more prior interactions between a plurality of users and one or more of the mobile devices, and an environmental model derived from an environment of the user and one or more of the mobile devices;
a conversational voice user interface communicatively coupled to one or more of the plurality of mobile devices, wherein the conversational voice user interface receives a multi-modal natural language input from the user that includes at least a natural language utterance;
a conversational speech analysis engine that identifies the user that provided the multi-modal input, the conversational speech analysis engine using a speech recognition engine and the semantic knowledge-based model to create a speech-based transcription of the natural language utterance;
a knowledge-enhanced speech recognition engine that identifies one or more entries in a context stack matching information contained in the speech-based transcription and determines a most likely context for the multi-modal natural language input based on the identified entries; and
a response generating module that identifies a domain agent associated with the most likely context for the multi-modal input, communicates a request to the identified domain agent, and generates a response to the user from content provided by the identified domain agent as a result of processing the request.

16. The system of claim 15, wherein the plurality of mobile devices register with the context manager and subscribe to one or more events that the context manager broadcasts.

17. The system of claim 16, wherein the context module receives an input from one or more of the mobile devices that updates at least one of the semantic knowledge-based model or the context stack.

18. The system of claim 17, wherein the context module broadcasts an event relating to the received input to the subscribed mobile devices.
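Claims 16 through 18 outline a publish/subscribe pattern: devices register with the context manager, subscribe to events, and the manager broadcasts an event whenever one device updates the shared semantic knowledge-based model or context stack. A hedged skeleton of that pattern (transport, ordering, and conflict handling are unspecified in the claims and omitted here):

```python
# Hypothetical skeleton of the context manager in claims 15-18. Real
# synchronization needs transport, ordering, and conflict resolution,
# none of which the claims specify.
from collections import defaultdict
from typing import Callable

Event = dict  # e.g. {"kind": "context_update", "payload": {...}}


class ContextManager:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self.context_stack: list[dict] = []

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        # A mobile device registers and subscribes to one kind of event.
        self._subscribers[kind].append(handler)

    def publish(self, event: Event) -> None:
        # Apply an update received from one device, then broadcast the
        # event to every subscribed device (claims 17-18).
        if event["kind"] == "context_update":
            self.context_stack.append(event["payload"])
        for handler in self._subscribers[event["kind"]]:
            handler(event)
```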
19. A method for processing multi-modal natural language inputs, comprising:
receiving a multi-modal natural language input at a conversational voice user interface, the multi-modal input including a natural language utterance and a non-speech input provided by a user, wherein a transcription module coupled to the conversational voice user interface transcribes the non-speech input to create a non-speech-based transcription;
identifying the user that provided the multi-modal input;
creating a speech-based transcription of the natural language utterance using a speech recognition engine and a semantic knowledge-based model, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the identified user and the conversational voice user interface, a general cognitive model derived from one or more prior interactions between a plurality of users and the conversational voice user interface, and an environmental model derived from an environment of the identified user and the conversational voice user interface;
merging the speech-based transcription and the non-speech-based transcription to create a merged transcription;
identifying one or more entries in a context stack matching information contained in the merged transcription;
determining a most likely context for the multi-modal input based on the identified entries;
identifying a domain agent associated with the most likely context for the multi-modal input;
communicating a request to the identified domain agent; and
generating a response to the user from content provided by the identified domain agent as a result of processing the request.

20. The method of claim 19, wherein the generated response includes an aggregation of content gathered when the identified domain agent processes the request.

21. The method of claim 19, wherein the conversational voice user interface supports interactions with the plurality of users during an overlapping session.

22. The method of claim 19, wherein the conversational voice user interface supports interactions with the plurality of users during an interleaved session.

23. The method of claim 22, wherein queries are processed in an order of receipt during the interleaved session.

24. The method of claim 23, wherein queries are processed in an order determined by a length of the queries during the interleaved session.

25. The method of claim 19, further comprising verifying an identity of the user based on voiceprint matching, password matching, or pass-phrase matching.

26. The method of claim 19, further comprising receiving one or more follow-up multi-modal inputs at the conversational voice user interface, the follow-up multi-modal inputs including at least one of a follow-up natural language utterance or a follow-up non-speech input.

27. The method of claim 26, wherein the identified domain agent updates the context stack and the semantic knowledge-based model in response to processing the request.

28. The method of claim 27, wherein the knowledge-enhanced speech recognition engine determines a most likely context for the follow-up multi-modal input using the updated context stack.

29. The method of claim 27, wherein the speech recognition engine creates a speech-based transcription of the follow-up natural language utterance using the updated semantic knowledge-based model.

30. The method of claim 19, wherein the identified domain agent processes the request by querying one or more local or network information sources.

31. The method of claim 30, wherein one or more of the local or network information sources includes an Internet browsing service.

32. The method of claim 19, wherein the identified domain agent processes the request by directing a command to one or more local or remote devices.
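Claims 5-6 and 23-24 name two query-ordering policies for an interleaved session: order of receipt, or an order determined by query length. Both are easy to state as code (the direction of the length ordering is an assumption; the claims only say the order is "determined by a length of the queries"):

```python
# The two interleaved-session ordering policies from claims 5-6 and 23-24.
# Sorting shortest-first is an assumption, not stated in the claims.
from collections import deque


def order_by_receipt(queries: list[str]) -> deque:
    """Claims 5 and 23: process queries in the order they arrived."""
    return deque(queries)


def order_by_length(queries: list[str]) -> deque:
    """Claims 6 and 24: process queries in a length-determined order."""
    return deque(sorted(queries, key=len))
```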
33. A device for supporting natural language human-machine interactions, wherein the device comprises one or more processors configured to:
receive a multi-modal natural language input that includes a natural language utterance and a non-speech component;
use a speech recognition engine and a semantic knowledge-based model to create a speech-based transcription associated with the natural language utterance, wherein the semantic knowledge-based model includes a personalized cognitive model derived from one or more prior interactions between the device and a user that spoke the natural language utterance, a general cognitive model derived from one or more prior interactions between the device and multiple different users, and an environmental model derived from an environment associated with the device and the user that spoke the natural language utterance;
merge the speech-based transcription associated with the natural language utterance with a non-speech-based transcription associated with the non-speech component to create a merged transcription associated with the multi-modal natural language input;
identify one or more entries in a context stack matching information contained in the merged transcription to determine a most likely context associated with the multi-modal natural language input based on the one or more identified entries;
communicate a request to a domain agent associated with the most likely context; and
generate a response to the multi-modal natural language input, wherein the response to the multi-modal natural language input includes content that the domain agent associated with the most likely context produced to process the request.
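Every independent claim (1, 15, 19, and 33) builds the semantic knowledge-based model from the same three parts: a personalized cognitive model, a general cognitive model, and an environmental model. One way to picture that composition as a data structure (field shapes and the combining weights are invented for illustration; the patent specifies no formula):

```python
# Illustrative data structure for the three-part semantic knowledge-based
# model that appears in every independent claim. All details invented.
from dataclasses import dataclass, field


@dataclass
class SemanticKnowledgeModel:
    # Derived from the identified user's own prior interactions.
    personalized: dict[str, float] = field(default_factory=dict)
    # Derived from prior interactions across many users.
    general: dict[str, float] = field(default_factory=dict)
    # Derived from the current environment (e.g. location, vehicle state).
    environmental: dict[str, float] = field(default_factory=dict)

    def weight(self, term: str) -> float:
        # One plausible combination for biasing recognition toward likely
        # vocabulary: favor the personalized model over the other two.
        return (2.0 * self.personalized.get(term, 0.0)
                + self.general.get(term, 0.0)
                + self.environmental.get(term, 0.0))
```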