Country / Category | United States (US) patent, granted
---|---
International Patent Classification (IPC, 7th ed.) |
Application No. | US-0822179 (2015-08-10)
Registration No. | US-9570070 (2017-02-14)
Inventors / Address |
Applicant / Address |
Agent / Address |
Citation info | Cited by: 1; Patents cited: 497
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
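The abstract's pipeline (extract context from a non-voice interaction and an utterance, combine them into an intent, then route a request to a capable device) can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation; all names (`MultiModalInteraction`, `determine_intent`, `route_request`) and the dict-merge intent model are assumptions made for illustration.

```python
# Hypothetical sketch of the multi-modal processing flow in the abstract.
# Names and data shapes are illustrative, not from the patent itself.
from dataclasses import dataclass

@dataclass
class MultiModalInteraction:
    non_voice_context: dict   # e.g. context from a text or location selection
    utterance_context: dict   # e.g. context parsed from the spoken utterance

def determine_intent(interaction: MultiModalInteraction) -> dict:
    # Combine the two context sources into a single intent, as the abstract
    # describes: non-voice context plus natural language utterance context.
    intent = {}
    intent.update(interaction.non_voice_context)
    intent.update(interaction.utterance_context)
    return intent

def route_request(intent: dict, devices: dict) -> str:
    # Route to the first device whose declared capabilities cover the
    # intent's action; fall back to an "unhandled" marker otherwise.
    for name, capabilities in devices.items():
        if intent.get("action") in capabilities:
            return name
    return "unhandled"

# Example: the user selects a phone number on screen and says "call this".
interaction = MultiModalInteraction(
    non_voice_context={"selected_number": "555-0100"},
    utterance_context={"action": "call"},
)
devices = {"phone": {"call", "sms"}, "navigation": {"route"}}
print(route_request(determine_intent(interaction), devices))  # → phone
```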
1. A method for processing one or more multi-modal device interactions, received from a user, in a natural language voice services environment that includes a plurality of components that handle requests relating to the multi-modal device interactions, the method being implemented on a computer system having one or more physical processors programmed with computer program instructions that, when executed by the one or more physical processors, program the computer system to perform the method, the method comprising: detecting, by the computer system, at least one multi-modal device interaction, wherein the multi-modal device interaction includes a non-voice interaction, from the user, with at least one of the plurality of components or an application associated with at least one of the plurality of components, and wherein the multi-modal device interaction further includes at least one natural language utterance, from the user, relating to the non-voice interaction; determining, by the computer system, a context relating to the non-voice interaction and a context relating to the natural language utterance; determining, by the computer system, an intent of the multi-modal device interaction based on the context relating to the non-voice interaction and the context of the natural language utterance; generating, by the computer system, a request based on the determined intent; obtaining, by the computer system, information indicating a capability of a component, from among the plurality of components, based on a constellation model that specifies the capabilities of each of the plurality of components; determining, by the computer system, that the component should handle the request based on the capability of the component; and routing, by the computer system, the request to the component.

2.
The method of claim 1, wherein detecting the multi-modal interaction comprises: receiving, by the computer system, the non-voice interaction from a first input device; and receiving, by the computer system, the natural language utterance from a second input device separate from the first input device.

3. The method of claim 1, wherein detecting the multi-modal interaction comprises: receiving, by the computer system, both the non-voice interaction and the natural language utterance from a first input device.

4. The method of claim 1, wherein detecting the multi-modal interaction comprises: receiving, by the computer system, the natural language utterance from a first input device; receiving, by the computer system, the natural language utterance from a second input device separate from the first input device, wherein either or both of the natural language utterance from the first input device and the natural language utterance from the second input device is used to determine the context of the natural language utterance.

5. The method of claim 1, wherein the request comprises a query, a command, or both a query and a command.

6. The method of claim 1, wherein the component comprises an electronic device identified from among a plurality of electronic devices that are separate from each other.

7. The method of claim 1, wherein the component comprises an application identified from among a plurality of applications.

8. The method of claim 7, wherein the plurality of applications are executed within a single electronic device.

9. The method of claim 7, wherein at least a first one of the plurality of applications is executed within a first electronic device and at least a second one of the plurality of applications is executed within a second electronic device.

10.
The method of claim 1, wherein detecting at least one multi-modal device interaction comprises: receiving, by the computer system, a first text selection associated with a browser application, wherein the non-voice interaction comprises the first text selection, and wherein determining an intent of the multi-modal device interaction comprises using one or more words from the text selection together with the context relating to the natural language utterance.

11. The method of claim 1, wherein detecting at least one multi-modal device interaction comprises: receiving, by the computer system, a first location selection associated with a mapping application, wherein the non-voice interaction comprises the first location selection, and wherein determining an intent of the multi-modal device interaction comprises using a location identified from the first location selection together with the context relating to the natural language utterance.

12. The method of claim 1, wherein detecting at least one multi-modal device interaction comprises: receiving, by the computer system, an input associated with information identifying a first type of input device used to provide the input, wherein the non-voice interaction comprises the input, and wherein determining an intent of the multi-modal device interaction comprises using the information identifying the first type of input device together with the context relating to the natural language utterance.

13. The method of claim 1, further comprising: generating, by the computer system, at least one transaction lead based on the request.

14. The method of claim 13, wherein the at least one transaction lead comprises an advertisement or a recommendation identified based on user preferences.

15.
A system for processing one or more multi-modal device interactions, received from a user, in a natural language voice services environment that includes a plurality of components that handle requests relating to the multi-modal device interactions, comprising: a computer system having one or more physical processors programmed with computer instructions that, when executed by the one or more physical processors, program the computer system to: detect at least one multi-modal device interaction, wherein the multi-modal device interaction includes a non-voice interaction, from the user, with at least one of the plurality of components or an application associated with at least one of the plurality of components, and wherein the multi-modal device interaction further includes at least one natural language utterance, from the user, relating to the non-voice interaction; determine a context relating to the non-voice interaction and a context relating to the natural language utterance; determine an intent of the multi-modal device interaction based on the context relating to the non-voice interaction and the context of the natural language utterance; generate a request based on the determined intent; obtain information indicating a capability of a component, from among the plurality of components, based on a constellation model that specifies the capabilities of each of the plurality of components; determine that the component should handle the request based on a capability of the component; and route the request to the component.

16. The system of claim 15, wherein to detect the at least one multi-modal interaction, the computer system is further programmed to: receive the non-voice interaction from a first input device; and receive the natural language utterance from a second input device separate from the first input device.

17.
The system of claim 15, wherein to detect the at least one multi-modal interaction, the computer system is further programmed to: receive both the non-voice interaction and the natural language utterance from a first input device.

18. The system of claim 15, wherein to detect the at least one multi-modal interaction, the computer system is further programmed to: receive the natural language utterance from a first input device; receive the natural language utterance from a second input device separate from the first input device, wherein either or both of the natural language utterance from the first input device and the natural language utterance from the second input device is used to determine the context of the natural language utterance.

19. The system of claim 15, wherein the request comprises a query, a command, or both a query and a command.

20. The system of claim 15, wherein the component comprises an electronic device identified from among a plurality of electronic devices that are separate from each other.

21. The system of claim 15, wherein the component comprises an application identified from among a plurality of applications.

22. The system of claim 21, wherein the plurality of applications are executed within a single electronic device.

23. The system of claim 21, wherein at least a first one of the plurality of applications is executed within a first electronic device and at least a second one of the plurality of applications is executed within a second electronic device.

24.
The system of claim 15, wherein to detect the at least one multi-modal interaction, the computer system is further programmed to: receive a first text selection associated with a browser application, wherein the non-voice interaction comprises the first text selection, and wherein to determine an intent of the multi-modal device interaction, the computer system is further programmed to use one or more words from the text selection together with the context relating to the natural language utterance.

25. The system of claim 15, wherein to detect the at least one multi-modal interaction, the computer system is further programmed to: receive a first location selection associated with a mapping application, wherein the non-voice interaction comprises the first location selection, and wherein to determine an intent of the multi-modal device interaction, the computer system is further programmed to use a location identified from the first location selection together with the context relating to the natural language utterance.

26. The system of claim 15, wherein to detect the at least one multi-modal interaction, the computer system is further programmed to: receive an input associated with information identifying a first type of input device used to provide the input, wherein the non-voice interaction comprises the input, and wherein to determine an intent of the multi-modal device interaction, the computer system is further programmed to use the information identifying the first type of input device together with the context relating to the natural language utterance.

27. The system of claim 15, wherein the computer system is further programmed to: generate at least one transaction lead for presentation based on the request.

28. The system of claim 27, wherein the at least one transaction lead comprises an advertisement or recommendation identified based on user preferences associated with the multi-modal device interaction.

29.
The system of claim 15, wherein the computer system is further programmed to: receive a set of potential responses to the request from the identified component, the set of responses including a first response and a second response; obtain information known about the user; and weight the first response and the second response based on the information.

30. The system of claim 15, wherein the computer system is further programmed to: receive an indication of a new component being added to the system; and initiate a device listener that detects a non-voice interaction, a voice interaction, or both a non-voice interaction and a voice interaction, for the new component.

31. The system of claim 30, wherein the computer system is further programmed to: identify at least one capability of the new component; and add, to the constellation model, information specifying the at least one capability and information identifying the new component.

32. The system of claim 15, wherein to determine the context relating to the natural language utterance, the computer system is further programmed to: provide the natural language utterance as an input to an automated speech recognizer (ASR); and receive as an output from the ASR one or more words of the natural language utterance, wherein the context relating to the natural language utterance is determined based on the one or more words.

33. The system of claim 15, wherein to generate the request, the computer system is further programmed to: determine a first portion of the request based on the context relating to the non-voice interaction; and determine a second portion of the request based on the context of the natural language utterance.
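Claims 15 and 30-31 describe a "constellation model": a registry that records each component's capabilities, is consulted when routing a request, and is updated when a new component joins the environment. A minimal sketch of that registry follows; the class and method names (`ConstellationModel`, `register`, `find_handler`) are illustrative assumptions, not terms from the patent.

```python
# Hypothetical sketch of the constellation model of claims 15 and 30-31:
# a registry mapping each component to its declared capabilities.
class ConstellationModel:
    def __init__(self):
        self.capabilities = {}  # component name -> set of capability names

    def register(self, component: str, capabilities: set) -> None:
        # Claim 31: identify a new component's capabilities and add both
        # the capability information and the component identity to the model.
        self.capabilities[component] = set(capabilities)

    def find_handler(self, required: str):
        # Claims 1/15: determine which component should handle a request
        # based on the capabilities the model records for it.
        for component, caps in self.capabilities.items():
            if required in caps:
                return component
        return None

# Example: two components are registered, then a request is routed.
model = ConstellationModel()
model.register("phone", {"call", "sms"})
model.register("navigation", {"route", "traffic"})
print(model.find_handler("route"))  # → navigation
```

A production version would presumably also attach the per-component device listeners of claim 30; the registry above shows only the capability bookkeeping used for routing.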