Country / Type | United States (US) Patent, Granted
---|---
International Patent Classification (IPC, 7th ed.) |
Application Number | US-0389678 (2009-02-20)
Registration Number | US-8326637 (2012-12-04)
Inventor / Address |
Applicant / Address |
Agent / Address |
Citation Information | Times cited: 44 / Patents cited: 349
A system and method for processing multi-modal device interactions in a natural language voice services environment may be provided. In particular, one or more multi-modal device interactions may be received in a natural language voice services environment that includes one or more electronic devices. The multi-modal device interactions may include a non-voice interaction with at least one of the electronic devices or an application associated therewith, and may further include a natural language utterance relating to the non-voice interaction. Context relating to the non-voice interaction and the natural language utterance may be extracted and combined to determine an intent of the multi-modal device interaction, and a request may then be routed to one or more of the electronic devices based on the determined intent of the multi-modal device interaction.
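The flow described in the abstract — extract context from a non-voice interaction and a natural language utterance, combine the two to determine an intent, then route a request to a device — can be illustrated with a minimal sketch. All names here (`MultiModalInteraction`, `determine_intent`, `route_request`, the device registry) are hypothetical illustrations, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass
class MultiModalInteraction:
    """Context extracted from each modality of one interaction."""
    non_voice_context: dict   # e.g. an item selected by touch
    utterance_context: dict   # e.g. domain/action parsed from speech

def determine_intent(interaction: MultiModalInteraction) -> dict:
    """Combine context from both modalities into a single intent."""
    intent = {}
    intent.update(interaction.non_voice_context)
    intent.update(interaction.utterance_context)
    return intent

def route_request(intent: dict, devices: dict) -> str:
    """Route the request to the device registered for the intent's domain."""
    return devices.get(intent.get("domain"), "default_device")

# A tap on a map point plus the utterance "route me there":
devices = {"navigation": "in_car_head_unit", "entertainment": "media_player"}
interaction = MultiModalInteraction(
    non_voice_context={"selected_item": "gas_station_2"},
    utterance_context={"domain": "navigation", "action": "route_to"},
)
intent = determine_intent(interaction)
target = route_request(intent, devices)
```

The point of the combination step is that neither modality alone carries the full request: the touch supplies the parameter (which item), the utterance supplies the domain and action.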
1. A method for processing one or more multi-modal inputs in a natural language voice services environment that includes one or more electronic devices, comprising: detecting, at one or more electronic devices, a multi-modal input that includes a first input having a first modality type and a second input having a second modality type, wherein the second input is related to the first input, and wherein the first modality type is different than the second modality type; extracting, at a processor, context information relating to the multi-modal input from the first input and from the second input; determining, at the processor, a request from the first input or the second input; processing, at the processor, the request based on the extracted context information relating to the multi-modal input; generating at least one transaction lead based on the extracted context information of the multi-modal input; receiving at least one further input relating to the generated at least one transaction lead; and processing a transaction click-through in response to receiving the at least one further input.

2. The method of claim 1, wherein the first input is a non-voice input and the second input is a natural language utterance, and wherein at least one of the one or more electronic devices includes an input device configured to receive the natural language utterance.

3. The method of claim 2, wherein detecting the at least one multi-modal input comprises causing, in response to the non-voice input being detected, the input device to capture the natural language utterance.

4. The method of claim 3, further comprising: synchronizing information relating to the non-voice input and the natural language utterance captured by the input device.

5. The method of claim 3, wherein the non-voice input comprises a non-voice input portion having a pre-established association with detection of multi-modal inputs in a natural language voice services environment.

6. The method of claim 5, wherein the non-voice input portion having the pre-established association with detection of multi-modal inputs in a natural language voice services environment comprises a touch gesture or a button press.

7. The method of claim 2, wherein the non-voice input comprises a selection of a segment, item, data, application, point of focus, or attention focus associated with one or more of the electronic devices.

8. The method of claim 7, wherein the context information relating to the request is based on the segment, item, data, application, point of focus, or attention focus selected by the non-voice input.

9. The method of claim 2, wherein the non-voice input comprises an identification of a point of focus or an attention focus associated with one or more of the electronic devices.

10. The method of claim 2, wherein determining the request comprises determining which one of an action, query, command, or task is being requested, and wherein extracting the context information comprises extracting, based on the non-voice input, a parameter of the action, query, command, or task.

11. The method of claim 10, wherein the parameter comprises a location or topic related to the action, query, command, or task.

12. The method of claim 10, wherein extracting the context information comprises extracting, based on the natural language utterance, a domain of the action, query, command, or task.

13. The method of claim 12, wherein the domain is a navigation, entertainment, weather, shopping, news, language, or dining domain.

14. The method of claim 2, wherein the natural language utterance is a first natural language utterance, wherein extracting the context information is further based on a second natural language utterance, wherein the second natural language utterance is detected prior to or subsequent to the first natural language utterance.

15. The method of claim 2, wherein detecting the at least one multi-modal input comprises causing, in response to a pre-established voice-based word or phrase being recognized, the input device to capture the natural language utterance.

16. The method of claim 2, wherein detecting the at least one multi-modal input comprises capturing the non-voice input after capturing the natural language utterance.

17. The method of claim 1, wherein the generated transaction lead includes at least one of an advertisement or a recommendation relating to the extracted context information relating to the multi-modal input.

18. The method of claim 1, wherein processing the request comprises routing the request to the one or more electronic devices based on the extracted context information relating to the multi-modal input.

19. A system for processing one or more multi-modal inputs in a natural language voice services environment that includes one or more electronic devices, wherein the system comprises a processing device configured to: receive, from one or more electronic devices, a multi-modal input that includes a first input having a first modality type and a second input having a second modality type, wherein the second input is related to the first input, and wherein the first modality type is different than the second modality type; extract context information relating to the multi-modal input from the first input and from the second input; determine a request based on the non-voice input or the natural language utterance; process the request based on the extracted context information relating to the multi-modal input; generate at least one transaction lead based on the extracted context information of the multi-modal input; receive at least one further input relating to the generated at least one transaction lead; and process a transaction click-through in response to receiving the at least one further input.

20. The system of claim 19, wherein the first input is a non-voice input and the second input is a natural language utterance, and wherein at least one of the one or more electronic devices includes an input device configured to receive the natural language utterance.

21. The system of claim 20, the processing device further configured to detect the at least one multi-modal input by causing, in response to the non-voice input being detected, the input device to capture the natural language utterance.

22. The system of claim 21, the processing device further configured to: synchronize information relating to the non-voice input and the natural language utterance captured by the input device.

23. The system of claim 21, wherein the non-voice input comprises a non-voice input portion having a pre-established association with detection of multi-modal inputs in a natural language voice services environment.

24. The system of claim 20, wherein the non-voice input comprises a selection of a segment, item, data, or application associated with one or more electronic devices.

25. The system of claim 20, wherein the non-voice input comprises an identification of a point of focus or an attention focus associated with one or more of the electronic devices.

26. The system of claim 19, wherein the generated transaction lead includes at least one of an advertisement or a recommendation relating to the extracted context information relating to the multi-modal input.
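Claims 1 and 17 add a commerce step on top of the intent pipeline: generate a transaction lead (an advertisement or recommendation) from the extracted context, then process a click-through when a further input selects that lead. A minimal sketch of that step follows; the catalog contents and function names are illustrative assumptions, not the patent's implementation.

```python
def generate_transaction_lead(context: dict) -> dict:
    """Pick an advertisement or recommendation keyed off the context's domain."""
    catalog = {
        "dining": {"type": "advertisement", "text": "Offer from a nearby restaurant"},
        "shopping": {"type": "recommendation", "text": "Related product suggestion"},
    }
    return catalog.get(context.get("domain"), {"type": "none", "text": ""})

def process_click_through(lead: dict, further_input: str) -> bool:
    """Register a click-through only when a further input selects a real lead."""
    return further_input == "select_lead" and lead["type"] != "none"

# Context extracted from a multi-modal input in the dining domain:
lead = generate_transaction_lead({"domain": "dining", "topic": "italian"})
clicked = process_click_through(lead, "select_lead")
```

Because the lead is derived from the same extracted context that resolved the request, it stays tied to what the user was actually doing at the moment of the interaction.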
Copyright KISTI. All Rights Reserved.