| Country / Type | United States (US) Patent, Registered |
|---|---|
| International Patent Classification (IPC, 7th ed.) | |
| Application No. | US-0127343 (2008-05-27) |
| Registration No. | US-8589161 (2013-11-19) |
| Inventor / Address | |
| Applicant / Address | |
| Agent / Address | |
| Citation Information | Cited by: 31 / Cites: 411 patents |
A system and method for an integrated, multi-modal, multi-device natural language voice services environment may be provided. In particular, the environment may include a plurality of voice-enabled devices each having intent determination capabilities for processing multi-modal natural language inputs in addition to knowledge of the intent determination capabilities of other devices in the environment. Further, the environment may be arranged in a centralized manner, a distributed peer-to-peer manner, or various combinations thereof. As such, the various devices may cooperate to determine intent of multi-modal natural language inputs, and commands, queries, or other requests may be routed to one or more of the devices best suited to take action in response thereto.
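The abstract describes a constellation of voice-enabled devices that each advertise their intent determination capabilities, with requests routed to the device best suited to act on them. The following is a minimal, hypothetical sketch of that routing idea; the `Device` fields and the use of processing power as the tie-breaker are illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass

# Hypothetical sketch: each device in the environment advertises the
# domains it can determine intent for, plus its relative local
# processing power (one of the capability dimensions named in claim 2).

@dataclass
class Device:
    name: str
    domains: set          # domains this device can handle
    processing_power: int # relative local processing capability

def route_request(devices, request_domain):
    """Route a request to a device whose capabilities cover the request's
    domain, preferring higher local processing power among candidates."""
    candidates = [d for d in devices if request_domain in d.domains]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.processing_power)

devices = [
    Device("phone", {"navigation", "music"}, 2),
    Device("tv", {"music", "video"}, 3),
    Device("car", {"navigation"}, 1),
]
best = route_request(devices, "music")  # the tv, in this toy setup
```

In a real deployment the constellation model would also carry dynamic states and natural language resources per device, as claim 1 describes; this sketch reduces it to a single capability lookup.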
1. A method to provide an integrated, multi-modal, natural language voice services environment having an input device, a central device, and one or more secondary devices, wherein the method comprises:

receiving, at the central device, a multi-modal natural language input from the input device, wherein the input device initially received the multi-modal natural language input;

maintaining, on the input device, the central device, and the one or more secondary devices, a constellation model that describes natural language resources, dynamic states, and intent determination capabilities associated with the input device, the central device, and the one or more secondary devices;

aggregating the natural language resources, the dynamic states, and the intent determination capabilities associated with the input device and the one or more secondary devices on the central device to converge the natural language resources, the dynamic states, and the intent determination capabilities held across the natural language voice services environment on the central device;

determining, on the central device, a preliminary intent associated with the multi-modal natural language input using the converged natural language resources, dynamic states, and intent determination capabilities held across the natural language voice services environment;

sending the multi-modal natural language input from the central device to the one or more secondary devices to invoke the intent determination capabilities associated with the one or more secondary devices;

collating, at the central device, intent determination responses received from the one or more secondary devices with the preliminary intent determined on the central device to generate an intent hypothesis associated with the multi-modal natural language input on the central device; and

returning the intent hypothesis associated with the multi-modal natural language input and information relating to one or more requests associated with the multi-modal natural language input to the input device, wherein the input device invokes one or more actions based on the returned intent hypothesis and the information relating to one or more requests associated with the multi-modal natural language input.

2. The method of claim 1, wherein the intent determination capabilities associated with the input device, the central device, and the one or more secondary devices include local processing power, local storage resources, and local natural language processing capabilities.

3. The method of claim 1, wherein collating the intent determination responses includes: receiving the intent determination responses from the one or more secondary devices in an interleaved manner; and arbitrating among the interleaved intent determination responses received from the one or more secondary devices and the preliminary intent determined on the central device to generate the intent hypothesis associated with the multi-modal natural language input.

4. The method of claim 3, wherein the generated intent hypothesis comprises one of the interleaved intent determination responses received from the one or more secondary devices or the preliminary intent determined on the central device having a highest confidence level.

5. The method of claim 3, wherein arbitrating among the interleaved intent determination responses and the preliminary intent includes: evaluating, at the central device, the constellation model to determine whether the intent determination capabilities associated with any of the one or more secondary devices include multi-pass speech recognition; and assigning a higher weight to confidence levels associated with any of the interleaved intent determination responses that were generated using multi-pass speech recognition.

6.
The method of claim 3, wherein collating the intent determination responses further includes terminating the collating in response to determining that a predetermined amount of time has lapsed, a predetermined amount of resources have been consumed, or one or more of the interleaved intent determination responses received from the one or more secondary devices meets or exceeds an acceptable confidence level.

7. The method of claim 6, wherein the input device that initially received the multi-modal natural language input communicates the multi-modal natural language input to the central device in response to an initial intent determination generated on the input device failing to meet or exceed the acceptable confidence level.

8. The method of claim 1, wherein the natural language resources and the dynamic states associated with the input device, the central device, and the one or more secondary devices include local vocabularies, local vocabulary translation mechanisms, local misrecognitions, local context information, local short-term shared knowledge, and local long-term shared knowledge.

9. The method of claim 1, further comprising operating the natural language voice services environment in a continuous listening mode that causes the input device to initially accept the multi-modal natural language input in response to determining that one or more predetermined events have occurred.

10. The method of claim 1, further comprising identifying, at the central device, one or more domains relevant to the multi-modal natural language input, wherein the central device sends the multi-modal natural language input to the one or more secondary devices in response to determining that the intent determination capabilities associated therewith have relevance to the one or more identified domains.

11.
The method of claim 1, wherein the information returned to the input device includes results associated with the central device resolving the one or more requests and the one or more actions that the input device invokes include presenting the results in response to the multi-modal natural language input.

12. The method of claim 1, wherein the information returned to the input device includes one or more queries or commands formulated on the central device and the one or more actions that the input device invokes include routing the queries or commands to generate results to present in response to the multi-modal natural language input.

13. A system to provide an integrated, multi-modal, natural language voice services environment having an input device, one or more secondary devices, and a central device configured to:

receive a multi-modal natural language input from the input device, wherein the input device initially received the multi-modal natural language input;

maintain a constellation model and distribute the constellation model to the input device and the one or more secondary devices, wherein the constellation model describes natural language resources, dynamic states, and intent determination capabilities associated with the input device, the central device, and the one or more secondary devices;

aggregate the natural language resources, the dynamic states, and the intent determination capabilities associated with the input device and the one or more secondary devices to converge the natural language resources, the dynamic states, and the intent determination capabilities held across the natural language voice services environment;

use the converged natural language resources, dynamic states, and intent determination capabilities held across the natural language voice services environment to determine a preliminary intent associated with the multi-modal natural language input;

send the multi-modal natural language input to the one or more secondary devices to invoke the intent determination capabilities associated with the one or more secondary devices;

collate intent determination responses received from the one or more secondary devices with the determined preliminary intent to generate an intent hypothesis associated with the multi-modal natural language input on the central device; and

return the intent hypothesis associated with the multi-modal natural language input and information relating to one or more requests associated with the multi-modal natural language input to the input device, wherein the input device is configured to invoke one or more actions based on the returned intent hypothesis and the information relating to one or more requests associated with the multi-modal natural language input.

14. The system of claim 13, wherein the intent determination capabilities associated with the input device, the central device, and the one or more secondary devices include local processing power, local storage resources, and local natural language processing capabilities.

15. The system of claim 13, wherein to collate the intent determination responses, the central device is further configured to: receive the intent determination responses from the one or more secondary devices in an interleaved manner; and arbitrate among the interleaved intent determination responses received from the one or more secondary devices and the determined preliminary intent to generate the intent hypothesis associated with the multi-modal natural language input.

16. The system of claim 15, wherein the generated intent hypothesis comprises one of the interleaved intent determination responses received from the one or more secondary devices or the preliminary intent determined on the central device having a highest confidence level.

17.
The system of claim 15, wherein to arbitrate among the interleaved intent determination responses and the preliminary intent, the central device is further configured to: evaluate the constellation model to determine whether the intent determination capabilities associated with any of the one or more secondary devices include multi-pass speech recognition; and assign a higher weight to confidence levels associated with any of the interleaved intent determination responses that were generated using multi-pass speech recognition.

18. The system of claim 15, wherein to collate the intent determination responses, the central device is further configured to terminate receiving the interleaved intent determination responses in response to a predetermined amount of time having lapsed, a predetermined amount of resources having been consumed, or one or more of the received interleaved intent determination responses meeting or exceeding an acceptable confidence level.

19. The system of claim 18, wherein the input device that initially received the multi-modal natural language input is configured to communicate the multi-modal natural language input to the central device in response to an initial intent determination generated on the input device failing to meet or exceed the acceptable confidence level.

20. The system of claim 13, wherein the natural language resources and the dynamic states associated with the input device, the central device, and the one or more secondary devices include local vocabularies, local vocabulary translation mechanisms, local misrecognitions, local context information, local short-term shared knowledge, and local long-term shared knowledge.

21.
The system of claim 13, wherein the central device is further configured to operate the natural language voice services environment in a continuous listening mode that causes the input device to initially accept the multi-modal natural language input in response to determining that one or more predetermined events have occurred.

22. The system of claim 13, wherein the central device is further configured to identify one or more domains relevant to the multi-modal natural language input and send the multi-modal natural language input to the one or more secondary devices in response to the intent determination capabilities associated therewith having relevance to the one or more identified domains.

23. The system of claim 13, wherein the information returned to the input device includes results associated with the central device resolving the one or more requests and the one or more actions invoked on the input device include presenting the results in response to the multi-modal natural language input.

24. The system of claim 13, wherein the information returned to the input device includes one or more queries or commands formulated on the central device and the one or more actions invoked on the input device include routing the queries or commands to generate results to present in response to the multi-modal natural language input.
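Claims 3 through 6 describe collating interleaved intent responses: the central device arbitrates among the secondary devices' responses and its own preliminary intent, weights responses produced with multi-pass speech recognition more heavily, and terminates once a response meets an acceptable confidence level. The following is a hedged sketch of that loop; the weighting factor (1.2) and threshold (0.9) are illustrative assumptions, not values stated in the patent.

```python
# Sketch of the collation/arbitration loop of claims 3-6. Responses are
# plain dicts here; a real system would carry richer intent structures.

def collate(preliminary, responses, threshold=0.9):
    """Arbitrate among the preliminary intent and interleaved secondary
    responses, returning the hypothesis with the highest (weighted)
    confidence; stop early once an acceptable confidence is reached."""
    best_intent = preliminary["intent"]
    best_score = preliminary["confidence"]
    for resp in responses:  # responses arrive in an interleaved manner
        # claim 5: weight multi-pass speech recognition responses higher
        weight = 1.2 if resp.get("multi_pass") else 1.0
        score = resp["confidence"] * weight
        if score > best_score:
            best_intent, best_score = resp["intent"], score
        # claim 6: terminate collating at an acceptable confidence level
        if best_score >= threshold:
            break
    return best_intent, best_score

prelim = {"intent": "play_music", "confidence": 0.5}
responses = [
    {"intent": "navigate_home", "confidence": 0.6},
    {"intent": "play_music", "confidence": 0.8, "multi_pass": True},
    {"intent": "set_alarm", "confidence": 0.99},  # never reached: loop stops early
]
intent, score = collate(prelim, responses)
```

Claim 6 also allows termination on a lapsed time budget or consumed resources; those conditions are omitted here to keep the sketch focused on the confidence-based path.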
Copyright KISTI. All Rights Reserved.