| Country / Status | United States (US) Patent, Granted |
|---|---|
| IPC (7th edition) | |
| Application No. | US-0846667 (2015-09-04) |
| Registration No. | US-10186254 (2019-01-22) |
| Inventor / Address | |
| Applicant / Address | |
| Agent / Address | |
| Citation Info | Cited by: 0 / Patents cited: 1957 |
The present disclosure generally relates to context-based endpoint detection in user speech input. A method for identifying an endpoint of a spoken request by a user may include receiving user input of natural language speech including one or more words; identifying at least one context associated with the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.
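The abstract's core decision — combine context signals into a probability that a location in the input is an endpoint, then compare it to a threshold — can be sketched as follows. The patent does not disclose a specific model; the logistic combination, the feature names, and the weights below are invented purely for illustration.

```python
import math

def endpoint_probability(contexts: dict) -> float:
    """Combine context signals into an endpoint probability (toy logistic model)."""
    # Hypothetical weights for a few of the contexts the claims enumerate
    # (content of the words, rate of speech, trailing silence).
    weights = {
        "trailing_silence_ms": 0.004,   # longer silence -> more likely an endpoint
        "utterance_complete":  2.0,     # word content suggests a complete request
        "speech_rate_wpm":    -0.01,    # fast talkers may pause briefly mid-request
    }
    bias = -2.0
    score = bias + sum(weights[k] * v for k, v in contexts.items() if k in weights)
    return 1.0 / (1.0 + math.exp(-score))

def is_endpoint(contexts: dict, threshold: float = 0.5) -> bool:
    """Identify the location as an endpoint if P(endpoint) exceeds the threshold."""
    return endpoint_probability(contexts) > threshold

# A long pause after an apparently complete request crosses the threshold:
print(is_endpoint({"trailing_silence_ms": 900,
                   "utterance_complete": 1,
                   "speech_rate_wpm": 150}))  # → True
```

The key point the abstract makes is that the decision is context-conditioned: the same pause length can yield different probabilities depending on the words spoken, the speaker, and the device state.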
1. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device, cause the electronic device to: receive user input comprising natural language speech including one or more words; identify at least one context associated with the user input; divide the user input into a plurality of audio frames; determine whether each frame of the plurality of frames includes audio information associated with the user input; in accordance with a determination that a frame includes audio information associated with the user input, determine whether a threshold number of frames of silence follow; in accordance with a determination that the threshold number of frames of silence follow, cease recording the user input; in accordance with a determination that the threshold number of frames of silence do not follow, continue recording the user input; generate a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determine whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identify the location in the user input as the endpoint.

2. The non-transitory computer-readable storage medium of claim 1, wherein the non-transitory computer-readable storage medium further comprises instructions, which when executed by the one or more processors of the electronic device, cause the device to: record the user input until the endpoint is identified.

3. The non-transitory computer-readable storage medium of claim 1, wherein the non-transitory computer-readable storage medium further comprises instructions, which when executed by the one or more processors of the electronic device, cause the device to: generate a probability that each frame of the plurality of frames includes the endpoint.

4. The non-transitory computer-readable storage medium of claim 3, wherein the non-transitory computer-readable storage medium further comprises instructions, which when executed by the one or more processors of the electronic device, cause the device to: record the user input until the endpoint is identified; wherein in accordance with a determination that the probability of an endpoint for a first frame is greater than the threshold: stop recording the user input; and process the user input; and wherein in accordance with a determination that the probability of an endpoint for a second frame is not greater than the threshold: continue recording the user input.

5. The non-transitory computer-readable storage medium of claim 3, wherein the non-transitory computer-readable storage medium comprises instructions, which when executed by the one or more processors of the electronic device, cause the device to: further in accordance with a determination that the probability of an endpoint for a first frame is greater than the threshold, output data associated with end of speech; and further in accordance with a determination that the probability of an endpoint for a second frame is not greater than the threshold, output data associated with continuing speech.

6. The non-transitory computer-readable storage medium of claim 5, wherein the data associated with end of speech is an end of speech tag.

7. The non-transitory computer-readable storage medium of claim 5, wherein the data associated with continuing speech is a continuing speech tag.

8. The non-transitory computer-readable storage medium of claim 1, wherein the threshold number of frames of silence is associated with a duration of the determination whether the probability is greater than the threshold.

9. The non-transitory computer-readable storage medium of claim 1, wherein the at least one context includes the content of the one or more words.

10. The non-transitory computer-readable storage medium of claim 1, wherein the at least one context includes the device context.

11. The non-transitory computer-readable storage medium of claim 10, wherein the device context includes the location of the device.

12. The non-transitory computer-readable storage medium of claim 1, wherein the at least one context includes a rate of speech.

13. The non-transitory computer-readable storage medium of claim 1, wherein the at least one context includes information associated with a user.

14. The non-transitory computer-readable storage medium of claim 13, wherein the information associated with a user includes the identity of the user.

15. The non-transitory computer-readable storage medium of claim 13, wherein the information associated with a user includes information about the speech patterns of the user.

16. The non-transitory computer-readable storage medium of claim 1, further comprising a digital assistant operable through the device, wherein the at least one context includes the state of the digital assistant.

17. The non-transitory computer-readable storage medium of claim 16, wherein the at least one context includes user input relative to the state of the digital assistant.

18. The non-transitory computer-readable storage medium of claim 1, further comprising an application operable through the device, wherein the at least one context includes application context.

19. The non-transitory computer readable storage medium of claim 1, further comprising instructions that cause the electronic device to output an end-of-speech tag.

20. The non-transitory computer readable storage medium of claim 1, wherein each frame of the threshold number of frames of silence includes background noise, and wherein each frame of the threshold number of frames of silence does not include audio information associated with the user input.

21. The non-transitory computer readable storage medium of claim 1, wherein determining whether the threshold number of frames of silence follow is performed further in accordance with a determination that the probability is less than the threshold.

22. The non-transitory computer readable storage medium of claim 1, wherein the non-transitory computer-readable storage medium further comprises instructions, which when executed by the one or more processors of the electronic device, cause the device to: provide the user input and the location in the user input to a digital assistant; and perform, using the digital assistant, a task based on the user input.

23. A method for identifying an endpoint of a spoken request by a user, comprising: at a device with one or more processors and memory: receiving user input comprising natural language speech including one or more words; identifying at least one context associated with the user input; dividing the user input into a plurality of audio frames; determining whether each frame of the plurality of frames includes audio information associated with the user input; in accordance with a determination that a frame includes audio information associated with the user input, determining whether a threshold number of frames of silence follow; in accordance with a determination that the threshold number of frames of silence follow, ceasing to record the user input; in accordance with a determination that the threshold number of frames of silence do not follow, continuing to record the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.

24. The method of claim 23, further comprising recording the user input until the endpoint is identified.

25. The method of claim 23, further comprising generating a probability that each frame of the plurality of frames includes the endpoint.

26. The method of claim 25, further comprising: recording the user input until the endpoint is identified; wherein in accordance with a determination that the probability of an endpoint for a first frame is greater than the threshold: stopping recording the user input; and processing the user input; and wherein in accordance with a determination that the probability of an endpoint for a second frame is not greater than the threshold: continuing to record the user input.

27. The method of claim 25, further comprising: further in accordance with a determination that the probability of an endpoint for a first frame is greater than the threshold, outputting data associated with end of speech; and further in accordance with a determination that the probability of an endpoint for a second frame is not greater than the threshold, outputting data associated with continuing speech.

28. The method of claim 27, wherein the data associated with end of speech is an end of speech tag.

29. The method of claim 27, wherein the data associated with continuing speech is a continuing speech tag.

30. The method of claim 23, wherein the threshold number of frames of silence is associated with a duration of the determination whether the probability is greater than the threshold.

31. The method of claim 23, wherein the at least one context includes the content of the one or more words.

32. The method of claim 23, wherein the at least one context includes the device context.

33. The method of claim 32, wherein the device context includes the location of the device.

34. The method of claim 23, wherein the at least one context includes a rate of speech.

35. The method of claim 23, wherein the at least one context includes information associated with a user.

36. The method of claim 35, wherein the information associated with a user includes the identity of the user.

37. The method of claim 35, wherein the information associated with a user includes information about the speech patterns of the user.

38. The method of claim 23, further comprising operating a digital assistant through the device, wherein the at least one context includes the state of the digital assistant.

39. The method of claim 38, wherein the at least one context includes user input relative to the state of the digital assistant.

40. The method of claim 23, further comprising operating an application through the device, wherein the at least one context includes application context.

41. The method of claim 23, further comprising outputting an end-of-speech tag.

42. The method of claim 23, wherein each frame of the threshold number of frames of silence includes background noise, and wherein each frame of the threshold number of frames of silence does not include audio information associated with the user input.

43. The method of claim 23, wherein determining whether the threshold number of frames of silence follow is performed further in accordance with a determination that the probability is less than the threshold.

44. The method of claim 23, further comprising: providing the user input and the location in the user input to a digital assistant; and performing, using the digital assistant, a task based on the user input.

45. An electronic device, comprising: a display; a memory; a processor coupled to the display and the memory; and programs stored in the memory to be executed by the processor, the programs comprising instructions for: receiving user input comprising natural language speech including one or more words; identifying at least one context associated with the user input; dividing the user input into a plurality of audio frames; determining whether each frame of the plurality of frames includes audio information associated with the user input; in accordance with a determination that a frame includes audio information associated with the user input, determining whether a threshold number of frames of silence follow; in accordance with a determination that the threshold number of frames of silence follow, ceasing to record the user input; in accordance with a determination that the threshold number of frames of silence do not follow, continuing to record the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.

46. The electronic device of claim 45, the programs further comprising instructions for recording the user input until the endpoint is identified.

47. The electronic device of claim 45, the programs further comprising instructions for generating a probability that each frame of the plurality of frames includes the endpoint.

48. The electronic device of claim 47, the programs further comprising instructions for: recording the user input until the endpoint is identified; wherein in accordance with a determination that the probability of an endpoint for a first frame is greater than the threshold: stopping recording the user input; and processing the user input; and wherein in accordance with a determination that the probability of an endpoint for a second frame is not greater than the threshold: continuing to record the user input.

49. The electronic device of claim 47, the programs further comprising instructions for: further in accordance with a determination that the probability of an endpoint for a first frame is greater than the threshold, outputting data associated with end of speech; and further in accordance with a determination that the probability of an endpoint for a second frame is not greater than the threshold, outputting data associated with continuing speech.

50. The electronic device of claim 49, wherein the data associated with end of speech is an end of speech tag.

51. The electronic device of claim 49, wherein the data associated with continuing speech is a continuing speech tag.

52. The electronic device of claim 45, wherein the threshold number of frames of silence is associated with a duration of the determination whether the probability is greater than the threshold.

53. The electronic device of claim 45, wherein the at least one context includes the content of the one or more words.

54. The electronic device of claim 45, wherein the at least one context includes the device context.

55. The electronic device of claim 54, wherein the device context includes the location of the device.

56. The electronic device of claim 45, wherein the at least one context includes a rate of speech.

57. The electronic device of claim 45, wherein the at least one context includes information associated with a user.

58. The electronic device of claim 57, wherein the information associated with a user includes the identity of the user.

59. The electronic device of claim 57, wherein the information associated with a user includes information about the speech patterns of the user.

60. The electronic device of claim 45, the programs further comprising instructions for operating a digital assistant through the device, wherein the at least one context includes the state of the digital assistant.

61. The electronic device of claim 60, wherein the at least one context includes user input relative to the state of the digital assistant.

62. The electronic device of claim 45, the programs further comprising instructions for operating an application through the device, wherein the at least one context includes application context.

63. The electronic device of claim 45, further comprising outputting an end-of-speech tag.

64. The electronic device of claim 45, wherein each frame of the threshold number of frames of silence includes background noise, and wherein each frame of the threshold number of frames of silence does not include audio information associated with the user input.

65. The electronic device of claim 45, wherein determining whether the threshold number of frames of silence follow is performed further in accordance with a determination that the probability is less than the threshold.

66. The electronic device of claim 45, the programs further comprising instructions for: providing the user input and the location in the user input to a digital assistant; and performing, using the digital assistant, a task based on the user input.
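The frame-handling steps recited in the independent claims — divide the input into audio frames, detect whether each frame contains speech, and cease recording once a threshold number of silent frames follows speech — can be sketched as a simple loop. This is a minimal illustration, not the patent's implementation: real systems classify frames with a voice-activity detector, whereas here each frame is pre-labeled as a boolean.

```python
def record_until_silence(frames, silence_threshold=3):
    """Return the frames recorded before the silence threshold is reached.

    `frames` is an iterable of booleans: True if the frame contains audio
    information associated with the user input, False if it is silence
    (possibly containing only background noise, per claim 20).
    """
    recorded = []
    silent_run = 0
    heard_speech = False
    for has_speech in frames:
        recorded.append(has_speech)
        if has_speech:
            heard_speech = True
            silent_run = 0          # speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_threshold:
                break               # threshold frames of silence follow: cease recording
    return recorded

# Two speech frames, then silence: recording ceases at the third silent frame.
frames = [True, True, False, False, False, False, False]
print(len(record_until_silence(frames)))  # → 5
```

Note that leading silence (before any speech frame) never triggers the stop, matching the claim language that the silence check runs "in accordance with a determination that a frame includes audio information associated with the user input". Per claims 8 and 21, this silence window is what buys time for the context-based probability check to run.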