| Field | Value |
|---|---|
| Country / Type | United States (US) Patent, Granted |
| International Patent Classification (IPC, 7th ed.) | |
| Application No. | US-0835520 (2015-08-25) |
| Registration No. | US-9646609 (2017-05-09) |
| Inventors / Address | |
| Applicant / Address | |
| Agent / Address | |
| Citation Info | Cited by: 0 / Patents cited: 1951 |
Systems and processes for generating a shared pronunciation lexicon and using the shared pronunciation lexicon to interpret spoken user inputs received by a virtual assistant are provided. In one example, the process can include receiving pronunciations for words or named entities from multiple users. The pronunciations can be tagged with context tags and stored in the shared pronunciation lexicon. The shared pronunciation lexicon can then be used to interpret a spoken user input received by a user device by determining a relevant subset of the shared pronunciation lexicon based on contextual information associated with the user device and performing speech-to-text conversion on the spoken user input using the determined subset of the shared pronunciation lexicon.
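The process described in the abstract — collecting pronunciations from multiple users, tagging them with context, and filtering the shared lexicon down to a context-relevant subset before speech-to-text — can be sketched as a minimal data structure. This is an illustrative sketch only, not the patented implementation; the class name, tag format, and matching rule (any overlapping tag) are assumptions for illustration.

```python
from collections import defaultdict


class SharedPronunciationLexicon:
    """Illustrative sketch (hypothetical API): each named entity maps to a
    list of (pronunciation, context_tags) pairs contributed by users."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add(self, named_entity, pronunciation, context_tags):
        # Tag the contributed pronunciation with context (e.g. location,
        # subject matter domain, language) before storing it.
        self._entries[named_entity].append((pronunciation, set(context_tags)))

    def relevant_subset(self, device_context):
        # Keep only pronunciations whose tags overlap the requesting
        # device's context; the rest of the lexicon is excluded from
        # speech-to-text conversion.
        ctx = set(device_context)
        subset = {}
        for entity, variants in self._entries.items():
            matches = [pron for pron, tags in variants if tags & ctx]
            if matches:
                subset[entity] = matches
        return subset


lexicon = SharedPronunciationLexicon()
# Two users contribute different pronunciations of the same named entity.
lexicon.add("Nguyen", "win", {"location:US", "language:en"})
lexicon.add("Nguyen", "ngwee-EN", {"location:VN", "language:vi"})

# A third device's context selects only the matching variant.
subset = lexicon.relevant_subset({"location:US", "language:en"})
```

A recognizer would then compare incoming audio only against the pronunciations in `subset`, which is what keeps the shared lexicon tractable as contributions grow.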
1. A method for operating a virtual assistant, the method comprising: at an electronic device having a processor and memory: receiving, from a first user device, a first pronunciation for a first named entity; receiving, from a second user device, a second pronunciation for the first named entity; storing the first and second pronunciations for the first named entity in a shared pronunciation lexicon; receiving, from a third user device, audio data representing user speech and a context of the third user device, the user speech including the first named entity in spoken form; and performing speech-to-text conversion on the audio data to generate a textual representation of the user speech, the speech-to-text conversion comprising selecting, for comparison with the audio data, the stored first pronunciation or the stored second pronunciation for the first named entity based on the context.

2. The method of claim 1, wherein storing the first and second pronunciations for the first named entity in the shared pronunciation lexicon comprises: determining one or more first context tags for the first pronunciation for the first named entity; storing the first pronunciation for the first named entity in association with the determined one or more first context tags; determining one or more second context tags for the second pronunciation for the first named entity; and storing the second pronunciation for the first named entity in association with the determined one or more second context tags.

3. The method of claim 2, wherein the one or more first context tags comprise a location tag identifying a location associated with the first pronunciation for the first named entity.

4. The method of claim 2, wherein the one or more first context tags comprise a domain tag identifying a subject matter domain associated with the first pronunciation for the first named entity.

5. The method of claim 2, wherein the one or more first context tags comprise a language tag identifying a language associated with the first pronunciation for the first named entity.

6. The method of claim 1, wherein the method further comprises: receiving, from a fourth user device, a pronunciation for a second named entity; and storing the pronunciation for the second named entity in the shared pronunciation lexicon.

7. The method of claim 1, wherein the method further comprises: determining a relevant subset of the shared pronunciation lexicon based on the context of the third user device, wherein the speech-to-text conversion on the audio data is performed using the determined relevant subset of the shared pronunciation lexicon to generate the textual representation of the user speech.

8. The method of claim 7, wherein performing speech-to-text conversion on the audio data using the determined relevant subset of the shared pronunciation lexicon excludes the use of portions of the shared pronunciation lexicon not included in the determined relevant subset of the shared pronunciation lexicon.

9. The method of claim 7, wherein determining the relevant subset of the shared pronunciation lexicon based on the context of the third user device comprises determining to include, in the relevant subset of the shared pronunciation lexicon, one or more pronunciations for named entities in the shared pronunciation lexicon that are associated with a context tag related to the context of the third user device.

10. The method of claim 7, wherein the context of the third user device comprises a contact list stored on the third user device, and wherein determining the relevant subset of the shared pronunciation lexicon based on the context of the third user device comprises determining to include, in the relevant subset of the shared pronunciation lexicon, one or more pronunciations for named entities in the contact list that are also in the shared pronunciation lexicon.

11. The method of claim 7, wherein the relevant subset of the shared pronunciation lexicon includes the first pronunciation for the first named entity, but not the second pronunciation for the first named entity.

12. The method of claim 1, further comprising: receiving, from a fifth user device, audio data representing a second user speech and a context of the fifth user device; determining a relevant subset of the shared pronunciation lexicon based on the context of the fifth user device; determining a response to the second user speech, wherein the response comprises a second named entity; determining a pronunciation for the second named entity using the determined relevant subset of the shared pronunciation lexicon; and transmitting the response and the pronunciation for the second named entity to the fifth user device.

13. The method of claim 1, further comprising: deleting one or more pronunciations for a named entity in the shared pronunciation lexicon that has least recently been accessed.

14. The method of claim 1, wherein the first pronunciation for the first named entity comprises an audio recording of a user associated with the first user device speaking the first named entity, and wherein storing the first pronunciation for the first named entity in the shared pronunciation lexicon comprises: generating an acoustic model representing the first pronunciation for the first named entity based on the audio recording; and storing the acoustic model in the shared pronunciation lexicon.

15. The method of claim 1, wherein the first pronunciation for the first named entity comprises an acoustic model representing the first pronunciation for the first named entity generated by the first user device.

16. The method of claim 1, wherein based on the context, the selecting comprises selecting, for comparison with the audio data, the stored first pronunciation, but not the stored second pronunciation for the first named entity.

17. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by one or more processors, cause the one or more processors to: receive, from a first user device, a first pronunciation for a first named entity; receive, from a second user device, a second pronunciation for the first named entity; store the first and second pronunciations for the first named entity in a shared pronunciation lexicon; receive, from a third user device, audio data representing user speech and a context of the third user device, the user speech including the first named entity in spoken form; and perform speech-to-text conversion on the audio data to generate a textual representation of the user speech, the speech-to-text conversion comprising selecting, for comparison with the audio data, the stored first pronunciation or the stored second pronunciation for the first named entity based on the context.

18. The computer-readable storage medium of claim 17, wherein storing the first and second pronunciations for the first named entity in the shared pronunciation lexicon comprises: determining one or more first context tags for the first pronunciation for the first named entity; storing the first pronunciation for the first named entity in association with the determined one or more first context tags; determining one or more second context tags for the second pronunciation for the first named entity; and storing the second pronunciation for the first named entity in association with the determined one or more second context tags.

19. The computer-readable storage medium of claim 18, wherein the one or more first context tags comprise a location tag identifying a location associated with the first pronunciation for the first named entity.

20. The computer-readable storage medium of claim 18, wherein the one or more first context tags comprise a domain tag identifying a subject matter domain associated with the first pronunciation for the first named entity.

21. The computer-readable storage medium of claim 18, wherein the one or more first context tags comprise a language tag identifying a language associated with the first pronunciation for the first named entity.

22. The computer-readable storage medium of claim 17, wherein the instructions further cause the one or more processors to: receive, from a fourth user device, a pronunciation for a second named entity; and store the pronunciation for the second named entity in the shared pronunciation lexicon.

23. The computer-readable storage medium of claim 17, wherein the instructions further cause the one or more processors to: determine a relevant subset of the shared pronunciation lexicon based on the context of the third user device, wherein the speech-to-text conversion on the audio data is performed using the determined relevant subset of the shared pronunciation lexicon to generate the textual representation of the user speech.

24. The computer-readable storage medium of claim 23, wherein performing speech-to-text conversion on the audio data using the determined relevant subset of the shared pronunciation lexicon excludes the use of portions of the shared pronunciation lexicon not included in the determined relevant subset of the shared pronunciation lexicon.

25. The computer-readable storage medium of claim 23, wherein determining the relevant subset of the shared pronunciation lexicon based on the context of the third user device comprises determining to include, in the relevant subset of the shared pronunciation lexicon, one or more pronunciations for named entities in the shared pronunciation lexicon that are associated with a context tag related to the context of the third user device.

26. The computer-readable storage medium of claim 23, wherein the context of the third user device comprises a contact list stored on the third user device, and wherein determining the relevant subset of the shared pronunciation lexicon based on the context of the third user device comprises determining to include, in the relevant subset of the shared pronunciation lexicon, one or more pronunciations for named entities in the contact list that are also in the shared pronunciation lexicon.

27. The computer-readable storage medium of claim 17, wherein the instructions further cause the one or more processors to: receive, from a fifth user device, audio data representing a second user speech and a context of the fifth user device; determine a relevant subset of the shared pronunciation lexicon based on the context of the fifth user device; determine a response to the second user speech, wherein the response comprises a second named entity; determine a pronunciation for the second named entity using the determined relevant subset of the shared pronunciation lexicon; and transmit the response and the pronunciation for the second named entity to the fifth user device.

28. The computer-readable storage medium of claim 17, wherein the instructions further cause the one or more processors to: delete one or more pronunciations for a named entity in the shared pronunciation lexicon that has least recently been accessed.

29. The computer-readable storage medium of claim 17, wherein the first pronunciation for the first named entity comprises an audio recording of a user associated with the first user device speaking the first named entity, and wherein storing the first pronunciation for the first named entity in the shared pronunciation lexicon comprises: generating an acoustic model representing the first pronunciation for the first named entity based on the audio recording; and storing the acoustic model in the shared pronunciation lexicon.

30. The computer-readable storage medium of claim 17, wherein the first pronunciation for the first named entity comprises an acoustic model representing the first pronunciation for the first named entity generated by the first user device.

31. A system comprising: one or more processors; memory storing one or more programs, the one or more programs comprising instructions which, when executed by the one or more processors, cause the one or more processors to: receive, from a first user device, a first pronunciation for a first named entity; receive, from a second user device, a second pronunciation for the first named entity; store the first and second pronunciations for the first named entity in a shared pronunciation lexicon; receive, from a third user device, audio data representing user speech and a context of the third user device, the user speech including the first named entity in spoken form; and perform speech-to-text conversion on the audio data to generate a textual representation of the user speech, the speech-to-text conversion comprising selecting, for comparison with the audio data, the stored first pronunciation or the stored second pronunciation for the first named entity based on the context.

32. The system of claim 31, wherein storing the first and second pronunciations for the first named entity in the shared pronunciation lexicon comprises: determining one or more first context tags for the first pronunciation for the first named entity; storing the first pronunciation for the first named entity in association with the determined one or more first context tags; determining one or more second context tags for the second pronunciation for the first named entity; and storing the second pronunciation for the first named entity in association with the determined one or more second context tags.

33. The system of claim 32, wherein the one or more first context tags comprise a location tag identifying a location associated with the first pronunciation for the first named entity.

34. The system of claim 32, wherein the one or more first context tags comprise a domain tag identifying a subject matter domain associated with the first pronunciation for the first named entity.

35. The system of claim 32, wherein the one or more first context tags comprise a language tag identifying a language associated with the first pronunciation for the first named entity.

36. The system of claim 31, wherein the instructions further cause the one or more processors to: receive, from a fourth user device, a pronunciation for a second named entity; and store the pronunciation for the second named entity in the shared pronunciation lexicon.

37. The system of claim 31, wherein the instructions further cause the one or more processors to: determine a relevant subset of the shared pronunciation lexicon based on the context of the third user device, wherein the speech-to-text conversion on the audio data is performed using the determined relevant subset of the shared pronunciation lexicon to generate the textual representation of the user speech.

38. The system of claim 37, wherein performing speech-to-text conversion on the audio data using the determined relevant subset of the shared pronunciation lexicon excludes the use of portions of the shared pronunciation lexicon not included in the determined relevant subset of the shared pronunciation lexicon.

39. The system of claim 37, wherein determining the relevant subset of the shared pronunciation lexicon based on the context of the third user device comprises determining to include, in the relevant subset of the shared pronunciation lexicon, one or more pronunciations for named entities in the shared pronunciation lexicon that are associated with a context tag related to the context of the third user device.

40. The system of claim 37, wherein the context of the third user device comprises a contact list stored on the third user device, and wherein determining the relevant subset of the shared pronunciation lexicon based on the context of the third user device comprises determining to include, in the relevant subset of the shared pronunciation lexicon, one or more pronunciations for named entities in the contact list that are also in the shared pronunciation lexicon.

41. The system of claim 31, wherein the instructions further cause the one or more processors to: receive, from a fifth user device, audio data representing a second user speech and a context of the fifth user device; determine a relevant subset of the shared pronunciation lexicon based on the context of the fifth user device; determine a response to the second user speech, wherein the response comprises a second named entity; determine a pronunciation for the second named entity using the determined relevant subset of the shared pronunciation lexicon; and transmit the response and the pronunciation for the second named entity to the fifth user device.

42. The system of claim 31, wherein the instructions further cause the one or more processors to: delete one or more pronunciations for a named entity in the shared pronunciation lexicon that has least recently been accessed.

43. The system of claim 31, wherein the first pronunciation for the first named entity comprises an audio recording of a user associated with the first user device speaking the first named entity, and wherein storing the first pronunciation for the first named entity in the shared pronunciation lexicon comprises: generating an acoustic model representing the first pronunciation for the first named entity based on the audio recording; and storing the acoustic model in the shared pronunciation lexicon.

44. The system of claim 31, wherein the first pronunciation for the first named entity comprises an acoustic model representing the first pronunciation for the first named entity generated by the first user device.
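Claims 13, 28, and 42 describe pruning the shared lexicon by deleting pronunciations for the named entity that has least recently been accessed. A standard way to realize such a policy is an LRU store; the sketch below is one possible illustration, not the patented implementation, and the class name, capacity parameter, and eviction trigger are assumptions for illustration.

```python
from collections import OrderedDict


class LRUPronunciationStore:
    """Illustrative sketch (hypothetical API) of a least-recently-accessed
    eviction policy over named-entity pronunciations."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()  # named entity -> list of pronunciations

    def put(self, entity, pronunciations):
        self._store[entity] = pronunciations
        self._store.move_to_end(entity)      # mark as most recently used
        if len(self._store) > self.capacity:
            # Evict the least recently accessed entity's pronunciations.
            self._store.popitem(last=False)

    def get(self, entity):
        pronunciations = self._store[entity]
        self._store.move_to_end(entity)      # accessing refreshes recency
        return pronunciations


store = LRUPronunciationStore(capacity=2)
store.put("Alice", ["AL-iss"])
store.put("Bob", ["bahb"])
store.get("Alice")                  # "Alice" is now the most recently used
store.put("Carol", ["KA-rul"])      # exceeds capacity: evicts "Bob"
```

The eviction key is access recency rather than insertion order, which is why the `get` call above protects "Alice" from eviction.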
Copyright KISTI. All Rights Reserved.