| Country / Type | United States (US) patent, granted |
|---|---|
| International Patent Classification (IPC, 7th ed.) | |
| Application number | US-0835169 (2015-08-25) |
| Registration number | US-10127911 (2018-11-13) |
| Inventor / Address | |
| Applicant / Address | |
| Agent / Address | |
| Citation information | Cited by: 0; patents cited: 1,953 |
Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
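The enrollment-and-trigger flow described in the abstract can be sketched roughly as follows. All names (`handle_audio`, `matches`) and the list-based profiles are illustrative assumptions for exposition, not the patented implementation:

```python
# Illustrative sketch of the abstract's flow: route an incoming audio sample
# into the appropriate speaker profile and decide whether to trigger the
# assistant. The `matches` similarity check is a stand-in assumption.

def handle_audio(audio, user_profile, alternate_profile, matches):
    """If the speaker is the predetermined user, enroll the sample in that
    user's profile and trigger the assistant; otherwise enroll it in the
    alternate profile and do not trigger."""
    if matches(audio, user_profile):      # speaker identified as the user
        user_profile.append(audio)        # grow the user's speaker profile
        return True                       # trigger the virtual assistant
    alternate_profile.append(audio)       # enroll under the alternate profile
    return False                          # do not trigger the assistant

# Usage with a trivial stand-in matcher (exact membership):
user, other = ["hi"], []
triggered = handle_audio("hi", user, other, lambda a, p: a in p)
```

The key design point the abstract describes is that both outcomes enrich a profile: accepted samples refine the user's profile, and rejected samples build an alternate profile that later claims use for comparative matching.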
1. A method for operating a virtual assistant, the method comprising: at an electronic device: receiving, at the electronic device, an audio input comprising user speech, wherein the audio input is associated with a contextual data; determining whether the user speech contains one or more predetermined words; in response to determining that the user speech contains one or more predetermined words: determining whether a speaker of the user speech is a predetermined user based at least in part on a speaker profile for the predetermined user; and in accordance with a determination that the speaker of the user speech is the predetermined user, adding the audio input comprising user speech to the speaker profile for the predetermined user, wherein adding the audio input comprising user speech to the speaker profile includes annotating the audio input in the speaker profile with the contextual data; receiving a second audio input comprising a second user speech; determining whether a second contextual data associated with the second audio input matches the contextual data; in accordance with a determination that the second contextual data associated with the second audio input matches the contextual data: determining whether a speaker of the second user speech is the predetermined user based at least in part on the audio input added to the speaker profile; and in accordance with a determination that the speaker of the second user speech is the predetermined user, activating the virtual assistant and processing a spoken command received subsequent to the second user speech.
2. The method of claim 1, wherein the speaker profile for the predetermined user comprises a plurality of voice prints.
3. The method of claim 2, wherein each of the plurality of voice prints of the speaker profile for the predetermined user was generated from previously received audio inputs comprising user speech.
4. The method of claim 2, wherein determining whether the speaker of the user speech is the predetermined user based at least in part on the speaker profile for the predetermined user comprises: determining whether the audio input comprising user speech matches at least a threshold number of the plurality of voice prints; in accordance with a determination that the audio input comprising user speech matches at least the threshold number of the plurality of voice prints, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match at least the threshold number of the plurality of voice prints, determining that the speaker of the user speech is not the predetermined user.
5. The method of claim 2, wherein determining whether the speaker of the user speech is the predetermined user based at least in part on the speaker profile for the predetermined user comprises: determining whether the audio input comprising user speech matches at least a threshold number of the plurality of voice prints; in accordance with a determination that the audio input comprising user speech matches at least the threshold number of the plurality of voice prints: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match at least the threshold number of the plurality of voice prints: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is the predetermined user.
6. The method of claim 1, wherein adding the audio input comprising user speech to the speaker profile for the predetermined user comprises: generating a voice print from the audio input comprising user speech; and storing the voice print in association with the speaker profile for the predetermined user.
7. The method of claim 1, wherein the method further comprises: in accordance with a determination that the speaker of the user speech is not the predetermined user, adding the audio input comprising user speech to a speaker profile for an alternate user.
8. The method of claim 7, wherein the speaker profile for the alternate user comprises a plurality of voice prints.
9. The method of claim 8, wherein each of the plurality of voice prints of the speaker profile for the alternate user was generated from previously received audio inputs comprising user speech.
10. The method of claim 7, wherein determining whether the speaker of the user speech is the predetermined user is further based at least in part on the speaker profile for the alternate user.
11.
The method of claim 7, wherein determining whether the speaker of the user speech is the predetermined user comprises: determining whether the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user; in accordance with a determination that the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user, determining that the speaker of the user speech is not the predetermined user.
12.
The method of claim 7, wherein determining whether the speaker of the user speech is the predetermined user comprises: determining whether the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user; in accordance with a determination that the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is the predetermined user.
13.
The method of claim 1, wherein the method further comprises: in accordance with a determination that the speaker of the user speech is the predetermined user: performing speech-to-text conversion on a third audio input comprising a third user speech, wherein the third audio input is received after receiving the audio input comprising user speech; determining a user intent based on the third user speech; determining a task to be performed based on the third user speech; determining a parameter for the task to be performed based on the third user speech; and performing the task to be performed in accordance with the determined parameter.
14. A system comprising: one or more processors; memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for: receiving an audio input comprising user speech, wherein the audio input is associated with a contextual data; determining whether the user speech contains one or more predetermined words; in response to determining that the user speech contains one or more predetermined words: determining whether a speaker of the user speech is a predetermined user based at least in part on a speaker profile for the predetermined user; and in accordance with a determination that the speaker of the user speech is the predetermined user, adding the audio input comprising user speech to the speaker profile for the predetermined user, wherein adding the audio input comprising user speech to the speaker profile includes annotating the audio input in the speaker profile with the contextual data; receiving a second audio input comprising a second user speech; determining whether a second contextual data associated with the second audio input matches the contextual data; in accordance with a determination that the second contextual data associated with the second audio input matches the contextual data: determining whether a speaker of the second user speech is the predetermined user based at least in part on the audio input added to the speaker profile; and in accordance with a determination that the speaker of the second user speech is the predetermined user, activating the virtual assistant and processing a spoken command received subsequent to the second user speech.
15. The system of claim 14, wherein adding the audio input comprising user speech to the speaker profile for the predetermined user comprises: generating a voice print from the audio input comprising user speech; and storing the voice print in association with the speaker profile for the predetermined user.
16. The system of claim 14, wherein the one or more programs further include instructions for: in accordance with a determination that the speaker of the user speech is the predetermined user: performing speech-to-text conversion on a third audio input comprising a third user speech, wherein the third audio input is received after receiving the audio input comprising user speech; determining a user intent based on the third user speech; determining a task to be performed based on the third user speech; determining a parameter for the task to be performed based on the third user speech; and performing the task to be performed in accordance with the determined parameter.
17. The system of claim 14, wherein the speaker profile for the predetermined user comprises a plurality of voice prints.
18. The system of claim 17, wherein each of the plurality of voice prints of the speaker profile for the predetermined user was generated from previously received audio inputs comprising user speech.
19.
The system of claim 17, wherein determining whether the speaker of the user speech is the predetermined user based at least in part on the speaker profile for the predetermined user comprises: determining whether the audio input comprising user speech matches at least a threshold number of the plurality of voice prints; in accordance with a determination that the audio input comprising user speech matches at least the threshold number of the plurality of voice prints, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match at least the threshold number of the plurality of voice prints, determining that the speaker of the user speech is not the predetermined user.
20. The system of claim 17, wherein determining whether the speaker of the user speech is the predetermined user based at least in part on the speaker profile for the predetermined user comprises: determining whether the audio input comprising user speech matches at least a threshold number of the plurality of voice prints; in accordance with a determination that the audio input comprising user speech matches at least the threshold number of the plurality of voice prints: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match at least the threshold number of the plurality of voice prints: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is the predetermined user.
21. The system of claim 14, wherein the one or more programs further include instructions for: in accordance with a determination that the speaker of the user speech is not the predetermined user, adding the audio input comprising user speech to a speaker profile for an alternate user.
22. The system of claim 21, wherein determining whether the speaker of the user speech is the predetermined user is further based at least in part on the speaker profile for the alternate user.
23. The system of claim 21, wherein determining whether the speaker of the user speech is the predetermined user comprises: determining whether the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user; in accordance with a determination that the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user, determining that the speaker of the user speech is not the predetermined user.
24.
The system of claim 21, wherein determining whether the speaker of the user speech is the predetermined user comprises: determining whether the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user; in accordance with a determination that the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is the predetermined user.
25. The system of claim 21, wherein the speaker profile for the alternate user comprises a plurality of voice prints.
26. The system of claim 21, wherein each of the plurality of voice prints of the speaker profile for the alternate user was generated from previously received audio inputs comprising user speech.
27.
A non-transitory computer-readable storage medium comprising instructions for: receiving an audio input comprising user speech, wherein the audio input is associated with a contextual data; determining whether the user speech contains one or more predetermined words; in response to determining that the user speech contains one or more predetermined words: determining whether a speaker of the user speech is a predetermined user based at least in part on a speaker profile for the predetermined user; and in accordance with a determination that the speaker of the user speech is the predetermined user, adding the audio input comprising user speech to the speaker profile for the predetermined user, wherein adding the audio input comprising user speech to the speaker profile includes annotating the audio input in the speaker profile with the contextual data; receiving a second audio input comprising a second user speech; determining whether a second contextual data associated with the second audio input matches the contextual data; in accordance with a determination that the second contextual data associated with the second audio input matches the contextual data: determining whether a speaker of the second user speech is the predetermined user based at least in part on the audio input added to the speaker profile; and in accordance with a determination that the speaker of the second user speech is the predetermined user, activating the virtual assistant and processing a spoken command received subsequent to the second user speech.
28. The non-transitory computer-readable storage medium of claim 27, wherein determining whether the speaker of the user speech is the predetermined user is further based at least in part on the speaker profile for the alternate user.
29.
The non-transitory computer-readable storage medium of claim 27, wherein determining whether the speaker of the user speech is the predetermined user comprises: determining whether the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user; in accordance with a determination that the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user, determining that the speaker of the user speech is not the predetermined user.
30. The non-transitory computer-readable storage medium of claim 27, wherein the speaker profile for the predetermined user comprises a plurality of voice prints.
31. The non-transitory computer-readable storage medium of claim 30, wherein adding the audio input comprising user speech to the speaker profile for the predetermined user comprises: generating a voice print from the audio input comprising user speech; and storing the voice print in association with the speaker profile for the predetermined user.
32.
The non-transitory computer-readable storage medium of claim 30, wherein the instructions further comprise: in accordance with a determination that the speaker of the user speech is the predetermined user: performing speech-to-text conversion on a third audio input comprising a third user speech, wherein the third audio input is received after receiving the audio input comprising user speech; determining a user intent based on the third user speech; determining a task to be performed based on the third user speech; determining a parameter for the task to be performed based on the third user speech; and performing the task to be performed in accordance with the determined parameter.
33. The non-transitory computer-readable storage medium of claim 30, wherein each of the plurality of voice prints of the speaker profile for the predetermined user was generated from previously received audio inputs comprising user speech.
34. The non-transitory computer-readable storage medium of claim 29, wherein determining whether the speaker of the user speech is the predetermined user based at least in part on the speaker profile for the predetermined user comprises: determining whether the audio input comprising user speech matches at least a threshold number of the plurality of voice prints; in accordance with a determination that the audio input comprising user speech matches at least the threshold number of the plurality of voice prints, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match at least the threshold number of the plurality of voice prints, determining that the speaker of the user speech is not the predetermined user.
35.
The non-transitory computer-readable storage medium of claim 33, wherein determining whether the speaker of the user speech is the predetermined user based at least in part on the speaker profile for the predetermined user comprises: determining whether the audio input comprising user speech matches at least a threshold number of the plurality of voice prints; in accordance with a determination that the audio input comprising user speech matches at least the threshold number of the plurality of voice prints: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match at least the threshold number of the plurality of voice prints: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is the predetermined user.
36. The non-transitory computer-readable storage medium of claim 30, wherein the instructions further comprise: in accordance with a determination that the speaker of the user speech is not the predetermined user, adding the audio input comprising user speech to a speaker profile for an alternate user.
37.
The non-transitory computer-readable storage medium of claim 36, wherein the speaker profile for the alternate user comprises a plurality of voice prints.
38. The non-transitory computer-readable storage medium of claim 37, wherein each of the plurality of voice prints of the speaker profile for the alternate user was generated from previously received audio inputs comprising user speech.
39. The non-transitory computer-readable storage medium of claim 27, wherein determining whether the speaker of the user speech is the predetermined user comprises: determining whether the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user; in accordance with a determination that the audio input comprising user speech matches a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that the audio input comprising user speech does not match a greater number of voice prints of the speaker profile for the predetermined user than a number of voice prints of the speaker profile for the alternate user: determining whether an erroneous speaker determination was made based on the contextual data; in accordance with a determination that an erroneous speaker determination was not made based on the contextual data, determining that the speaker of the user speech is not the predetermined user; and in accordance with a determination that an erroneous speaker determination was made based on the contextual data, determining that the speaker of the user speech is the predetermined user.
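The two matching strategies that recur through the claims — threshold matching against the user's voice prints, optionally flipped by contextual data (claims 4–5 and parallels), and comparative matching against the alternate user's voice prints (claim 11 and parallels) — can be sketched as follows. Function names, the `match` predicate, and the boolean `context_says_error` flag are illustrative assumptions; the claims do not prescribe a concrete similarity measure:

```python
def is_predetermined_user(sample, user_prints, match, threshold,
                          context_says_error=False):
    """Claims 4-5 style check: accept the speaker when the sample matches at
    least `threshold` of the user's voice prints; contextual data indicating
    an erroneous determination inverts the result (claim 5)."""
    hits = sum(1 for vp in user_prints if match(sample, vp))
    decision = hits >= threshold
    if context_says_error:  # contextual data says the raw result is wrong
        decision = not decision
    return decision

def comparative_check(sample, user_prints, alt_prints, match):
    """Claim 11 style check: accept the speaker only when the sample matches
    strictly more of the user's voice prints than the alternate user's."""
    user_hits = sum(1 for vp in user_prints if match(sample, vp))
    alt_hits = sum(1 for vp in alt_prints if match(sample, vp))
    return user_hits > alt_hits
```

The comparative variant is why the claims bother enrolling rejected samples in an alternate profile: a richer alternate profile raises the bar the genuine user's profile must beat.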
Copyright KISTI. All Rights Reserved.