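Country / Type | United States (US) Patent, Registered |
---|---|
International Patent Classification (IPC, 7th edition) | |
Application No. | US-0846650 (2015-09-04) |
Registration No. | US-10255907 (2019-04-09) |
Inventor / Address | |
Applicant / Address | |
Agent / Address | |
Citation information | Times cited: 0; Patents cited: 1953 |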
Systems and processes for automatic accent detection are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a user input, determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, and determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models. The method further includes determining whether the first similarity is greater than the second similarity. In accordance with a determination that the first similarity is greater than the second similarity, the first acoustic model may be selected; and in accordance with a determination that the first similarity is not greater than the second similarity, the second acoustic model may be selected.
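The selection rule in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a similarity score is any real number where larger means a closer match, and the names `select_acoustic_model` and `similarity` are hypothetical. Note the tie-breaking the text implies: the first model is chosen only when its similarity is strictly greater, so equal scores fall through to the second model.

```python
from typing import Callable, Sequence, TypeVar

M = TypeVar("M")  # any object standing in for an acoustic model (assumption)


def select_acoustic_model(
    representation: Sequence[float],
    first_model: M,
    second_model: M,
    similarity: Callable[[Sequence[float], M], float],
) -> M:
    """Return the model whose similarity to the input representation wins.

    `similarity` is a caller-supplied scoring function (hypothetical);
    larger values are assumed to mean a closer match.
    """
    first_similarity = similarity(representation, first_model)
    second_similarity = similarity(representation, second_model)
    # "Greater than" selects the first model; "not greater than"
    # (including a tie) selects the second model.
    if first_similarity > second_similarity:
        return first_model
    return second_model
```

For example, with a lookup-table scorer that returns 0.9 for one accent label and 0.4 for another, the function returns the label scored 0.9; if both score the same, it returns the second argument.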
1. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by one or more processors of an electronic device, cause the electronic device to: receive a user input; determine a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, wherein the first acoustic model is associated with a first accent of a language; determine a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models, wherein the second acoustic model is associated with a second accent of the language; determine whether the first similarity is greater than the second similarity; in accordance with a determination that the first similarity is greater than the second similarity, select the first acoustic model; in accordance with a determination that the first similarity is not greater than the second similarity, select the second acoustic model; associate the selected acoustic model with a user profile corresponding to a speaker; receive a second user input from the speaker; and determine a representation of the second user input using the selected acoustic model associated with the user profile corresponding to the speaker.

2. The non-transitory computer-readable storage medium of claim 1, wherein the user input comprises a plurality of user utterances.

3. The non-transitory computer-readable storage medium of claim 1, wherein determining whether the first similarity is greater than the second similarity includes determining whether a distance between the representation of the user input and the first acoustic model is greater than a distance between the representation of the user input and the second acoustic model.

4. The non-transitory computer-readable storage medium of claim 3, wherein determining whether a distance between the representation of the user input and the first acoustic model is greater than a distance between the representation of the user input and the second acoustic model includes: determining a first angle between the representation of the user input and the first acoustic model; and determining a second angle between the representation of the user input and the second acoustic model, wherein the distance between the representation of the user input and the first acoustic model is based on the first angle and wherein the distance between the representation of the user input and the second acoustic model is based on the second angle.

5. The non-transitory computer-readable storage medium of claim 1, wherein the first acoustic model of the plurality of acoustic models is associated with a first level of a data structure and the second acoustic model of the plurality of acoustic models is associated with a second level of the data structure.

6. The non-transitory computer-readable storage medium of claim 5, wherein the first acoustic model is a first neural network acoustic model and the second acoustic model is a second neural network acoustic model and wherein the first and second neural network acoustic models share a same neural network layer.

7. The non-transitory computer-readable storage medium of claim 5, wherein the second acoustic model is adapted from the first acoustic model using MLLR adaption, cMLLR adaption, MAP adaption, Eigenvoice adaption, or a combination thereof.

8. A method, comprising: at an electronic device with one or more processors and memory: receiving a user input; determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, wherein the first acoustic model is associated with a first accent of a language; determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models, wherein the second acoustic model is associated with a second accent of the language; determining whether the first similarity is greater than the second similarity; in accordance with a determination that the first similarity is greater than the second similarity, selecting the first acoustic model; in accordance with a determination that the first similarity is not greater than the second similarity, selecting the second acoustic model; associating the selected acoustic model with a user profile corresponding to a speaker; receiving a second user input from the speaker; and determining a representation of the second user input using the selected acoustic model associated with the user profile corresponding to the speaker.

9. The method of claim 8, wherein the user input comprises a plurality of user utterances.

10. The method of claim 8, wherein determining whether the first similarity is greater than the second similarity includes determining whether a distance between the representation of the user input and the first acoustic model is greater than a distance between the representation of the user input and the second acoustic model.

11. The method of claim 10, wherein determining whether a distance between the representation of the user input and the first acoustic model is greater than a distance between the representation of the user input and the second acoustic model includes: determining a first angle between the representation of the user input and the first acoustic model; and determining a second angle between the representation of the user input and the second acoustic model, wherein the distance between the representation of the user input and the first acoustic model is based on the first angle and wherein the distance between the representation of the user input and the second acoustic model is based on the second angle.

12. The method of claim 8, wherein the first acoustic model of the plurality of acoustic models is associated with a first level of a data structure and the second acoustic model of the plurality of acoustic models is associated with a second level of the data structure.

13. The method of claim 12, wherein the first acoustic model is a first neural network acoustic model and the second acoustic model is a second neural network acoustic model and wherein the first and second neural network acoustic models share a same neural network layer.

14. The method of claim 12, wherein the second acoustic model is adapted from the first acoustic model using MLLR adaption, cMLLR adaption, MAP adaption, Eigenvoice adaption, or a combination thereof.

15. An electronic device, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: receiving a user input; determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, wherein the first acoustic model is associated with a first accent of a language; determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models, wherein the second acoustic model is associated with a second accent of the language; determining whether the first similarity is greater than the second similarity; in accordance with a determination that the first similarity is greater than the second similarity, selecting the first acoustic model; in accordance with a determination that the first similarity is not greater than the second similarity, selecting the second acoustic model; associating the selected acoustic model with a user profile corresponding to a speaker; receiving a second user input from the speaker; and determining a representation of the second user input using the selected acoustic model associated with the user profile corresponding to the speaker.

16. The electronic device of claim 15, wherein the user input comprises a plurality of user utterances.

17. The electronic device of claim 15, wherein determining whether the first similarity is greater than the second similarity includes determining whether a distance between the representation of the user input and the first acoustic model is greater than a distance between the representation of the user input and the second acoustic model.

18. The electronic device of claim 17, wherein determining whether a distance between the representation of the user input and the first acoustic model is greater than a distance between the representation of the user input and the second acoustic model includes: determining a first angle between the representation of the user input and the first acoustic model; and determining a second angle between the representation of the user input and the second acoustic model, wherein the distance between the representation of the user input and the first acoustic model is based on the first angle and wherein the distance between the representation of the user input and the second acoustic model is based on the second angle.

19. The electronic device of claim 15, wherein the first acoustic model of the plurality of acoustic models is associated with a first level of a data structure and the second acoustic model of the plurality of acoustic models is associated with a second level of the data structure.

20. The electronic device of claim 19, wherein the first acoustic model is a first neural network acoustic model and the second acoustic model is a second neural network acoustic model and wherein the first and second neural network acoustic models share a same neural network layer.

21. The non-transitory computer-readable storage medium of claim 1, wherein the one or more programs further comprise instructions, which when executed by one or more processors of the electronic device, cause the electronic device to: in response to selecting the first acoustic model: generate a first recognition result based on at least the user input and the first acoustic model; and output the first recognition result; in response to selecting the second acoustic model: generate a second recognition result based on at least the user input and the second acoustic model; and output the second recognition result.

22. The method of claim 8, further comprising: in response to selecting the first acoustic model: generating a first recognition result based on at least the user input and the first acoustic model; and outputting the first recognition result; in response to selecting the second acoustic model: generating a second recognition result based on at least the user input and the second acoustic model; and outputting the second recognition result.

23. The electronic device of claim 15, wherein the one or more programs further include instructions for: in response to selecting the first acoustic model: generating a first recognition result based on at least the user input and the first acoustic model; and outputting the first recognition result; in response to selecting the second acoustic model: generating a second recognition result based on at least the user input and the second acoustic model; and outputting the second recognition result.

24. The non-transitory computer-readable storage medium of claim 1, wherein the representation of the user input is a vector representation.
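The angle-based distance and the user-profile association recited in the claims can be illustrated with a short sketch. Everything here is an assumption made for illustration: the claims do not say how an acoustic model is reduced to something an angle can be measured against, so each model is summarized by a hypothetical `centroid` vector, the distance is taken to be the angle itself, and a smaller angle is treated as a greater similarity. The names `AccentModel`, `UserProfile`, `angle_between`, `select_by_angle`, and `enroll` are invented for this example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


@dataclass
class AccentModel:
    accent: str
    centroid: List[float]  # assumption: the model is summarized as one vector


@dataclass
class UserProfile:
    speaker_id: str
    model: Optional[AccentModel] = None


def select_by_angle(
    representation: Sequence[float], first: AccentModel, second: AccentModel
) -> AccentModel:
    """Pick the model at the smaller angle; a tie goes to the second model."""
    first_distance = angle_between(representation, first.centroid)
    second_distance = angle_between(representation, second.centroid)
    # Assumption: smaller angular distance means greater similarity.
    return first if first_distance < second_distance else second


def enroll(
    profile: UserProfile,
    representation: Sequence[float],
    first: AccentModel,
    second: AccentModel,
) -> AccentModel:
    """Associate the selected model with the speaker's profile and return it."""
    profile.model = select_by_angle(representation, first, second)
    return profile.model
```

After `enroll` has run once for a speaker, later inputs from the same speaker could be processed with `profile.model` directly, which mirrors the step of using the model associated with the user profile for a second user input.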
Copyright KISTI. All Rights Reserved.