| Country / Status | United States (US) Patent, granted |
|---|---|
| IPC (7th edition) | |
| Application No. | US-0977494 (2015-12-21) |
| Registration No. | US-9646614 (2017-05-09) |
| Inventors / Address | |
| Applicant / Address | |
| Agent / Address | |
| Citations | Times cited: 21 / Cited patents: 2016 |
Abstract

A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
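The enrollment step the abstract describes can be sketched in a few lines of numpy. This is a minimal illustration, not the patented implementation: the framing parameters, the FFT-magnitude "spectral signature", the top-k truncation of the singular values, and the mean/standard-deviation form of the "distribution values" are all assumed choices.

```python
import numpy as np

FRAME_LEN, HOP, TOP_K = 256, 128, 16  # assumed parameters, not from the patent

def spectral_matrix(signal):
    """Phoneme-independent representation: one magnitude spectrum
    (spectral signature) per frame sampled from the utterance."""
    frames = [signal[i:i + FRAME_LEN]
              for i in range(0, len(signal) - FRAME_LEN + 1, HOP)]
    return np.abs(np.fft.rfft(np.asarray(frames), axis=1))

def recognition_unit(utterance):
    """Decompose the representation with a singular value decomposition;
    the leading singular values act as a content-independent,
    speaker-specific recognition unit."""
    s = np.linalg.svd(spectral_matrix(utterance), compute_uv=False)[:TOP_K]
    return s / np.linalg.norm(s)  # normalize away loudness/scale

def enroll(utterances):
    """Distribution values: mean and spread of the recognition units
    computed from several different enrollment utterances."""
    units = np.stack([recognition_unit(u) for u in utterances])
    return units.mean(axis=0), units.std(axis=0) + 1e-8
```

Because the singular values summarize the spectral matrix as a whole rather than any phoneme-aligned segment, the resulting unit does not depend on what was said, only on how it was said.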
Claims

1. A method for speaker identification, comprising: at a device having one or more processors and memory: receiving a plurality of different spoken utterances from a user; for each of the plurality of different spoken utterances: generating a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and decomposing the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user; calculating a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and providing the content-independent recognition distribution value for use in a speaker identification process.

2. The method of claim 1, wherein decomposing the respective phoneme-independent representation for each of the plurality of different spoken utterances further comprises: applying a singular value decomposition to the respective phoneme-independent representation.

3. The method of claim 1, further comprising: generating the respective content-independent recognition unit from a singular value matrix of a singular value decomposition of the respective phoneme-independent representation for each of the plurality of different spoken utterances.

4.
The method of claim 1, wherein the speaker identification process comprises: decomposing at least one spectral signature of an input speech signal into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one of the content-independent recognition distribution values; and determining that the input speech signal is associated with the user if the at least one content-independent characteristic unit is within a threshold limit of the at least one of the content-independent recognition distribution values.

5. The method of claim 4, wherein decomposing the at least one spectral signature of the input speech signal into the at least one content-independent characteristic unit further comprises: applying a singular value decomposition to the at least one spectral signature of the input speech signal.

6. The method of claim 4, wherein: for each of the plurality of different spoken utterances, the respective phoneme-independent representation is decomposed to further obtain a respective content reference sequence; the at least one spectral signature of the input speech signal is further decomposed into at least one content input sequence; and determining that the input speech signal is associated with the user further comprises determining that the input speech signal is associated with the user if the at least one content input sequence is similar to at least one of the respective content reference sequences.

7. The method of claim 6, further comprising: determining similarity based on a distance calculated between the at least one content input sequence and the at least one of the respective content reference sequences.

8.
A method for speaker identification, comprising: at a device having one or more processors and memory: receiving a spoken utterance; generating a first phoneme-independent representation based on the spoken utterance; decomposing the first phoneme-independent representation into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one content-independent recognition distribution value associated with a registered user of the device, the at least one content-independent recognition distribution value previously generated by: generating a second phoneme-independent representation based on speech from the registered user; and decomposing the second phoneme-independent representation into a content-independent recognition unit, the at least one content-independent recognition distribution value based on the content-independent recognition unit; and determining that the spoken utterance is spoken by the registered user if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.

9. The method of claim 8, further comprising: generating the at least one content-independent characteristic unit from a singular value matrix of a singular value decomposition of the first phoneme-independent representation.

10. The method of claim 8, further comprising: computing the at least one content-independent recognition distribution value from the at least one content-independent recognition unit.

11. The method of claim 10, further comprising: generating the at least one content-independent recognition unit from a singular value matrix of a singular value decomposition of the second phoneme-independent representation.

12. The method of claim 8, wherein decomposing the first phoneme-independent representation further comprises: applying a singular value decomposition to the first phoneme-independent representation.
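The threshold test in the verification claims can be sketched as follows. The per-dimension z-score form of the comparison and the limit value are assumptions on my part; the claims say only that the characteristic unit must fall "within a threshold limit" of the distribution values.

```python
import numpy as np

def within_threshold(characteristic_unit, mean_unit, std_unit, limit=3.0):
    """Decision rule sketched from the claims: attribute the utterance to
    the registered user only if the content-independent characteristic
    unit lies within a threshold limit of the enrolled distribution
    values. The z-score form and limit=3.0 are assumed, not specified."""
    z = np.abs(np.asarray(characteristic_unit) - mean_unit) / std_unit
    return bool(z.mean() <= limit)

# Hypothetical usage with precomputed 3-dimensional units:
mean_unit = np.array([0.5, 0.3, 0.2])
std_unit = np.array([0.05, 0.05, 0.05])
print(within_threshold([0.52, 0.29, 0.21], mean_unit, std_unit))  # close: True
print(within_threshold([0.9, 0.9, 0.9], mean_unit, std_unit))     # far: False
```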
13. The method of claim 8, wherein the first phoneme-independent representation is further decomposed into at least one content input sequence, and wherein determining that the spoken utterance is spoken by the registered user further comprises determining that the spoken utterance is spoken by the registered user if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.

14. The method of claim 13, further comprising: determining similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.

15. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to: receive a plurality of different spoken utterances from a user; for each of the plurality of different spoken utterances: generate a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and decompose the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user; calculate a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and provide the content-independent recognition distribution value for use in a speaker identification process.

16. The computer-readable storage medium of claim 15, wherein decomposing the respective phoneme-independent representation for each of the plurality of different spoken utterances further comprises: applying a singular value decomposition to the respective phoneme-independent representation.

17.
The computer-readable storage medium of claim 15, further comprising instructions for causing the one or more processors to: generate the respective content-independent recognition unit from a singular value matrix of a singular value decomposition of the respective phoneme-independent representation for each of the plurality of different spoken utterances.

18. The computer-readable storage medium of claim 15, wherein the speaker identification process comprises: decomposing at least one spectral signature of an input speech signal into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one of the content-independent recognition distribution values; and determining that the input speech signal is associated with the user if the at least one content-independent characteristic unit is within a threshold limit of the at least one of the content-independent recognition distribution values.

19. The computer-readable storage medium of claim 18, wherein decomposing the at least one spectral signature of the input speech signal into the at least one content-independent characteristic unit further comprises: applying a singular value decomposition to the at least one spectral signature of the input speech signal.

20. The computer-readable storage medium of claim 15, wherein: for each of the plurality of different spoken utterances, the respective phoneme-independent representation is decomposed to further obtain a respective content reference sequence; the at least one spectral signature of the input speech signal is further decomposed into at least one content input sequence; and determining that the input speech signal is associated with the user further comprises determining that the input speech signal is associated with the user if the at least one content input sequence is similar to at least one of the respective content reference sequences.

21.
The computer-readable storage medium of claim 20, further comprising instructions for causing the one or more processors to: determine similarity based on a distance calculated between the at least one content input sequence and the at least one of the respective content reference sequences.

22. A non-transitory computer-readable storage medium comprising instructions for causing one or more processors to: receive a spoken utterance; generate a first phoneme-independent representation based on the spoken utterance; decompose the first phoneme-independent representation into at least one content-independent characteristic unit; compare the at least one content-independent characteristic unit to at least one content-independent recognition distribution value associated with a registered user of a device, the at least one content-independent recognition distribution value previously generated by: generating a second phoneme-independent representation based on speech from the registered user; and decomposing the second phoneme-independent representation into a content-independent recognition unit, the at least one content-independent recognition distribution value based on the content-independent recognition unit; and determine that the spoken utterance is spoken by the registered user if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.

23. The computer-readable storage medium of claim 22, further comprising instructions for causing the one or more processors to: generate the at least one content-independent characteristic unit from a singular value matrix of a singular value decomposition of the first phoneme-independent representation.

24.
The computer-readable storage medium of claim 22, further comprising instructions for causing the one or more processors to: compute the at least one content-independent recognition distribution value from the at least one content-independent recognition unit.

25. The computer-readable storage medium of claim 24, further comprising instructions for causing the one or more processors to: generate the at least one content-independent recognition unit from a singular value matrix of a singular value decomposition of the second phoneme-independent representation.

26. The computer-readable storage medium of claim 22, wherein decomposing the first phoneme-independent representation further comprises: applying a singular value decomposition to the first phoneme-independent representation.

27. The computer-readable storage medium of claim 22, wherein the first phoneme-independent representation is further decomposed into at least one content input sequence, and wherein determining that the spoken utterance is spoken by the registered user further comprises determining that the spoken utterance is spoken by the registered user if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.

28. The computer-readable storage medium of claim 27, further comprising instructions for causing the one or more processors to: determine similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.

29.
A system for speaker identification, comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: receiving a plurality of different spoken utterances from a user; for each of the plurality of different spoken utterances: generating a respective phoneme-independent representation from the spoken utterance, the respective phoneme-independent representation including a respective spectral signature for each of a plurality of frames sampled from the spoken utterance; and decomposing the respective phoneme-independent representation to obtain a respective content-independent recognition unit for the user; calculating a content-independent recognition distribution value for the user based on the respective content-independent recognition units generated from the plurality of different spoken utterances; and providing the content-independent recognition distribution value for use in a speaker identification process.

30. The system of claim 29, wherein the one or more programs further include instructions for: decomposing at least one spectral signature of an input speech signal into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one of the content-independent recognition distribution values; and determining that the input speech signal is associated with the user if the at least one content-independent characteristic unit is within a threshold limit of the at least one of the content-independent recognition distribution values.

31.
The system of claim 30, wherein decomposing the at least one spectral signature of the input speech signal into the at least one content-independent characteristic unit further comprises: applying a singular value decomposition to the at least one spectral signature of the input speech signal.

32. The system of claim 29, wherein: decomposing the respective phoneme-independent representation for each of the plurality of different spoken utterances further comprises decomposing the respective phoneme-independent representation to obtain a respective content reference sequence; decomposing the at least one spectral signature of the input speech signal further comprises decomposing the at least one spectral signature of the input speech signal into at least one content input sequence; and determining that the input speech signal is associated with the user further comprises determining that the input speech signal is associated with the user if the at least one content input sequence is similar to at least one of the respective content reference sequences.

33. The system of claim 32, wherein the one or more programs further include instructions for: determining similarity based on a distance calculated between the at least one content input sequence and the at least one of the respective content reference sequences.

34. The system of claim 29, wherein decomposing the respective phoneme-independent representation for each of the plurality of different spoken utterances further comprises: applying a singular value decomposition to the respective phoneme-independent representation.

35. The system of claim 29, wherein the one or more programs further include instructions for: generating the respective content-independent recognition unit from a singular value matrix of a singular value decomposition of the respective phoneme-independent representation for each of the plurality of different spoken utterances.

36.
A system for speaker identification, comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: receiving a spoken utterance; generating a first phoneme-independent representation based on the spoken utterance; decomposing the first phoneme-independent representation into at least one content-independent characteristic unit; comparing the at least one content-independent characteristic unit to at least one content-independent recognition distribution value associated with a registered user of the device, the at least one content-independent recognition distribution value previously generated by: generating a second phoneme-independent representation based on speech from the registered user; and decomposing the second phoneme-independent representation into a content-independent recognition unit, the at least one content-independent recognition distribution value based on the content-independent recognition unit; and determining that the spoken utterance is spoken by the registered user if the at least one content-independent characteristic unit is within a threshold limit of the at least one content-independent recognition distribution value.

37. The system of claim 36, wherein decomposing the first phoneme-independent representation further comprises: applying a singular value decomposition to the first phoneme-independent representation.

38.
The system of claim 36, wherein decomposing the first phoneme-independent representation further comprises decomposing the first phoneme-independent representation into at least one content input sequence, and wherein determining that the spoken utterance is spoken by the registered user further comprises determining that the spoken utterance is spoken by the registered user if the at least one content input sequence is similar to at least one content reference sequence previously trained by the registered speaker.

39. The system of claim 38, wherein the one or more programs further include instructions for: determining similarity based on a distance calculated between the at least one content input sequence and the at least one content reference sequence.

40. The system of claim 36, further comprising instructions for causing the one or more processors to: generate the at least one content-independent characteristic unit from a singular value matrix of a singular value decomposition of the first phoneme-independent representation.

41. The system of claim 36, further comprising instructions for causing the one or more processors to: compute the at least one content-independent recognition distribution value from the at least one content-independent recognition unit.

42. The system of claim 41, further comprising instructions for causing the one or more processors to: generate the at least one content-independent recognition unit from a singular value matrix of a singular value decomposition of the second phoneme-independent representation.
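The distance-based similarity test recited throughout the claims (a content input sequence matched against trained content reference sequences) can be sketched as follows. The frame-by-frame Euclidean average and the `max_distance` cutoff are assumptions; the claims require only "a distance calculated between" the sequences, and a real system might align them first (e.g. with dynamic time warping).

```python
import numpy as np

def sequence_distance(input_seq, reference_seq):
    """Average frame-wise Euclidean distance between a content input
    sequence and a content reference sequence (rows = frames)."""
    a = np.asarray(input_seq, dtype=float)
    b = np.asarray(reference_seq, dtype=float)
    n = min(len(a), len(b))  # naive handling of unequal lengths
    return float(np.linalg.norm(a[:n] - b[:n], axis=1).mean())

def is_similar(input_seq, reference_seqs, max_distance=1.0):
    """The input sequence matches if it is close enough to at least one
    trained reference; `max_distance` is an assumed tunable."""
    return any(sequence_distance(input_seq, r) <= max_distance
               for r in reference_seqs)
```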