Method and apparatus for detecting synthesized speech
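IPC classification information
Country / Type: United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition): G10L-017/22; G10L-025/51; G10L-017/26
Application number: US-0012081 (2013-08-28)
Registration number: US-9484036 (2016-11-01)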
Inventors / Address: Kons, Zvi; Aronowitz, Hagai; Shechtman, Slava
Applicant / Address: Nuance Communications, Inc.
Agent / Address: Hamilton, Brook, Smith & Reynolds, P.C.
Citation information
Times cited: 0
Patents cited: 7
Abstract
Computer systems employing speaker verification as a security approach to prevent un-authorized access by intruders may be tricked by a synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection result in security enhancement of the computer system employing speaker verification.
Representative claims
1. A method for a speaker verification system to prevent malicious attacks on protected resources, comprising: extracting a plurality of speech features from multiple segments of a speech signal, the speech signal being a synthetic speech signal generated by transforming voice or text with voice characteristics of a user, the speech signal enabling access to protected resources by a requestor; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior, the analyzing comparing the plurality of speech features at edges of speech frames with the plurality of speech features in middles of the speech frames, the analyzing including: generating a representation of variability of the plurality of speech features; and performing a periodicity analysis of the generated representation of variability to determine that the generated representation of variability exhibits periodic variation behavior; and denying the requestor access to the protected resources based on a determination that the speech signal is a synthetic speech signal based on the exhibited periodic variation behavior.
2. A method according to claim 1, wherein extracting the plurality of speech features includes calculating vocal tract transfer function parameters associated with the speech signal.
3. A method according to claim 1, wherein the plurality of speech features includes determining a pitch cycle length or shape associated with the speech signal.
4. A method according to claim 1 further comprising employing a result of a determination whether the speech signal is a synthetic speech signal or a natural speech signal in verifying or identifying a speaker associated with the speech signal.
5. A method according to claim 1 further comprising employing a result of a determination whether the speech signal is a synthetic speech signal or a natural speech signal in determining whether to grant access, to a computer system, to a user associated with the speech signal.
6. A method according to claim 1, wherein the representation of variability of the plurality of speech features includes one of: a ratio of correlation values associated with adjacent pitch cycles of the speech signal; or a distance function illustrating difference between vocal tract transfer function parameters associated with adjacent segments of the multiple segments of the speech signal.
7. A method according to claim 1, wherein performing a periodicity analysis of the representation of variability generated includes calculating a periodicity function based on values of the representation of variability generated.
8. A method according to claim 7 further including determining whether the periodicity function exhibits a peak at a point corresponding to a potential speech frame length.
9. A method according to claim 8, wherein determining whether the periodicity function exhibits a peak at a point corresponding to a potential speech frame length includes comparing a value of the periodicity function to a threshold value.
10. An apparatus of a speaker verification system for preventing malicious attacks on protected resources, comprising: at least one processor; and at least one memory storing computer code instructions thereon, the at least one processor and the at least one memory, with the computer code instructions, being configured to cause the apparatus to: extract a plurality of speech features from multiple segments of a speech signal, the speech signal being a synthetic speech signal generated by transforming voice or text with voice characteristics of a user, the speech signal enabling access to protected resources by a requestor; analyze the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior, the analyzing comparing the plurality of speech features at edges of speech frames with the plurality of speech features in middles of the speech frames, wherein, to perform the analyzing, the at least one processor and the at least one memory, with the computer code instructions, are further configured to cause the apparatus to: generate a representation of variability of the plurality of speech features; and perform a periodicity analysis of the generated representation of variability to determine that the generated representation of variability exhibits periodic variation behavior; and deny the requestor access to the protected resources based on a determination that the speech signal is a synthetic speech signal based on the exhibited periodic variation behavior.
11. An apparatus according to claim 10, wherein in extracting the plurality of speech features, the at least one processor and the at least one memory, with the computer code instructions, are configured to cause the apparatus to calculate vocal tract transfer function parameters associated with the speech signal.
12. An apparatus according to claim 10, wherein in extracting the plurality of speech features, the at least one processor and the at least one memory, with the computer code instructions, are configured to cause the apparatus to determine a pitch cycle length or shape associated with the speech signal.
13. An apparatus according to claim 10, wherein the at least one processor and the at least one memory, with the computer code instructions, are configured to further cause the apparatus to employ a result of a determination whether the speech signal is a synthetic speech signal or a natural speech signal in verifying or identifying a speaker associated with the speech signal.
14. An apparatus according to claim 10, wherein the at least one processor and the at least one memory, with the computer code instructions, are configured to further cause the apparatus to employ a result of a determination whether the speech signal is a synthetic speech signal or a natural speech signal in determining whether to grant access, to a computer system, to a user associated with the speech signal.
15. An apparatus according to claim 10, wherein the representation of variability of the plurality of speech features includes one of: a ratio of correlation values associated with adjacent pitch cycles of the speech signal; or a distance function illustrating difference between vocal tract transfer function parameters associated with adjacent segments of the multiple segments of the speech signal.
16. An apparatus according to claim 10, wherein in performing a periodicity analysis of the representation of variability generated, the at least one processor and the at least one memory, with the computer code instructions, are configured to cause the apparatus to calculate a periodicity function based on values of the representation of variability generated.
17. An apparatus according to claim 16, wherein the at least one processor and the at least one memory, with the computer code instructions, are configured to further cause the apparatus to determine whether the periodicity function exhibits a peak at a point corresponding to a potential speech frame length.
18. A non-transitory computer-readable medium with computer code instructions stored thereon for a speaker verification system to prevent malicious attacks on protected resources, the computer code instructions when executed by a processor cause an apparatus to: extract a plurality of speech features from multiple segments of a speech signal, the speech signal being a synthetic speech signal generated by transforming voice or text with voice characteristics of a user, the speech signal enabling access to protected resources by a requestor; analyze the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior, the analyzing comparing the plurality of speech features at edges of speech frames with the plurality of speech features in middles of the speech frames, wherein, to perform the analyzing, the computer code instructions, when executed by the processor, further cause the apparatus to: generate a representation of variability of the plurality of speech features; and perform a periodicity analysis of the generated representation of variability to determine that the generated representation of variability exhibits periodic variation behavior; and deny the requestor access to the protected resources based on a determination that the speech signal is a synthetic speech signal based on the exhibited periodic variation behavior.
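The chain described in claims 6 to 9 (a distance function over adjacent segments, a periodicity function over that variability, and a threshold test at a candidate frame length) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the claims do not specify the distance measure or the periodicity function, so Euclidean distance and normalized autocorrelation are stand-ins chosen here, and the function names, the candidate lags, and the threshold of 0.5 are all hypothetical.

```python
import math


def adjacent_distances(features):
    """Variability representation: distance between feature vectors of
    adjacent segments (Euclidean distance is an assumption of this sketch)."""
    return [math.dist(a, b) for a, b in zip(features, features[1:])]


def periodicity(values, lag):
    """Normalized autocorrelation of the mean-removed values at one lag.
    Used here as a stand-in periodicity function; returns 0.0 when the
    values are constant or the lag is out of range."""
    n = len(values)
    if n == 0 or lag <= 0 or lag >= n:
        return 0.0
    mean = sum(values) / n
    centered = [v - mean for v in values]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return 0.0
    cross = sum(centered[i] * centered[i + lag] for i in range(n - lag))
    return cross / energy


def looks_synthetic(features, candidate_lags, threshold=0.5):
    """Flag the signal when the periodicity function exceeds the threshold
    at any candidate frame length (expressed in segments; hypothetical)."""
    variability = adjacent_distances(features)
    return any(periodicity(variability, lag) > threshold for lag in candidate_lags)


# Toy data: features that jump once every 4 segments mimic a frame boundary
# artifact, while a steady ramp has constant variability and no periodicity.
framed = [[float(i // 4)] for i in range(40)]
ramp = [[float(i)] for i in range(40)]
print(looks_synthetic(framed, [4]), looks_synthetic(ramp, [4]))
```

In a speaker verification flow along the lines of claim 1, a positive result from such a check would lead to denying the requestor access. Real feature vectors would come from vocal tract transfer function parameters or pitch cycle measurements rather than the toy values used above.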
Patents cited by this patent (7)
Qian, Yao; Soong, Frank Kao-Ping, Frame mapping approach for cross-lingual voice transformation.
Macleod Iain Donald Graham,AUX ; Millar John Bruce,AUX ; Chen Fangxin,CAX ; Laverty William,CAX, Method for forming a cohort for use in identification of an individual.
Stifelman, Lisa J.; Partovi, Hadi; Partovi, Haleh; Alpert, David Bryan; Marx, Matthew Talin; Bailey, Scott James; Sims, Kyle D.; Bailey, Darby McDonough; Brathwaite, Roderick Steven; Koh, Eugene; Davis, Angus Macdonald, Providing menu and other services for an information processing system using a telephone or other audio interface.
Stifelman, Lisa Joy; Partovi, Hadi; Partovi, Haleh; Alpert, David Bryan; Marx, Matthew Talin; Bailey, Scott James; Sims, Kyle D.; Bailey, Darby McDonough; Brathwaite, Roderick Steven; Koh, Eugene; Davis, Angus, Providing menu and other services for an information processing system using a telephone or other audio interface.
Stifelman, Lisa Joy; Partovi, Hadi; Partovi, Haleh; Alpert, David Bryan; Marx, Matthew Talin; Bailey, Scott James; Sims, Kyle D.; Bailey, Darby McDonough; Brathwaite, Roderick Steven; Koh, Eugene; Davis, Angus Macdonald, Providing services for an information processing system using an audio interface.
Kennedy Paul Roy ; Hall Timothy Gerard ; Yip William Chunhung, Radio telecommunication device and method of authenticating a user with a voice authentication token.