[Patent] Method and apparatus for detecting synthesized speech

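IPC classification information
Country/Type	United States (US) patent, granted
International Patent Classification (IPC, 7th edition)	G10L-017/22 G10L-025/51 G10L-017/26
Application number (filing date)	US-0012081 (2013-08-28)
Patent number (grant date)	US-9484036 (2016-11-01)
Inventors / Address	Kons, Zvi; Aronowitz, Hagai; Shechtman, Slava
Applicant / Address	Nuance Communications, Inc.
Agent / Address	Hamilton, Brook, Smith & Reynolds, P.C.
Citation information	Times cited: 0; Cited patents: 7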

Abstract

Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. These embodiments of synthetic speech detection enhance the security of computer systems employing speaker verification.
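The abstract's core idea is that speech produced frame by frame can leave a regular pattern in how features change from one segment to the next. A minimal Python sketch of a "representation of variability" follows. It assumes per-segment feature vectors (for example, vocal tract parameters, which the claims mention) have already been extracted, and it uses Euclidean distance between adjacent segments; the name `variability_sequence` and the choice of distance are illustrative assumptions, not the patented implementation.

```python
import math
from typing import List, Sequence

# Hypothetical helper for illustration only. Assumption: every analysis
# segment has already been reduced to a fixed-length feature vector (e.g.
# vocal tract parameters); feature extraction itself is not shown.
FeatureVector = Sequence[float]


def variability_sequence(segments: Sequence[FeatureVector]) -> List[float]:
    """Distance between the feature vectors of each pair of adjacent segments.

    Euclidean distance is an assumed choice of distance function. If a
    synthesizer updates its parameters only at frame boundaries, distances
    are large at frame edges and small inside frames, so the returned
    sequence rises and falls with the frame period.
    """
    if len(segments) < 2:
        raise ValueError("need at least two segments")
    return [math.dist(a, b) for a, b in zip(segments, segments[1:])]


if __name__ == "__main__":
    # Toy input: three "frames" of four segments each; parameters are held
    # constant inside a frame and jump at every frame edge.
    toy = [[float(frame), 0.5 * frame] for frame in range(3) for _ in range(4)]
    print(variability_sequence(toy))
```

For the toy input, the only nonzero distances sit at positions 3 and 7, four segments apart, matching the assumed frame length of four segments. The detector's premise is that natural speech does not show such a fixed spacing; real feature trajectories are, of course, far noisier than this toy.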

Representative claims

1. A method for a speaker verification system to prevent malicious attacks on protected resources, comprising:
extracting a plurality of speech features from multiple segments of a speech signal, the speech signal being a synthetic speech signal generated by transforming voice or text with voice characteristics of a user, the speech signal enabling access to protected resources by a requestor;
analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior, the analyzing comparing the plurality of speech features at edges of speech frames with the plurality of speech features in middles of the speech frames, the analyzing including:
generating a representation of variability of the plurality of speech features; and
performing a periodicity analysis of the generated representation of variability to determine that the generated representation of variability exhibits periodic variation behavior; and
denying the requestor access to the protected resources based on a determination that the speech signal is a synthetic speech signal based on the exhibited periodic variation behavior.

2. A method according to claim 1, wherein extracting the plurality of speech features includes calculating vocal tract transfer function parameters associated with the speech signal.

3. A method according to claim 1, wherein the plurality of speech features includes determining a pitch cycle length or shape associated with the speech signal.

4. A method according to claim 1 further comprising employing a result of a determination whether the speech signal is a synthetic speech signal or a natural speech signal in verifying or identifying a speaker associated with the speech signal.

5. A method according to claim 1 further comprising employing a result of a determination whether the speech signal is a synthetic speech signal or a natural speech signal in determining whether to grant access, to a computer system, to a user associated with the speech signal.

6. A method according to claim 1, wherein the representation of variability of the plurality of speech features includes one of:
a ratio of correlation values associated with adjacent pitch cycles of the speech signal; or
a distance function illustrating difference between vocal tract transfer function parameters associated with adjacent segments of the multiple segments of the speech signal.

7. A method according to claim 1, wherein performing a periodicity analysis of the representation of variability generated includes calculating a periodicity function based on values of the representation of variability generated.

8. A method according to claim 7 further including determining whether the periodicity function exhibits a peak at a point corresponding to a potential speech frame length.

9. A method according to claim 8, wherein determining whether the periodicity function exhibits a peak at a point corresponding to a potential speech frame length includes comparing a value of the periodicity function to a threshold value.

10. An apparatus of a speaker verification system for preventing malicious attacks on protected resources, comprising:
at least one processor; and
at least one memory storing computer code instructions thereon,
the at least one processor and the at least one memory, with the computer code instructions, being configured to cause the apparatus to:
extract a plurality of speech features from multiple segments of a speech signal, the speech signal being a synthetic speech signal generated by transforming voice or text with voice characteristics of a user, the speech signal enabling access to protected resources by a requestor;
analyze the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior, the analyzing comparing the plurality of speech features at edges of speech frames with the plurality of speech features in middles of the speech frames, wherein, to perform the analyzing, the at least one processor and the at least one memory, with the computer code instructions, are further configured to cause the apparatus to:
generate a representation of variability of the plurality of speech features; and
perform a periodicity analysis of the generated representation of variability to determine that the generated representation of variability exhibits periodic variation behavior; and
deny the requestor access to the protected resources based on a determination that the speech signal is a synthetic speech signal based on the exhibited periodic variation behavior.

11. An apparatus according to claim 10, wherein in extracting the plurality of speech features, the at least one processor and the at least one memory, with the computer code instructions, are configured to cause the apparatus to calculate vocal tract transfer function parameters associated with the speech signal.

12. An apparatus according to claim 10, wherein in extracting the plurality of speech features, the at least one processor and the at least one memory, with the computer code instructions, are configured to cause the apparatus to determine a pitch cycle length or shape associated with the speech signal.

13. An apparatus according to claim 10, wherein the at least one processor and the at least one memory, with the computer code instructions, are configured to further cause the apparatus to employ a result of a determination whether the speech signal is a synthetic speech signal or a natural speech signal in verifying or identifying a speaker associated with the speech signal.

14. An apparatus according to claim 10, wherein the at least one processor and the at least one memory, with the computer code instructions, are configured to further cause the apparatus to employ a result of a determination whether the speech signal is a synthetic speech signal or a natural speech signal in determining whether to grant access, to a computer system, to a user associated with the speech signal.

15. An apparatus according to claim 10, wherein the representation of variability of the plurality of speech features includes one of:
a ratio of correlation values associated with adjacent pitch cycles of the speech signal; or
a distance function illustrating difference between vocal tract transfer function parameters associated with adjacent segments of the multiple segments of the speech signal.

16. An apparatus according to claim 10, wherein in performing a periodicity analysis of the representation of variability generated, the at least one processor and the at least one memory, with the computer code instructions, are configured to cause the apparatus to calculate a periodicity function based on values of the representation of variability generated.

17. An apparatus according to claim 16, wherein the at least one processor and the at least one memory, with the computer code instructions, are configured to further cause the apparatus to determine whether the periodicity function exhibits a peak at a point corresponding to a potential speech frame length.

18. A non-transitory computer-readable medium with computer code instructions stored thereon for a speaker verification system to prevent malicious attacks on protected resources, the computer code instructions when executed by a processor cause an apparatus to:
extract a plurality of speech features from multiple segments of a speech signal, the speech signal being a synthetic speech signal generated by transforming voice or text with voice characteristics of a user, the speech signal enabling access to protected resources by a requestor;
analyze the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior, the analyzing comparing the plurality of speech features at edges of speech frames with the plurality of speech features in middles of the speech frames, wherein, to perform the analyzing, the computer code instructions, when executed by the processor, further cause the apparatus to:
generate a representation of variability of the plurality of speech features; and
perform a periodicity analysis of the generated representation of variability to determine that the generated representation of variability exhibits periodic variation behavior; and
deny the requestor access to the protected resources based on a determination that the speech signal is a synthetic speech signal based on the exhibited periodic variation behavior.
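Claims 7 to 9 describe calculating a periodicity function from the variability values and checking whether it peaks, above a threshold, at a point corresponding to a potential speech frame length. A minimal Python sketch follows, assuming the periodicity function is the normalized autocorrelation of the mean-removed variability sequence, that a "peak" means a local maximum over neighbouring lags, and that candidate frame lengths are measured in analysis segments. The names `periodicity_function` and `detect_frame_periodicity` and the default threshold of 0.3 are hypothetical choices, not taken from the patent.

```python
from typing import Dict, Iterable, Optional, Sequence


def periodicity_function(values: Sequence[float], max_lag: int) -> Dict[int, float]:
    """Normalized autocorrelation of the mean-removed sequence, lags 1..max_lag.

    Assumption: autocorrelation stands in for the periodicity function, which
    the claims do not specify. A value near 1 at lag k means the sequence
    repeats roughly every k segments.
    """
    n = len(values)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(values)")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    energy = sum(c * c for c in centered)
    if energy == 0.0:  # constant sequence: no variation, hence no periodicity
        return {lag: 0.0 for lag in range(1, max_lag + 1)}
    return {
        lag: sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(1, max_lag + 1)
    }


def detect_frame_periodicity(
    values: Sequence[float],
    candidate_lengths: Iterable[int],
    threshold: float = 0.3,  # hypothetical threshold; would need tuning on real data
) -> Optional[int]:
    """Return the candidate frame length with the strongest peak above threshold.

    A candidate counts as a peak when its periodicity value exceeds both
    neighbouring lags (lag 0 is not compared, since it is always 1).
    Returns None when no candidate qualifies.
    """
    candidates = sorted(set(candidate_lengths))
    if not candidates or candidates[0] < 1:
        raise ValueError("candidate lengths must be positive integers")
    pf = periodicity_function(values, candidates[-1] + 1)
    best: Optional[int] = None
    for lag in candidates:
        score = pf[lag]
        left = pf.get(lag - 1, float("-inf"))
        is_peak = score > left and score > pf[lag + 1]
        if is_peak and score >= threshold and (best is None or score > pf[best]):
            best = lag
    return best


if __name__ == "__main__":
    # Toy variability sequence with a spike every 4 segments (cf. frame edges).
    spikes = [1.0 if i % 4 == 3 else 0.0 for i in range(40)]
    print(detect_frame_periodicity(spikes, range(2, 9)))  # 4
    print(detect_frame_periodicity([0.5] * 40, range(2, 9)))  # None
```

In the spirit of claim 1, a verification front end built along these lines would deny the requestor access whenever a frame length is returned rather than `None`. The sketch does not model the claims' explicit comparison of features at frame edges versus frame middles, and the threshold and candidate range are placeholders.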

Patents cited by this patent (7)

Qian, Yao; Soong, Frank Kao-Ping. Frame mapping approach for cross-lingual voice transformation.
Macleod, Iain Donald Graham; Millar, John Bruce; Chen, Fangxin; Laverty, William. Method for forming a cohort for use in identification of an individual.
Stifelman, Lisa J.; Partovi, Hadi; Partovi, Haleh; Alpert, David Bryan; Marx, Matthew Talin; Bailey, Scott James; Sims, Kyle D.; Bailey, Darby McDonough; Brathwaite, Roderick Steven; Koh, Eugene; Davis, Angus Macdonald. Providing menu and other services for an information processing system using a telephone or other audio interface.
Stifelman, Lisa Joy; Partovi, Hadi; Partovi, Haleh; Alpert, David Bryan; Marx, Matthew Talin; Bailey, Scott James; Sims, Kyle D.; Bailey, Darby McDonough; Brathwaite, Roderick Steven; Koh, Eugene; Davis, Angus. Providing menu and other services for an information processing system using a telephone or other audio interface.
Stifelman, Lisa Joy; Partovi, Hadi; Partovi, Haleh; Alpert, David Bryan; Marx, Matthew Talin; Bailey, Scott James; Sims, Kyle D.; Bailey, Darby McDonough; Brathwaite, Roderick Steven; Koh, Eugene; Davis, Angus Macdonald. Providing services for an information processing system using an audio interface.
Kennedy, Paul Roy; Hall, Timothy Gerard; Yip, William Chunhung. Radio telecommunication device and method of authenticating a user with a voice authentication token.
Kato, Yumiko; Kamai, Takahiro. Strained-rough-voice conversion device, voice conversion device, voice synthesis device, voice conversion method, voice synthesis method, and program.

