[Patent] Voice activity decision base on zero crossing rate and spectral sub-band energy

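IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition)	G10L-011/06 G10L-021/02 G10L-015/20 G10L-017/00
Application number	US-0307683 (2011-11-30)
Registration number	US-8296133 (2012-10-23)
Priority information	CN-2009 1 0206840 (2009-10-15)
Inventor / Address	Wang, Zhe
Applicant / Address	Huawei Technologies Co., Ltd.
Agent / Address	Conley Rose, P.C.
Citation information	Times cited: 2; Patents cited: 8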

Abstract

A voice activity detection method and apparatus, and an electronic device are provided. The method includes: obtaining a time domain parameter and a frequency domain parameter from an audio frame; obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame, and obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and judging whether the audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance and a set of decision inequalities based on the first distance and the second distance. The above technical solutions enable the judgment criterion to have an adaptive adjustment capability, thus improving the performance of the voice activity detection.
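
The first step in the abstract, obtaining a time domain parameter and a frequency domain parameter from an audio frame, might be sketched as follows. This is only an illustration under stated assumptions: the title names zero-crossing rate and spectral sub-band energy as the two parameters, but the equal-width band layout, the naive DFT, and the function names `zero_crossing_rate` and `sub_band_energies` are hypothetical choices made here, not details taken from the patent.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (time domain parameter)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def sub_band_energies(frame, num_bands):
    """Energy per sub-band of the one-sided DFT power spectrum (frequency domain parameter).

    Assumption: bands are contiguous and equally wide; leftover high bins are dropped.
    """
    n = len(frame)
    half = n // 2
    width = half // num_bands
    if width == 0:
        raise ValueError("frame too short for the requested number of bands")
    power = []
    for k in range(half):
        # Naive O(n^2) DFT, kept dependency-free for illustration.
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2 / n)
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]
```

A real implementation would normally use an FFT and a perceptually motivated band layout; the point here is only that each frame yields one scalar (the zero-crossing rate) and one vector (the sub-band energies) to compare against their long-term background noise means.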

Representative claims

1. A voice activity detection method, comprising:
obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance,
wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal,
wherein the frequency domain parameter indicates spectral sub-band energy, and wherein the second distance between the frequency domain parameter and the long-term sliding mean of the frequency domain parameter in the history background noise frame is a signal-to-noise ratio of the audio frame,
wherein obtaining the signal-to-noise ratio of the audio frame comprises:
obtaining a signal-to-noise ratio of each sub-band according to a ratio of the spectral sub-band energy to the long-term sliding mean of the spectral sub-band energy in the history background noise frame;
performing linear processing or nonlinear processing on the signal-to-noise ratio of each sub-band; and
summing the signal-to-noise ratio of each sub-band after the processing to obtain the signal-to-noise ratio of the audio frame,
wherein performing the nonlinear processing on the signal-to-noise ratio of each sub-band comprises determining the signal-to-noise ratio of each sub-band after the nonlinear processing according to $\mathrm{MAX}\left(f_i \cdot 10 \cdot \log\left(E_i / \bar{E}_i\right),\ 0\right)$, and wherein $i = 0, \ldots,$ the number of sub-bands minus one,

$$f_i = \begin{cases} \mathrm{MIN}\left(E_i^2/64,\ 1\right) & \text{when } x_1 \le i \le x_2 \\ \mathrm{MIN}\left(E_i^2/25,\ 1\right) & \text{when } i \text{ is other values,} \end{cases}$$

"i is other values" means that i is a numerical value from zero to the number of sub-bands minus one except the value range from x1 to x2, x1 and x2 are greater than zero and smaller than the number of sub-bands minus one, values of x1 and x2 are determined according to key sub-bands in all the sub-bands, $\bar{E}_i$ is a current value of the long-term sliding mean of the spectral sub-band energy in the history background noise frame, and $E_i$ is the spectral sub-band energy of the audio frame.

2. The method according to claim 1, wherein performing the linear processing on the signal-to-noise ratio of each sub-band comprises performing linear processing on the signal-to-noise ratio of each sub-band, and wherein performing the nonlinear processing on the signal-to-noise ratio of each sub-band comprises performing either the same nonlinear processing or different nonlinear processing on the signal-to-noise ratio of each sub-band.

3. A voice activity detection method, comprising:
obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance,
wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal,
wherein the set of decision inequalities comprises $\mathrm{MSSNR} \ge a \cdot \mathrm{DZCR} + b$ and $\mathrm{MSSNR} \ge (-c) \cdot \mathrm{DZCR} + d$, and wherein a and c are coefficients, b and d are constants, MSSNR is obtained according to the first distance, and DZCR is obtained according to the second distance.

4. The method according to claim 3, wherein if the audio frame is judged to be the background noise frame, then the long-term sliding mean of the time domain parameter in the history background noise frame is updated according to the time domain parameter of the audio frame and the long-term sliding mean of the frequency domain parameter in the history background noise frame is updated according to the frequency domain parameter of the audio frame.

5. The method according to claim 3, wherein the time domain parameter is a zero-crossing rate, and wherein the first distance between the time domain parameter and the long-term sliding mean of the time domain parameter in the history background noise frame is a Differential Zero-Crossing rate (DZC).

6. The method according to claim 5, wherein if the audio frame is judged to be the background noise frame, then the long-term sliding mean of the zero-crossing rate in the history background noise frame is updated to $\alpha \cdot \overline{\mathrm{ZCR}} + (1 - \alpha) \cdot \mathrm{ZCR}$, and wherein $\alpha$ is an update speed control parameter, $\overline{\mathrm{ZCR}}$ is a current value of the long-term sliding mean of the zero-crossing rate in the history background noise frame, and $\mathrm{ZCR}$ is a zero-crossing rate of the audio frame.

7. The method according to claim 3, wherein judging whether the current audio frame is the foreground voice frame or the background noise frame according to the first distance, the second distance, and the set of decision inequalities based on the first distance and the second distance comprises:
judging that the current audio frame is the foreground voice frame if the first distance and the second distance satisfy any one decision inequality in the set of decision inequalities; and
judging that the audio frame is the background noise frame if the first distance and the second distance satisfy no decision inequality in the set of decision inequalities.

8. The method according to claim 3, wherein determining the variable according to the voice activity detection operation mode or the features of the input signal comprises determining the variable according to one or more of: the voice activity detection operation point, the signal long-term signal-to-noise ratio, the background noise fluctuation degree, and the background noise level, and wherein the voice activity detection operation mode comprises a voice activity detection operation point, and the features of the input signal comprise one or more of: a signal long-term signal-to-noise ratio, a background noise fluctuation degree, and a background noise level.

9. A voice activity detection method, comprising:
obtaining a time domain parameter and a frequency domain parameter from a current audio frame to be detected;
obtaining a first distance between the time domain parameter and a long-term sliding mean of the time domain parameter in a history background noise frame;
obtaining a second distance between the frequency domain parameter and a long-term sliding mean of the frequency domain parameter in the history background noise frame; and
judging whether the current audio frame is a foreground voice frame or a background noise frame according to the first distance, the second distance, and a set of decision inequalities based on the first distance and the second distance,
wherein at least one coefficient in the set of decision inequalities is a variable determined according to a voice activity detection operation mode or features of an input signal,
wherein the frequency domain parameter indicates spectral sub-band energy, and wherein the second distance between the frequency domain parameter and the long-term sliding mean of the frequency domain parameter in the history background noise frame is a signal-to-noise ratio of the audio frame,
wherein the set of decision inequalities comprises $\mathrm{MSSNR} \ge a \cdot \mathrm{DZCR} + b$ and $\mathrm{MSSNR} \ge (-c) \cdot \mathrm{DZCR} + d$, and wherein a and c are coefficients, b and d are constants, MSSNR is a corrected distance between the spectral sub-band energy and the long-term sliding mean of the spectral sub-band energy in the history background noise frame, and DZCR is a distance between the zero-crossing rate and the long-term sliding mean of the zero-crossing rate in the history background noise frame.

10. The method according to claim 9, wherein if the audio frame is judged to be the background noise frame, then the long-term sliding mean of the spectral sub-band energy in the history background noise frame is updated to $\beta \cdot \bar{E}_i + (1 - \beta) \cdot E_i$, and wherein $i = 0, \ldots, N$, N is the number of sub-bands minus one, $\beta$ is an update speed control parameter, $\bar{E}_i$ is a current value of the long-term sliding mean of the spectral sub-band energy in the history background noise frame, and $E_i$ is the spectral sub-band energy of the audio frame.
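
To make the arithmetic in the claims concrete, here is a minimal sketch of the frame signal-to-noise ratio from claim 1, the two decision inequalities from claims 3, 7 and 9, and the sliding-mean updates from claims 6 and 10. Several points are assumptions made for illustration and are not confirmed by the text above: the logarithm is taken as base 10, bands with non-positive energy are skipped, and the function names (`frame_snr`, `is_voice`, `update_mean`) and all numeric coefficient values are hypothetical.

```python
import math


def frame_snr(energies, noise_means, x1, x2):
    """Sum of nonlinearly processed sub-band SNRs, following the formula in claim 1."""
    total = 0.0
    for i, (e, e_bar) in enumerate(zip(energies, noise_means)):
        if e <= 0.0 or e_bar <= 0.0:
            continue  # assumption: skip bands where the log ratio is undefined
        # Key sub-bands x1..x2 use divisor 64, all other sub-bands use 25.
        divisor = 64.0 if x1 <= i <= x2 else 25.0
        f_i = min(e * e / divisor, 1.0)
        # Assumption: "log" is base 10, so the ratio is expressed in decibels.
        total += max(f_i * 10.0 * math.log10(e / e_bar), 0.0)
    return total


def is_voice(mssnr, dzcr, a, b, c, d):
    """Claim 7 rule: foreground voice if any one decision inequality holds."""
    return mssnr >= a * dzcr + b or mssnr >= (-c) * dzcr + d


def update_mean(mean, value, rate):
    """Sliding-mean update of claims 6 and 10: rate * mean + (1 - rate) * value."""
    return rate * mean + (1.0 - rate) * value
```

In use, a frame judged to be background noise would feed `update_mean` for the zero-crossing rate and for each sub-band energy, while a frame judged to be voice would leave the means unchanged. The claims leave open how a, b, c and d vary with the operation mode or the input signal features, so any mapping from those features to coefficient values would have to come from the patent description.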

Patents cited by this patent (8)

Mozer, Forrest S.; Savoie, Robert E.; Teasley, William T., Audio recognition peripheral system.
Walker Mark R. ; Kidder Jeffrey ; Keith Michael, Encoding audio signals using precomputed silence.
Benyassine Adil ; Shlomot Eyal, Method and apparatus for generating frame voicing decisions of an incoming speech signal.
Lubiarz,Stéphane; Hinard,Edouard; Capman,François; Lockwood,Philip, Method and device for detecting voice activity.
Sonnic Estelle,FRX, Method and device for detecting voice activity.
Chen, Bing; James, James H., Operating method for voice activity detection/silence suppression system.
Bou Ghazale,Sahar E.; Asadi,Ayman O.; Assaleh,Khaled, System and method for a endpoint detection of speech for improved speech recognition in noisy environments.
Li,Dunling, Voice activity identiftication for speaker tracking in a packet based conferencing system with distributed processing.

Patents citing this patent (2)

Secker-Walker, Hugh Evan; Basye, Kenneth John; Strom, Nikko; Thomas, Ryan Paul, Distributed endpointing for speech recognition.
Femal, Michael J., Methods and apparatus for reducing audio conference noise using voice quality measures.
