Monaural noise suppression based on computational auditory scene analysis
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G10L-021/0208
G10L-021/0272
Application number
US-0859186
(2013-04-09)
Registration number
US-9431023
(2016-08-30)
Inventor / Address
Avendano, Carlos
Laroche, Jean
Goodwin, Michael M.
Solbach, Ludger
Applicant / Address
Knowles Electronics, LLC
Agent / Address
Carr & Ferrell LLP
Citation information
Times cited: 3
Cited patents: 120
Abstract
The present technology provides a robust noise suppression system that may concurrently reduce noise and echo components in an acoustic signal while limiting the level of speech distortion. A time-domain acoustic signal may be received and transformed into frequency-domain sub-band signals. Features, such as pitch, may be identified and tracked within the sub-band signals. Initial speech and noise models may then be estimated at least in part from a probability analysis based on the tracked pitch sources. Speech and noise models may be resolved from the initial speech and noise models, and noise reduction may be performed on the sub-band signals. An acoustic signal may be reconstructed from the noise-reduced sub-band signals.
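The sub-band noise-reduction step described in the abstract can be illustrated with a minimal sketch. The record does not state which transform or gain rule the patent uses, so everything here is an assumption: a naive DFT stands in for the sub-band analysis, a Wiener-style gain `speech / (speech + noise)` stands in for the noise-reduction rule, and a gain floor is used as one common way to cap attenuation. The names `dft`, `idft`, `wiener_gain`, and `suppress_frame` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive DFT: time-domain frame -> complex sub-band values (assumed transform)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part as the reconstructed frame."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def wiener_gain(speech_power, noise_power, floor=0.1):
    """Per-sub-band gain in [floor, 1]; the floor caps attenuation (assumption)."""
    total = speech_power + noise_power
    if total <= 0.0:
        return floor
    return max(floor, speech_power / total)


def suppress_frame(frame, speech_model, noise_model, floor=0.1):
    """Transform one frame, scale each sub-band by its gain, and reconstruct."""
    spectrum = dft(frame)
    gains = [wiener_gain(s, n, floor) for s, n in zip(speech_model, noise_model)]
    return idft([g * value for g, value in zip(gains, spectrum)])


# Usage: an 8-sample frame with a "speech" tone in bin 1 and a "noise" tone in bin 3.
frame = [math.cos(2 * math.pi * t / 8) + 0.5 * math.cos(2 * math.pi * 3 * t / 8)
         for t in range(8)]
speech_model = [0, 1, 0, 0, 0, 0, 0, 1]  # power in bins 1 and 7 (mirror bin)
noise_model = [0, 0, 0, 1, 0, 1, 0, 0]   # power in bins 3 and 5 (mirror bin)
cleaned = suppress_frame(frame, speech_model, noise_model, floor=0.0)
```

With the floor set to zero, the bin-3 tone is removed and the bin-1 tone passes unchanged. A practical system would use overlapping windowed frames and models estimated from the signal rather than supplied by hand.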
Representative claims
1. A method for performing noise reduction, the method comprising: executing a program stored in a memory to transform a time-domain acoustic signal into a plurality of frequency-domain sub-band signals; tracking at least one pitch from a plurality of pitch sources within a frequency-domain sub-band signal in the plurality of frequency-domain sub-band signals, wherein the tracking includes: calculating at least one feature for each of the plurality of pitch sources; and determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker; generating a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech; and performing the noise reduction on the frequency-domain sub-band signal based on the speech model and the one or more noise models.
2. The method of claim 1, wherein the tracking includes tracking the at least one pitch across successive frames of the frequency-domain sub-band signal.
3. The method of claim 1, wherein the generating a speech model and one or more noise models is based on at least two tracked pitches from the plurality of pitch sources.
4. The method of claim 1, wherein the generating a speech model and one or more noise models includes combining the multiple models.
5. The method of claim 1, wherein at least one of the one or more noise models is at least one of: not updated for a sub-band in a current frame when speech is dominant in the previous frame; and not updated in the current frame when speech is dominant in the current frame for the sub-band.
6. The method of claim 1, wherein the noise reduction is performed using an optimal filter.
7. The method of claim 6, wherein the optimal filter is based on a least squares formulation.
8. The method of claim 1, wherein the one or more noise models model undesired speech.
9. A system for performing noise reduction in an audio signal, the system comprising: a memory; an analysis module stored in the memory and executed by a processor to transform a time-domain acoustic to frequency-domain sub-band signals; a source inference engine stored in the memory and executed by the processor to track at least one pitch from a plurality of pitch sources within the frequency-domain sub-band signals and to generate a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech, wherein the tracking includes: calculating at least one feature for each of the plurality of pitch sources; and determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker; and a modifier module stored in the memory and executed by the processor to perform the noise reduction on the frequency-domain sub-band signals based on the speech model and the one or more noise models.
10. The system of claim 9, wherein the source inference engine is executable to generate a speech model and one or more noise models based on at least two tracked pitches from the plurality of pitch sources.
11. The system of claim 9, wherein the source inference engine is executable to at least one of: not update at least one of the one or more noise models for a sub-band in a current frame when speech is dominant in the previous frame; and not update at least one of the one or more noise models for the sub-band in the current frame when speech is dominant in the current frame for the sub-band.
12. The system of claim 9, wherein a modifier module is executable to apply a first-order filter to each sub-band in each frame.
13. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for reducing noise in an audio signal, the method comprising: transforming an acoustic signal from a time-domain signal to frequency-domain sub-band signals; tracking at least one pitch from a plurality of pitch sources within the frequency-domain sub-band signals, the tracking including: calculating at least one feature for each of the plurality of pitch sources; and determining, based on the at least one feature, a probability based at least in part on pitch energy level, pitch salience, and pitch stationarity for each pitch source that the pitch source is a desired speech source, the desired speech source being a speech source associated with a desired talker; generating a speech model and one or more noise models based on the tracked at least one pitch, the speech model modeling desired speech and the one or more noise models modeling sources other than the desired speech; and performing noise reduction on the frequency-domain sub-band signals based on the speech model and one or more noise models.
14. The non-transitory computer readable storage medium of claim 13, wherein the tracking includes tracking the at least one pitch across successive frames of the frequency-domain sub-band signals.
15. The non-transitory computer readable storage medium of claim 13, wherein at least one of: a respective one of the one or more noise models is not updated for a sub-band in a current frame when speech is dominant in the previous frame for the sub-band; and the respective one of the one or more noise models is not updated for a sub-band in a current frame when speech is dominant in the current frame for the sub-band.
16. The non-transitory computer readable storage medium of claim 13, wherein performing the noise reduction includes applying a first-order filter to each sub-band signal.
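Two ideas in the claims lend themselves to a short illustration: scoring each pitch source with a probability of being the desired talker (claims 1, 9, 13), and freezing the noise model in sub-bands where speech dominates (claims 5, 11, 15). The sketch below is only one possible reading. The claims name the three features but not how they are combined, so the logistic form, the weights, the feature scales, the negative weight on stationarity, and the exponential-smoothing update are all assumptions, and `PitchSource`, `speech_probability`, and `update_noise_model` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class PitchSource:
    energy_db: float     # pitch energy level, in dB above a reference (assumed scale)
    salience: float      # pitch salience in 0..1 (assumed scale)
    stationarity: float  # pitch stationarity in 0..1, higher = steadier (assumed scale)


def speech_probability(src, w_energy=0.1, w_salience=4.0,
                       w_stationarity=-3.0, bias=-1.0):
    """Logistic score over the three claimed features; weights are illustrative.

    The negative stationarity weight encodes an assumption that a perfectly
    steady pitch is more likely a tonal noise than a talker.
    """
    score = (bias + w_energy * src.energy_db + w_salience * src.salience
             + w_stationarity * src.stationarity)
    return 1.0 / (1.0 + math.exp(-score))


def update_noise_model(noise, frame_power, dominant_prev, dominant_now, alpha=0.9):
    """Smooth the noise estimate per sub-band, skipping sub-bands where speech
    was dominant in the previous frame or is dominant in the current frame."""
    updated = []
    for n, p, prev, now in zip(noise, frame_power, dominant_prev, dominant_now):
        if prev or now:
            updated.append(n)  # hold the estimate: speech would contaminate it
        else:
            updated.append(alpha * n + (1.0 - alpha) * p)
    return updated


# Usage: a loud, varying pitch track versus a quieter, perfectly steady one.
talker = PitchSource(energy_db=20.0, salience=0.9, stationarity=0.3)
hum = PitchSource(energy_db=5.0, salience=0.9, stationarity=1.0)
p_talker = speech_probability(talker)
p_hum = speech_probability(hum)

new_noise = update_noise_model(
    noise=[1.0, 1.0, 1.0],
    frame_power=[2.0, 2.0, 2.0],
    dominant_prev=[False, True, False],
    dominant_now=[False, False, True],
    alpha=0.5,
)
```

In this example only the first sub-band is updated, because speech is flagged as dominant in the other two in either the previous or the current frame.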