Frames containing audio data may be received, the audio data having been derived from a microphone array, at least some of the frames containing residual acoustic echo after having acoustic echo partially removed therefrom. Probability distribution functions are determined from the frames of audio data. A probability distribution function comprises likelihoods that respective directions are directions of sources of sounds. An active speaker may be identified in frames of video data based on the video data and based on audio information derived from the audio data, where use of the audio information as a basis for identifying the active speaker is controlled by determining whether the probability distribution functions indicate that corresponding audio data includes residual acoustic echo.
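The control described in the abstract can be sketched as a gate on the audio evidence. This is a minimal illustration, not the patented implementation: the function name `pick_active_speaker`, the additive score fusion, the `audio_weight` parameter, and the `looks_like_echo` callback are all assumptions introduced here. The echo test is passed in as a callable because the abstract does not say how it is computed.

```python
from typing import Callable, Sequence


def pick_active_speaker(
    video_scores: Sequence[float],
    audio_pdf: Sequence[float],
    candidate_bins: Sequence[int],
    looks_like_echo: Callable[[Sequence[float]], bool],
    audio_weight: float = 0.5,  # hypothetical fusion weight
) -> int:
    """Return the index of the candidate judged to be the active speaker.

    video_scores[i] is a video-only score for candidate i, and
    candidate_bins[i] is the direction bin of audio_pdf in which that
    candidate sits. Audio evidence is used only when the PDF does not
    look like residual echo.
    """
    use_audio = not looks_like_echo(audio_pdf)
    best_index, best_score = 0, float("-inf")
    for i, (video_score, direction_bin) in enumerate(zip(video_scores, candidate_bins)):
        score = video_score
        if use_audio:
            # Assumed fusion rule: add a weighted audio likelihood.
            score += audio_weight * audio_pdf[direction_bin]
        if score > best_score:
            best_index, best_score = i, score
    return best_index
```

When the echo test fires, the choice falls back to the video scores alone, which is one simple reading of "use of the audio information ... is controlled".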
Representative claim
1. One or more volatile and/or non-volatile physical computer readable media storing information to enable one or more devices to perform a process, the process comprising: receiving frames containing audio data, the audio data having been derived from a microphone array, at least some of the frames containing residual acoustic echo after having acoustic echo partially removed therefrom; determining, from the frames of audio data, probability distribution functions, a probability distribution function comprising likelihoods that respective directions are directions of sources of sounds; and identifying an active speaker in frames of video data based on the video data and based on audio information derived from the audio data, where use of the audio information as a basis for identifying the active speaker is controlled by determining whether the probability distribution functions indicate that corresponding audio data includes residual acoustic echo.
2. One or more volatile and/or non-volatile physical computer readable media storing information to enable one or more devices to perform a process according to claim 1, wherein the determining whether the probability distribution functions indicate that corresponding audio data includes residual acoustic echo comprises: identifying a plurality of local maximums of a probability distribution function.
3. One or more volatile and/or non-volatile physical computer readable media storing information to enable one or more devices to perform a process according to claim 2, the process further comprising determining whether the local maximums are substantially at pre-determined locations in the probability distribution functions.
4. One or more volatile and/or non-volatile physical computer readable media storing information to enable one or more devices to perform a process according to claim 3, the process further comprising finding a difference between a maximal local maximum and a minimal local maximum of the probability distribution function.
5. One or more volatile and/or non-volatile physical computer readable media storing information to enable one or more devices to perform a process according to claim 2, the process further comprising determining whether the identified local maximums are similar to local maximums that occur when substantially all of the sound being received by the microphone array is sound from a loudspeaker.
6. One or more volatile and/or non-volatile physical computer readable media storing information to enable one or more devices to perform a process according to claim 1, wherein the determining whether the probability distribution functions indicate that corresponding audio data includes residual acoustic echo comprises: determining whether characteristics of a probability distribution function are sufficiently similar to predetermined characteristics.
7. One or more volatile and/or non-volatile physical computer readable media storing information to enable one or more devices to perform a process according to claim 6, wherein the predetermined characteristics comprise characteristics of a probability distribution function that would occur if the microphone array was receiving sound predominantly from the loudspeaker.
8. One or more volatile and/or non-volatile physical computer readable media storing information to enable one or more devices to perform a process according to claim 1, wherein the determining whether the probability distribution functions indicate that corresponding audio data includes residual acoustic echo comprises: determining whether the probability distribution functions have local maximums near predetermined directions.
9. A method performed by one or more devices that comprise one or more processors and storage, the method comprising: receiving, in the storage, frames containing audio data, the audio data having been derived from a microphone array, at least some of the frames containing residual acoustic echo after having acoustic echo partially removed therefrom; determining by the one or more processors, from the frames of audio data, probability distribution functions, a probability distribution function comprising likelihoods that respective directions are directions of sources of sounds, and storing the probability distribution functions in the storage; and identifying, by the one or more processors, an active speaker in frames of video data in the storage based on the video data and based on audio information derived from the audio data by the one or more processors, where use of the audio information as a basis for identifying the active speaker is controlled by the one or more processors determining whether the probability distribution functions indicate that corresponding audio data includes residual acoustic echo.
10. A method according to claim 9, further comprising determining whether characteristics of the probability distribution functions are similar to characteristics of a probability distribution function that corresponds to the microphone array primarily receiving sound from a loudspeaker.
11. A method according to claim 10, further comprising: receiving frames of audio data from a far-end source and using the frames to produce sound with a loudspeaker co-located with the microphone array, where the sound received at the microphone includes the sound produced with the loudspeaker; generating audio frames from the sound received at the microphone array, performing echo cancellation on the audio frames, wherein the probability distribution functions are computed from the audio frames after the echo cancellation; and allowing a probability distribution function to be used in the active speaker detection process when characteristics of the probability distribution function are determined to be not similar to characteristics of a probability distribution function that corresponds to the microphone array primarily receiving sound from a loudspeaker.
12. A method according to claim 9, further comprising identifying and analyzing local maximums of the probability distribution functions.
13. A method according to claim 12, wherein the analyzing the local maximums comprises comparing them to direction(s) of one or more loudspeakers.
14. A method according to claim 13, wherein the analyzing further comprises identifying a maximal local maximum and a minimal local maximum.
15. A method according to claim 14, further comprising subtracting the magnitude of the minimal local maximum from the magnitude of a maximal local maximum and dividing by the magnitude of the minimal local maximum.
16. A method according to claim 15, further comprising subtracting the magnitude of the minimal local maximum from the magnitude of a maximal local maximum and dividing by the magnitude of the minimal local maximum.
17. A method according to claim 9, further comprising analyzing the probability distribution functions to determine whether the probability functions are to be used to detect an active speaker.
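The peak analysis recited in claims 2, 3 and 13 to 15 can be illustrated with a short sketch. It computes three features from one probability distribution function: its local maximums, whether every local maximum lies near a predetermined (for example, loudspeaker) direction, and the ratio (maximal local maximum minus minimal local maximum) divided by the minimal local maximum. The names `local_maxima`, `peak_features`, `expected_bins` and `tol` are hypothetical, and treating the direction bins as circular is an assumption. The claims do not say how these features are thresholded into an echo decision, so the sketch stops at the features.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PeakFeatures:
    peaks: List[int]           # bin indices of the local maximums
    near_expected: bool        # every peak within `tol` bins of an expected bin
    contrast: Optional[float]  # (max peak - min peak) / min peak, or None


def local_maxima(pdf: Sequence[float]) -> List[int]:
    """Bins strictly greater than both circular neighbours (plateaus are ignored)."""
    n = len(pdf)
    if n < 3:
        return []
    return [i for i in range(n)
            if pdf[i] > pdf[(i - 1) % n] and pdf[i] > pdf[(i + 1) % n]]


def circular_distance(a: int, b: int, n: int) -> int:
    """Shortest distance between bins a and b on a ring of n bins."""
    d = abs(a - b) % n
    return min(d, n - d)


def peak_features(pdf: Sequence[float],
                  expected_bins: Sequence[int],
                  tol: int = 1) -> PeakFeatures:
    n = len(pdf)
    peaks = local_maxima(pdf)
    near = bool(peaks) and all(
        any(circular_distance(p, e, n) <= tol for e in expected_bins)
        for p in peaks
    )
    contrast = None
    if peaks:
        hi = max(pdf[p] for p in peaks)
        lo = min(pdf[p] for p in peaks)
        if lo > 0:  # the ratio is undefined when the smallest peak is zero
            contrast = (hi - lo) / lo
    return PeakFeatures(peaks, near, contrast)
```

A caller would compare `near_expected` and `contrast` against thresholds of its own choosing before deciding whether the frame's audio is allowed into active speaker detection.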