[Patent] Acoustic echo cancellation using visual cues

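IPC classification information
Country / type	United States (US) Patent, granted
International Patent Classification (IPC 7th edition)	G06K-009/00 G10L-025/78 H04N-021/442 H04B-003/23
Application number	US-0535232 (2012-06-27)
Registration number	US-9767828 (2017-09-19)
Inventor / address	Velusamy, Kavitha; Chu, Wai C.; Gopalan, Ramya; Chhetri, Amit S.
Applicant / address	Amazon Technologies, Inc.
Agent / address	Lee & Hayes, PLLC
Citation information	Times cited: 2; patents cited: 7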

Abstract

Techniques for enhancing an acoustic echo canceller based on visual cues are described herein. The techniques include changing adaptation of a filter of the acoustic echo canceller, calibrating the filter, or reducing background noise from an audio signal processed by the acoustic echo canceller. The changing, calibrating, and reducing are responsive to visual cues that describe acoustic characteristics of a location of a device that includes the acoustic echo canceller. Such visual cues may indicate that no human being is present at the location, that some subject(s) are engaged in speaking or sound generating activities, or that motion associated with an echo path change has occurred at the location.
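The abstract's idea of letting visual cues change filter adaptation can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the choice of an NLMS filter, the function names (`simulate_echo`, `nlms_echo_cancel`, `step_size_from_cues`), the cue flags, and all step-size values are assumptions made for the example.

```python
import random


def simulate_echo(far_end, echo_path):
    """Convolve the loudspeaker signal with a (hypothetical) room echo path."""
    mic = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, h in enumerate(echo_path):
            if n - k >= 0:
                acc += h * far_end[n - k]
        mic.append(acc)
    return mic


def step_size_from_cues(person_speaking, echo_path_motion,
                        normal=0.5, fast=1.0):
    """Map assumed visual cues to an NLMS step size (values are illustrative)."""
    if person_speaking:
        return 0.0      # halt adaptation while near-end speech is likely
    if echo_path_motion:
        return fast     # adapt faster after a likely echo path change
    return normal


def nlms_echo_cancel(far_end, mic, step_sizes, taps=8, eps=1e-8):
    """NLMS echo canceller with a per-sample step size.

    A step size of 0.0 freezes the filter for that sample.
    Returns (residual, weights).
    """
    weights = [0.0] * taps
    history = [0.0] * taps              # recent far-end samples, newest first
    residual = []
    for x, d, mu in zip(far_end, mic, step_sizes):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate           # echo-reduced output sample
        residual.append(e)
        if mu > 0.0:
            norm = sum(h * h for h in history) + eps
            weights = [w + mu * e * h / norm
                       for w, h in zip(weights, history)]
    return residual, weights


if __name__ == "__main__":
    rng = random.Random(0)
    far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    mic = simulate_echo(far, [0.6, 0.3, 0.1])
    # Nobody speaking for the first half; a person is seen speaking afterwards.
    steps = [step_size_from_cues(n >= 1000, False) for n in range(len(far))]
    _, w = nlms_echo_cancel(far, mic, steps)
    print([round(v, 3) for v in w[:3]])
```

Passing the step size in per sample keeps the cue logic outside the filter, so a different policy (for example slowing rather than halting adaptation) only changes `step_size_from_cues`.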

Representative claims

1. A computer-implemented method comprising: ascertaining, from one or more images of a location, that a person at the location is speaking; detecting, by at least one of a double-talk detector of an acoustic echo processor or by a voice activity detector of the acoustic echo processor, that an audio signal associated with a voice is generated by a microphone at the location; determining, by the acoustic echo processor, a confidence score indicating a likelihood that the audio signal is associated with the person at the location; adjusting, by the acoustic echo processor, the confidence score based at least in part on the one or more images depicting the person at the location engaged in speaking; determining that the confidence score exceeds a threshold; changing, by the acoustic echo processor, adaptation of a filter of the acoustic echo processor based at least in part on the confidence score exceeding the threshold and the detecting the audio signal associated with the voice; and removing, at least in part, background noise from the audio signal based at least in part on the confidence score exceeding the threshold, wherein an amount of the background noise removed from the audio signal is based at least in part on a known echo path of the location.

2. The method of claim 1, wherein the detecting that the audio signal associated with the voice into the microphone is performed by the double-talk detector and the method further comprises: receiving, by the double-talk detector, an indication that the one or more images of the location depict the person at the location engaged in speaking, and adjusting the confidence score based on the indication.

3. The method of claim 1, wherein the detecting that the audio signal associated with the voice into the microphone is performed by the voice activity detector and the method further comprises: receiving, by the voice activity detector, an indication that the one or more images of the location depict the person at the location engaged in speaking, and adjusting the confidence score based on the indication.

4. The method of claim 1, wherein the method further comprises: determining, based at least in part on the one or more images, that an item at the location has changed position; determining that the change in position of the item is associated with a corresponding change in the known echo path; and accelerating the adaptation of the filter based at least in part on determining that the change in position of the item is associated with the corresponding change in the known echo path.

5. The method of claim 1, wherein the ascertaining comprises determining if the one or more images of the location show movement of lips of the person in a specified time period.

6. The method of claim 1, wherein the method further comprises: determining that the location does not include people; capturing audio at the location with the microphone based at least in part on determining that the location does not include people; and determining the known echo path based at least in part on the audio captured at the location.

7. The method of claim 1, wherein the changing comprises halting or slowing the adaptation of the filter based at least in part on a determination that the confidence score exceeds the threshold.

8. The method of claim 1, wherein the method further comprises: determining that the audio signal associated with the voice is no longer detected by the microphone; and resuming the adaptation of the filter.

9. The method of claim 1, wherein the method further comprises removing, at least in part, an acoustic echo from the audio signal, wherein an amount of the acoustic echo removed from the audio signal is based at least in part on the known echo path.

10. One or more non-transitory computer-readable media having computer-executable instructions stored thereon and configured to program a computing device to perform operations comprising: ascertaining, from one or more images of a location, that a person at the location is speaking; capturing an audio signal by a microphone at the location; detecting that the audio signal is associated with a human voice; determining a first confidence score based at least in part on a first indication that the human voice is associated with the person at the location; determining a second confidence score based at least in part on a second indication that the one or more images depict the person at the location engaged in speaking, the second confidence score being greater than the first confidence score; changing adaptation of a filter of an acoustic echo processor based at least in part on at least one of the first confidence score or the second confidence score and the audio signal; and removing, at least in part, background noise from the audio signal based at least in part on the at least one of the first confidence score or the second confidence score, wherein an amount of the background noise removed from the audio signal is based at least in part on a known echo path of the location.

11. The non-transitory computer-readable media of claim 10, wherein the detecting that the audio signal is the human voice is performed by at least one of a double-talk detector of the acoustic echo processor or by a voice activity detector of the acoustic echo processor.

12. The non-transitory computer-readable media of claim 11, wherein the detecting that the audio signal is the human voice is performed by the double-talk detector and the operations further comprise: receiving, by the double-talk detector, an indication that the one or more images of the location depict the person at the location engaged in speaking, and determining the second confidence score based at least in part on the indication.

13. The non-transitory computer-readable media of claim 11, wherein the detecting that the audio signal is the human voice is performed by the voice activity detector and the operations further comprise: receiving, by the voice activity detector, an indication that the one or more images of the location depict the person at the location engaged in speaking, and adjusting based on the indication.

14. The non-transitory computer-readable media of claim 10, wherein the operations further comprise receiving an indication that the person at the location is engaged in speaking, the receiving performed by the acoustic echo processor.

15. The non-transitory computer-readable media of claim 10, wherein the ascertaining comprises determining whether the one or more images of the location show movement of lips of the person in a specified time period.

16. The non-transitory computer-readable media of claim 10, wherein the changing comprises halting or slowing the adaptation of the filter based at least in part on a determination that the first confidence score or the second confidence score exceeds a threshold.

17. The non-transitory computer-readable media of claim 10, the operations further comprising determining that a subsequent audio signal is not associated with a human voice and resuming the adaptation of the filter.

18. The non-transitory computer-readable media of claim 10, the operations further comprising performing acoustic echo processing on the audio signal to remove, at least in part, an acoustic echo.

19. A system comprising: one or more processors; a camera to capture one or more images of a location; a speaker to output audio in the location; a microphone to capture audio in the location; and one or more non-transitory computer-readable media having computer-executable instructions stored thereon and configured to program the one or more processors to perform operations comprising: ascertaining, from the one or more images of the location captured by the camera, that a person at the location is speaking; capturing an audio signal by the microphone in the location; detecting that the audio signal is associated with a voice; determining a confidence score that the audio signal represents the voice that is associated with the person at the location; adjusting the confidence score based at least in part on the one or more images captured by the camera depicting the person at the location engaged in speaking; determining that the confidence score exceeds a threshold; changing adaptation of a filter of an acoustic echo processor based at least in part on the confidence score exceeding the threshold and the audio signal; and removing, at least in part, background noise from the audio signal based at least in part on the confidence score exceeding the threshold, wherein an amount of the background noise removed from the audio signal is based at least in part on a known echo path of the location.

20. The system of claim 19, wherein the detecting that the audio signal is associated with the voice is performed by a double-talk detector of the acoustic echo processor or by a voice activity detector of the acoustic echo processor.

21. The system of claim 20, wherein the detecting that the audio signal is associated with the voice is performed by the double-talk detector and the operations further comprise: receiving, by the double-talk detector, an indication that the one or more images of the location depict the person at the location engaged in speaking, and adjusting the confidence score based on the indication.

22. The system of claim 20, wherein the detecting that the audio signal is associated with the voice is performed by the voice activity detector and the operations further comprise: receiving, by the voice activity detector, an indication that the one or more images of the location depict the person at the location engaged in speaking, and adjusting the confidence score based on the indication.

23. The system of claim 19, wherein the operations further comprise: determining, from one or more subsequent images of the location captured by the camera, that the location does not include people; playing a calibration sound by the speaker; determining one or more echo paths of the location based on audio captured by the microphone at the location while the calibration sound is playing; calibrating the acoustic echo canceller filter based at least in part on the one or more echo paths; and learning background noise characteristics in the audio captured by the microphone at the location based at least in part on no voice activity being detected.

24. The method of claim 1, wherein determining the confidence score indicating the likelihood that the audio signal is associated with the person at the location further comprises: accessing at least one stored speech characteristic associated with a voice profile corresponding to the person; determining a comparison between the at least one stored speech characteristic and a characteristic of the audio signal; and determining the confidence score based at least in part on the comparison.
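The decision flow recited in claims 1, 7 and 8 (score the audio, adjust the score from the images, compare against a threshold, halt adaptation, resume once the voice is gone) might be sketched as below. The claims do not specify how the score is adjusted; the additive boost, the threshold value, and the names `adjust_confidence` and `decide_adaptation` are hypothetical choices for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AdaptationDecision:
    confidence: float   # score after any image-based adjustment
    adapt: bool         # False means filter adaptation is halted


def adjust_confidence(audio_confidence, lips_moving, boost=0.3):
    """Raise the audio-only score when images show the person speaking.

    The additive boost clipped to 1.0 is an assumption; the claims only
    say the score is adjusted based on the images.
    """
    if lips_moving:
        return min(1.0, audio_confidence + boost)
    return audio_confidence


def decide_adaptation(voice_detected, audio_confidence, lips_moving,
                      threshold=0.7):
    """Return whether the echo canceller filter should keep adapting."""
    if not voice_detected:
        # No voice at the microphone: adaptation runs (or resumes).
        return AdaptationDecision(0.0, True)
    score = adjust_confidence(audio_confidence, lips_moving)
    # Halt adaptation only when the adjusted score exceeds the threshold.
    return AdaptationDecision(score, not (score > threshold))


if __name__ == "__main__":
    for frame in [(True, 0.5, False), (True, 0.5, True), (False, 0.0, False)]:
        print(frame, decide_adaptation(*frame))
```

In this sketch a moderately confident audio detection does not stop adaptation on its own; it does so only once the camera also reports lip movement, which is one way to read the role of the visual cue in the claims.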

Patents cited by this patent (7)

Chujo Kaoru (JPX); Fujino Naoji (JPX), Echo canceller and method of controlling the same.
Potts, Steven L.; Wang, Hong; Rabiner, Wendi Beth; Chu, Peter L., Locating an audio source.
Vermeulen, Pieter J.; Savoie, Robert E.; Sutton, Stephen; Mozer, Forrest S., Method and apparatus of specifying and performing speech recognition operations.
Stork David G. (Stanford CA); Wolff Gregory J. (Mountain View CA), Neural network acoustic and visual speech recognition system training method and apparatus.
Mozer, Todd F.; Mozer, Forrest S.; Adams, Erich B., System and method for controlling the operation of a device by voice commands.
Mozer, Todd F.; Mozer, Forrest S.; Adams, Erich B., System and method for controlling the operation of a device by voice commands.
Bernd Girod (DE), Video-assisted audio signal processing system and method.

Patents citing this patent (2)

Kuroki, Tomohiko, Audio processing apparatus and audio processing method.
Chenier, Mario; Hardie, Tony Roy; Uppal, Nawdesh; Oliver, Brian; Mokady, Ran, Initiating device speech activity monitoring for communication sessions.
