[Patent] Time difference of arrival determination with direct sound

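IPC classification information
Country / Type	United States (US) Patent, granted
International Patent Classification (IPC, 7th edition)	G01S-005/00 G01S-005/22 G01S-003/86
Application number	US-0168759 (2011-06-24)
Registration number	US-9194938 (2015-11-24)
Inventor / Address	Velusamy, Kavitha
Applicant / Address	Amazon Technologies, Inc.
Attorney / Address	Lee & Hayes, PLLC
Citation information	Times cited: 0; Patents cited: 8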

Abstract

Acoustic signals may be localized such that their position in space is determined. Time-difference-of-arrival data from multiple microphones may be used for this localization. Signal data from the microphones may be degraded by reverberation and other environmental distortions, resulting in erroneous localization. By detecting a portion of the signal resulting from sound directly reaching a microphone rather than from a reverberation, accuracy of the localization is improved.
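
As a rough illustration of the direct-sound idea in the abstract, the following Python sketch assumes that the direct path reaches each microphone before any reflection, so the first peak after a signal crosses a threshold is taken as the arrival time. The function names, the fixed threshold, and the per-sample peak search are hypothetical choices made for illustration; this is not the patented implementation.

```python
def first_peak_index(samples, threshold):
    """Index of the first local peak after |samples| first exceeds threshold.

    Returns None if the signal never exceeds the threshold.
    Assumption: the first peak belongs to the direct sound, because the
    direct path is shorter than any reflected path.
    """
    mags = [abs(s) for s in samples]
    for i, m in enumerate(mags):
        if m > threshold:
            # Climb while the magnitude keeps rising, then stop at the peak.
            while i + 1 < len(mags) and mags[i + 1] > mags[i]:
                i += 1
            return i
    return None


def peak_tdoa(sig_a, sig_b, sample_rate_hz, threshold):
    """TDOA in seconds (arrival at B minus arrival at A), or None."""
    ia = first_peak_index(sig_a, threshold)
    ib = first_peak_index(sig_b, threshold)
    if ia is None or ib is None:
        return None
    return (ib - ia) / sample_rate_hz
```

Picking the first peak rather than the largest one is the point of the sketch: a later reflection can be louder than the direct sound, and taking the global maximum would then give a misleading time difference.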

Representative claims

1. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising: acquiring, from a plurality of microphones, a plurality of audio signals associated with an acoustic source; filtering the plurality of audio signals with a band-pass filter; estimating a noise floor of the plurality of audio signals; for individual audio signals of the plurality of audio signals: identifying an event of interest; determining that the event of interest rises above the noise floor; adjusting the noise floor at a rate that is based at least partly on an extent to which the event of interest rises above the noise floor; detecting a peak after the event of interest rose above the noise floor; and determining a time at which the peak occurs; and determining time-difference-of-arrival (TDOA) values for the event of interest based at least in part on a difference between the time at which the peaks occur in the individual audio signals.

2. The one or more non-transitory computer-readable storage media of claim 1, wherein the band-pass filter is configured to have a bandwidth extending from about 800 Hertz to about 2 Kilohertz.

3. The one or more non-transitory computer-readable storage media of claim 1, wherein the event of interest comprises an acoustic signal having a duration of less than about 250 milliseconds.

4. The one or more non-transitory computer-readable storage media of claim 3, wherein the acoustic signal is generated by a human gesture.

5. The one or more non-transitory computer-readable storage media of claim 1, the acts further comprising removing noise from the plurality of audio signals.

6. The one or more non-transitory computer-readable storage media of claim 1, the acts further comprising filtering the TDOA values based at least in part on one or more physical attributes of the plurality of microphones or a room in which the plurality of microphones reside.

7. The one or more non-transitory computer-readable storage media of claim 6, wherein the physical attributes comprise known distances between the plurality of microphones.

8. The one or more non-transitory computer-readable storage media of claim 6, wherein the physical attributes comprise known dimensions of the room.

9. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising: acquiring a plurality of audio signals associated with an acoustic source; for individual audio signals of the plurality of audio signals: identifying an event of interest by identifying a portion of the audio signal that corresponds to direct sound from the acoustic source and disregarding portions of the audio signal that correspond to indirect sound from the acoustic source; determining that the event of interest rises above a noise floor; detecting a peak after the event of interest rose above the noise floor; and determining a time at which the peak occurs; and determining time-difference-of-arrival (TDOA) values for the event of interest based at least in part on a difference between the time at which the peaks occur in the plurality of filtered audio signals.

10. The one or more non-transitory computer-readable storage media of claim 9, the acts further comprising filtering the plurality of audio signals with a band-pass filter.

11. The one or more non-transitory computer-readable storage media of claim 10, wherein the band-pass filter is configured to have a bandwidth extending from about 800 Hertz to about 2 Kilohertz.

12. The one or more non-transitory computer-readable storage media of claim 9, the acts further comprising estimating the noise floor of the plurality of audio signals.

13. The one or more non-transitory computer-readable storage media of claim 9, the acts further comprising removing noise from the plurality of audio signals.

14. The one or more non-transitory computer-readable storage media of claim 9, wherein the event of interest further comprises an acoustic signal generated by the acoustic source having a duration of less than about 250 milliseconds.

15. The one or more non-transitory computer-readable storage media of claim 14, wherein the acoustic signal is generated by a user physically striking an object within an environment.

16. The one or more non-transitory computer-readable storage media of claim 9, the acts further comprising filtering the TDOA values based at least in part on at least one of one or more physical attributes of a plurality of microphones receiving the plurality of audio signals or physical attributes of a room in which the plurality of microphones reside.

17. The one or more non-transitory computer-readable storage media of claim 16, wherein the physical attributes comprise known distances between the plurality of microphones.

18. The one or more non-transitory computer-readable storage media of claim 16, wherein the physical attributes comprise known dimensions of the room.

19. A system comprising: a plurality of sensors; a time-difference-of-arrival module coupled to the sensors and configured to: acquire, via the sensors, a plurality of signals associated with an acoustic source; for individual audio signals of the plurality of signals: identify an event of interest; determine that the event of interest rises above a noise floor; based at least partly on the event of interest rising above the noise floor, increase the noise floor at a first rate; determine that the event of interest falls below the noise floor; and based at least partly on the event of interest falling below the noise floor, decrease the noise floor at a second rate different from the first rate.

20. The system of claim 19, wherein the plurality of sensors comprise one or more microphones.

21. The system of claim 19, the time-difference-of-arrival module being further configured to: filter the plurality of signals with a band-pass filter; and estimate the noise floor of the plurality of signals.

22. The system of claim 19, wherein the time-difference-of-arrival module is further configured to: detect a peak after the event of interest rose above the noise floor; determine a time at which the peak occurs; determine time-difference-of-arrival (TDOA) values based at least in part on a difference between the time at which the peaks of the signals occur; and localize the source based at least in part upon the determined TDOA values.

23. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising: acquiring, from a plurality of microphones, a plurality of audio signals associated with an acoustic source generated by a human speaker; estimating a noise floor of the plurality of audio signals; and for individual audio signals of the plurality of audio signals: identifying an event of interest that comprises human speech; determining a time that the event of interest rises above the noise floor; designating a window of samples starting within a threshold amount of time from the time that the event of interest rises above the noise floor; and adjusting the noise floor at a rate that is based at least partly on an extent to which the event of interest rises above the noise floor.

24. The one or more non-transitory computer-readable storage media of claim 23, the acts further comprising determining that a quiet period is present before the event of interest, wherein the quiet period is less than about 100 milliseconds in duration.

25. The one or more non-transitory computer-readable storage media of claim 23, the acts further comprising determining that a quiet period is present before the event of interest, wherein the quiet period comprises a period during which any audio signals are at or below the noise floor.

26. The one or more non-transitory computer-readable storage media of claim 23, wherein the window of samples is less than about 2 milliseconds in duration.

27. The one or more non-transitory computer-readable storage media of claim 23, the acts further comprising filtering the plurality of audio signals with a band-pass filter configured to have a bandwidth extending from about 2 Kilohertz to about 8 Kilohertz.

28. The one or more non-transitory computer-readable storage media of claim 23, the acts further comprising removing noise from the plurality of audio signals.

29. The one or more non-transitory computer-readable storage media of claim 23, the acts further comprising: calculating time-difference-of-arrival (TDOA) values based at least in part on samples within the window of samples; sliding the window of samples forward by a time, t; determining that the samples are within a region of interest that comprises at least a portion of a direct signal from the event of interest; and again calculating TDOA values based at least in part upon the samples within the window after sliding the window forward by the time, t.

30. The one or more non-transitory computer-readable storage media of claim 29, the acts further comprising filtering the TDOA values based at least in part on at least one of one or more physical attributes of the plurality of microphones or of a room in which the plurality of microphones reside.

31. The one or more non-transitory computer-readable storage media of claim 30, wherein the physical attributes comprise known distances between the plurality of microphones.

32. The one or more non-transitory computer-readable storage media of claim 30, wherein the physical attributes comprise known dimensions of the room.

33. The one or more non-transitory computer-readable storage media of claim 23, the acts further comprising: based at least partly on a determination that a stable set of TDOA from a pre-determined number of consecutive windows values is reached, localizing the acoustic source with use of the stable TDOA values.

34. The one or more non-transitory computer-readable storage media of claim 23, the acts further comprising continuously tracking the acoustic source.

35. The one or more non-transitory computer-readable storage media of claim 23, wherein identifying the event of interest that comprises human speech includes identifying a portion of the audio signal that corresponds to direct sound traveling directly from the acoustic source and disregarding portions of the audio signal that correspond to indirect sound traveling indirectly from the acoustic source.

36. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed, cause one or more processors to perform acts comprising: acquiring, from a plurality of microphones, a plurality of audio signals associated with an acoustic source, wherein a portion of an audio signal of the plurality of audio signals is indicative of direct sound traveling directly from the acoustic source and portions of the audio signal are indicative of indirect sound traveling indirectly from the acoustic source; calculating a time-difference-of-arrival value for the audio signal at least partly by disregarding the portions of the audio signal that are indicative of the indirect sound; and determining a location of the acoustic source based at least in part upon the time-difference-of arrival value; and for an audio signal of the plurality of audio signals: estimating a noise floor of the audio signal; identifying an event of interest that comprises human speech; and adjusting the noise floor at a rate that is based at least partly on an extent to which the event of interest rises above the noise floor.

37. The one or more non-transitory computer-readable storage media of claim 36, wherein the portion of the audio signal that is indicative of the direct sound is received from the acoustic source free from reflection.

38. The one or more non-transitory computer-readable storage media of claim 36, wherein the calculating comprises applying a phase transform.

39. The one or more non-transitory computer-readable storage media of claim 36, wherein the calculating comprises applying a generalized cross correlation technique.

40. The one or more non-transitory computer-readable storage media of claim 36, wherein the determining the location further comprises filtering the time-difference-of-arrival value based at least in part upon one or more physical attributes of at least one of the plurality of the microphones or of a room in which the plurality of microphones reside.

41. The one or more non-transitory computer-readable storage media of claim 40, wherein the physical attributes comprise known distances between the plurality of microphones.

42. The one or more non-transitory computer-readable storage media of claim 40, wherein the physical attributes comprise known dimensions of the room.

43. A system comprising: a plurality of sensors; a time-difference-of-arrival module coupled to the sensors and configured to: acquire, from a plurality of sensors, a plurality of audio signals associated with an acoustic source generated by a human speaker; for individual ones of the plurality of the audio signals: identify an event of interest that comprises human speech; determine a time that the event of interest rises above a noise floor; and adjusting the noise floor at a rate that is based at least partly on an extent to which the event of interest rises above the noise floor.

44. The system of claim 43, wherein the sensors comprise microphones.

45. The system of claim 43, wherein the event of interest comprises a portion of an audio signal that is received directly by a microphone free from reflection.

46. The system of claim 43, the time-difference-of-arrival module further configured to: filter the plurality of audio signals with a band-pass filter; and estimate the noise floor of the plurality of audio signals.

47. The system of claim 43, further comprising a wherein the time-difference-of-arrival module is further configured to: determine that a quiet period is present before the event of interest; designate a window of samples starting within a threshold amount of time from the time that the event of interest rises above the quiet period; calculate time-difference-of-arrival (TDOA) values based at least in part on samples within the window of samples; slide the window of samples forward by a time, t; determine that the samples are within a region of interest that comprises at least a portion of a direct signal from the event of interest; again calculate TDOA values based at least in part upon the samples within the window of samples after sliding the window forward by the time, t; and localize the acoustic source based at least in part upon the calculated TDOA values.
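
Two steps recited in the claims above can be sketched in Python: a noise floor that rises and falls at different rates (claim 19), and filtering of TDOA values by the known distance between microphones (claims 6 and 7). This is a minimal sketch under stated assumptions. The exponential-smoothing update, the function names, the rate parameters, and the speed-of-sound constant are all illustrative and are not specified in the claims.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: approximate value for air near 20 degrees C


def track_noise_floor(magnitudes, initial_floor, rise_rate, fall_rate):
    """Return the noise-floor estimate after each sample magnitude.

    Assumed update rule: the floor moves toward the sample by a fraction
    of the gap, using rise_rate when the sample is above the floor and
    fall_rate otherwise. The step therefore grows with the extent to
    which the signal exceeds the floor.
    """
    floor = initial_floor
    history = []
    for m in magnitudes:
        if m > floor:
            floor += rise_rate * (m - floor)   # first rate: signal above floor
        else:
            floor -= fall_rate * (floor - m)   # second rate: signal at or below floor
        history.append(floor)
    return history


def plausible_tdoas(tdoas_s, mic_spacing_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Keep only TDOA values that are physically possible for one mic pair.

    Sound cannot take longer than spacing / speed to cover the distance
    between the two microphones, so larger magnitudes are discarded.
    """
    limit_s = mic_spacing_m / speed_m_s
    return [t for t in tdoas_s if abs(t) <= limit_s]
```

For microphones 0.5 m apart, the limit is roughly 1.46 ms, so a candidate TDOA of 2 ms would be rejected as likely caused by reverberation or a detection error.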

Patents cited by this patent (8)

Kuhn, Karl J.; Zetts, John Mark, Apparatus and method of in-service audio/video synchronization testing.
Showen, Robert L.; Dunham, Jason W., Automatic real-time gunshot locator and display system.
Balasubramanian, Thyagarajan; Braun, Karen M., Gamut mapping using local area information.
Koch, Roland; Weidner, Jürgen, Measuring device, and method for locating a partial discharge.
Vermeulen, Pieter J.; Savoie, Robert E.; Sutton, Stephen; Mozer, Forrest S., Method and apparatus of specifying and performing speech recognition operations.
Rao, Velliyur Nott Mallikarjuna; Sievert, Allen C., Process for the preparation of 1,1,1,3,3-pentafluoropropane and 1,1,1,3,3,3-hexafluoropropane.
Mozer, Todd F.; Mozer, Forrest S.; Adams, Erich B., System and method for controlling the operation of a device by voice commands.
Mozer, Todd F.; Mozer, Forrest S.; Adams, Erich B., System and method for controlling the operation of a device by voice commands.
