Systems, methods, and apparatus for speech feature detection
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G10L-021/00
G10L-025/78
Application number
US-0092502
(2011-04-22)
Registration number
US-9165567
(2015-10-20)
Inventor / Address
Visser, Erik
Liu, Ian Ernan
Shin, Jongwon
Applicant / Address
QUALCOMM Incorporated
Agent / Address
Barker, Scott A.
Citation information
Times cited: 9
Patents cited: 14
Abstract
Implementations and applications are disclosed for detection of a transition in a voice activity state of an audio signal, based on a change in energy that is consistent in time across a range of frequencies of the signal. For example, such detection may be based on a time derivative of energy for each of a number of different frequency components of the signal.
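The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of a log-energy difference between consecutive frames as the "time derivative of energy", and the values of `delta_db` and `min_active_bands` are all assumptions chosen for illustration. The sketch flags a transition only when enough frequency bands change energy in the same direction in the same frame, which is one way to read "consistent in time across a range of frequencies".

```python
import math


def band_energy_derivatives(prev_energies, curr_energies, floor=1e-12):
    """Approximate the per-band time derivative of energy.

    Assumption: the derivative is taken as the frame-to-frame difference
    of log energy, in dB per frame. Inputs are per-band energies for two
    consecutive frames (same band order, same length).
    """
    return [
        10.0 * math.log10(max(curr, floor)) - 10.0 * math.log10(max(prev, floor))
        for prev, curr in zip(prev_energies, curr_energies)
    ]


def detect_transition(prev_energies, curr_energies,
                      delta_db=6.0, min_active_bands=3):
    """Return "onset", "offset", or None for the current frame.

    A band counts as active when its energy changes by at least
    delta_db (hypothetical value). A transition is reported only when
    the number of active bands reaches min_active_bands (hypothetical
    value), so a change confined to one narrow band is ignored.
    """
    derivs = band_energy_derivatives(prev_energies, curr_energies)
    rising = sum(1 for d in derivs if d >= delta_db)
    falling = sum(1 for d in derivs if d <= -delta_db)
    if rising >= min_active_bands and rising >= falling:
        return "onset"
    if falling >= min_active_bands:
        return "offset"
    return None


# Illustrative frames with eight bands each (made-up energy values).
quiet = [1e-4] * 8
loud = [1e-2] * 8
narrow = [1e-2] + [1e-4] * 7  # energy jump in a single band only

print(detect_transition(quiet, loud))    # broadband rise
print(detect_transition(loud, quiet))    # broadband fall
print(detect_transition(quiet, narrow))  # single-band change
```

Counting active bands against a threshold, rather than summing energy over the whole spectrum, keeps a loud but narrowband event from being mistaken for a speech onset or offset.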
Representative claim
1. A method of processing an audio signal, said method comprising:
for each of a first plurality of consecutive segments of the audio signal, determining that voice activity is present in the segment;
for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, determining that voice activity is not present in the segment;
using at least one array of logic elements, detecting that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments that is not the first segment to occur among the second plurality; and
producing a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity,
wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
2.
The method according to claim 1, wherein said method comprises calculating a time derivative of energy for each of a plurality of different frequency components of the audio signal during said one among the second plurality of segments, and wherein said detecting that the transition occurs during said one among the second plurality of segments is based on the calculated time derivatives of energy. 3. The method according to claim 2, wherein said detecting that the transition occurs includes, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, producing a corresponding indication of whether the frequency component is active, and wherein said detecting that the transition occurs is based on a relation between the number of said indications that indicate that the corresponding frequency component is active and a first threshold value. 4. The method according to claim 3, wherein said method comprises, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal: calculating a time derivative of energy for each of a plurality of different frequency components of the audio signal during the segment;for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, producing a corresponding indication of whether the frequency component is active; anddetermining that a transition in a voice activity state of the audio signal does not occur during the segment, based on a relation between (A) the number of said indications that indicate that the corresponding frequency component is active and (B) a second threshold value that is higher than said first threshold value. 5. 
The method according to claim 3, wherein said method comprises, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal: calculating, for each of a plurality of different frequency components of the audio signal during the segment, a second derivative of energy with respect to time;for each of the plurality of different frequency components, and based on the corresponding calculated second derivative of energy with respect to time, producing a corresponding indication of whether the frequency component is impulsive; anddetermining that a transition in a voice activity state of the audio signal does not occur during the segment, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value. 6. The method according to claim 3, wherein said method comprises, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal: calculating, for each of a plurality of different frequency components of the audio signal during the segment, a second-order derivative of energy with respect to time;for each of the plurality of different frequency components, and based on the corresponding calculated second-order derivative of energy with respect to time, producing a corresponding indication of whether the frequency component is impulsive; anddetermining that a transition in a voice activity state of the audio signal does not occur during the segment, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value. 7. 
The method according to claim 1, wherein, for each of the first plurality of consecutive segments of the audio signal, said determining that voice activity is present in the segment is based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment, and wherein, for each of the second plurality of consecutive segments of the audio signal, said determining that voice activity is not present in the segment is based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment. 8. The method according to claim 7, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference between a level of the first channel and a level of the second channel during the segment. 9. The method according to claim 7, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference in time between an instance of a signal in the first channel during the segment and an instance of said signal in the second channel during the segment. 10. 
The method according to claim 7, wherein, for each segment of said first plurality, said determining that voice activity is present in the segment comprises calculating, for each of a first plurality of different frequency components of the audio signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences, and wherein, for each segment of said second plurality, said determining that voice activity is not present in the segment comprises calculating, for each of the first plurality of different frequency components of the audio signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences. 11. The method according to claim 10, wherein said method comprises calculating a time derivative of energy for each of a second plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said detecting that the transition occurs during said one among the second plurality of segments is based on the calculated time derivatives of energy, andwherein a frequency band that includes the first plurality of frequency components is separate from a frequency band that includes the second plurality of frequency components. 12. 
The method according to claim 10, wherein, for each segment of said first plurality, said determining that voice activity is present in the segment is based on a corresponding value of a coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences, and wherein, for each segment of said second plurality, said determining that voice activity is not present in the segment is based on a corresponding value of the coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences. 13. The method according to claim 1, wherein said method comprises: calculating a time derivative of energy for each of a plurality of different frequency components of the audio signal during a segment of one of the first and second pluralities of segments; andproducing a voice activity detection indication for said segment of one of the first and second pluralities,wherein said producing the voice activity detection indication includes comparing a value of a test statistic for the segment to a value of a threshold, andwherein said producing the voice activity detection indication includes modifying a relation between the test statistic and the threshold, based on said calculated plurality of time derivatives of energy, andwherein a value of said voice activity detection signal for said segment of one of the first and second pluralities is based on said voice activity detection indication. 14. The method according to claim 1, wherein said method is performed by a communications device. 15. 
An apparatus for processing an audio signal, said apparatus comprising:
means for determining, for each of a first plurality of consecutive segments of the audio signal, that voice activity is present in the segment;
means for determining, for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, that voice activity is not present in the segment;
means for detecting that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments; and
means for producing a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity, and
wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
16.
The apparatus according to claim 15, wherein said apparatus comprises means for calculating a time derivative of energy for each of a plurality of different frequency components of the audio signal during said one among the second plurality of segments, and wherein said means for detecting that the transition occurs during said one among the second plurality of segments is configured to detect the transition based on the calculated time derivatives of energy. 17. The apparatus according to claim 16, wherein said means for detecting that the transition occurs includes means for producing, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active, and wherein said means for detecting that the transition occurs is configured to detect the transition based on a relation between the number of said indications that indicate that the corresponding frequency component is active and a first threshold value. 18. 
The apparatus according to claim 17, wherein said apparatus comprises: means for calculating, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal, a time derivative of energy for each of a plurality of different frequency components of the audio signal during the segment;means for producing, for each of said plurality of different frequency components of said segment that occurs prior to the first plurality of consecutive segments in the audio signal, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active; andmeans for determining that a transition in a voice activity state of the audio signal does not occur during said segment that occurs prior to the first plurality of consecutive segments in the audio signal, based on a relation between (A) the number of said indications that indicate that the corresponding frequency component is active and (B) a second threshold value that is higher than said first threshold value. 19. 
The apparatus according to claim 17, wherein said apparatus comprises: means for calculating, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal, a second derivative of energy with respect to time for each of a plurality of different frequency components of the audio signal during the segment;means for producing, for each of the plurality of different frequency components of said segment that occurs prior to the first plurality of consecutive segments in the audio signal, and based on the corresponding calculated second derivative of energy with respect to time, a corresponding indication of whether the frequency component is impulsive; andmeans for determining that a transition in a voice activity state of the audio signal does not occur during said segment that occurs prior to the first plurality of consecutive segments in the audio signal, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value. 20. The apparatus according to claim 15, wherein, for each of the first plurality of consecutive segments of the audio signal, said means for determining that voice activity is present in the segment is configured to perform said determining based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment, and wherein, for each of the second plurality of consecutive segments of the audio signal, said means for determining that voice activity is not present in the segment is configured to perform said determining based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment. 21. 
The apparatus according to claim 20, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference between a level of the first channel and a level of the second channel during the segment. 22. The apparatus according to claim 20, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference in time between an instance of a signal in the first channel during the segment and an instance of said signal in the second channel during the segment. 23. The apparatus according to claim 20, wherein said means for determining that voice activity is present in the segment comprises means for calculating, for each segment of said first plurality and for each segment of said second plurality, and for each of a first plurality of different frequency components of the audio signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences. 24. The apparatus according to claim 23, wherein said apparatus comprises means for calculating a time derivative of energy for each of a second plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said means for detecting that the transition occurs during said one among the second plurality of segments is configured to detect that the transition occurs based on the calculated time derivatives of energy, andwherein a frequency band that includes the first plurality of frequency components is separate from a frequency band that includes the second plurality of frequency components. 25. 
The apparatus according to claim 23, wherein said means for determining, for each segment of said first plurality, that voice activity is present in the segment is configured to determine that said voice activity is present based on a corresponding value of a coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences, and wherein said means for determining, for each segment of said second plurality, that voice activity is not present in the segment is configured to determine that voice activity is not present based on a corresponding value of the coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences. 26. The apparatus according to claim 15, wherein said apparatus comprises: means for calculating a time derivative of energy for each of a plurality of different frequency components of the audio signal during a segment of one of the first and second pluralities of segments; andmeans for producing a voice activity detection indication for said segment of one of the first and second pluralities,wherein said means for producing the voice activity detection indication includes means for comparing a value of a test statistic for the segment to a threshold value, andwherein said means for producing the voice activity detection indication includes means for modifying a relation between the test statistic and the threshold, based on said calculated plurality of time derivatives of energy, andwherein a value of said voice activity detection signal for said segment of one of the first and second pluralities is based on said voice activity detection indication. 27. 
An apparatus for processing an audio signal, said apparatus comprising:
a first voice activity detector configured to determine:
for each of a first plurality of consecutive segments of the audio signal, that voice activity is present in the segment, and
for each of a second plurality of consecutive segments of the audio signal that occurs immediately after the first plurality of consecutive segments in the audio signal, that voice activity is not present in the segment;
a second voice activity detector configured to detect that a transition in a voice activity state of the audio signal occurs during one among the second plurality of consecutive segments; and
a signal generator configured to produce a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity,
wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and
wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the audio signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity.
28.
The apparatus according to claim 27, wherein said apparatus comprises a calculator configured to calculate a time derivative of energy for each of a plurality of different frequency components of the audio signal during said one among the second plurality of segments, and wherein said second voice activity detector is configured to detect said transition based on the calculated time derivatives of energy. 29. The apparatus according to claim 28, wherein said second voice activity detector includes a comparator configured to produce, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active, and wherein said second voice activity detector is configured to detect the transition based on a relation between the number of said indications that indicate that the corresponding frequency component is active and a first threshold value. 30. The apparatus according to claim 29, wherein said apparatus comprises: a calculator configured to calculate, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal, a time derivative of energy for each of a plurality of different frequency components of the audio signal during the segment; anda comparator configured to produce, for each of said plurality of different frequency components of said segment that occurs prior to the first plurality of consecutive segments in the audio signal, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active,wherein said second voice activity detector is configured to determine that a transition in a voice activity state of the audio signal does not occur during said segment that occurs prior to the first plurality of consecutive segments in the audio signal, based on a relation between (A) the number of said indications that indicate 
that the corresponding frequency component is active and (B) a second threshold value that is higher than said first threshold value. 31. The apparatus according to claim 29, wherein said apparatus comprises: a calculator configured to calculate, for a segment that occurs prior to the first plurality of consecutive segments in the audio signal, a second derivative of energy with respect to time for each of a plurality of different frequency components of the audio signal during the segment; anda comparator configured to produce, for each of the plurality of different frequency components of said segment that occurs prior to the first plurality of consecutive segments in the audio signal, and based on the corresponding calculated second derivative of energy with respect to time, a corresponding indication of whether the frequency component is impulsive,wherein said second voice activity detector is configured to determine that a transition in a voice activity state of the audio signal does not occur during said segment that occurs prior to the first plurality of consecutive segments in the audio signal, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value. 32. 
The apparatus according to claim 27, wherein said first voice activity detector is configured to determine, for each of the first plurality of consecutive segments of the audio signal, that voice activity is present in the segment, based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment, and wherein said first voice activity detector is configured to determine, for each of the second plurality of consecutive segments of the audio signal, that voice activity is not present in the segment, based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment. 33. The apparatus according to claim 32, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference between a level of the first channel and a level of the second channel during the segment. 34. The apparatus according to claim 32, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference in time between an instance of a signal in the first channel during the segment and an instance of said signal in the second channel during the segment. 35. The apparatus according to claim 32, wherein said first voice activity detector includes a calculator configured to calculate, for each segment of said first plurality and for each segment of said second plurality, and for each of a first plurality of different frequency components of the audio signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences. 36. 
The apparatus according to claim 35, wherein said apparatus comprises a calculator configured to calculate a time derivative of energy for each of a second plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said second voice activity detector is configured to detect that the transition occurs based on the calculated time derivatives of energy, andwherein a frequency band that includes the first plurality of frequency components is separate from a frequency band that includes the second plurality of frequency components. 37. The apparatus according to claim 35, wherein said first voice activity detector is configured to determine, for each segment of said first plurality, that said voice activity is present in the segment based on a corresponding value of a coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences, and wherein said first voice activity detector is configured to determine, for each segment of said second plurality, that voice activity is not present in the segment based on a corresponding value of the coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences. 38. 
The apparatus according to claim 27, wherein said apparatus comprises: a third voice activity detector configured to calculate a time derivative of energy for each of a plurality of different frequency components of the audio signal during a segment of one of the first and second pluralities of segments; anda fourth voice activity detector configured to produce a voice activity detection indication for said segment of one of the first and second pluralities, based on a result of comparing a value of a test statistic for the segment to a threshold value,wherein said fourth voice activity detector is configured to modify a relation between the test statistic and the threshold, based on said calculated plurality of time derivatives of energy, andwherein a value of said voice activity detection signal for said segment of one of the first and second pluralities is based on said voice activity detection indication. 39. The apparatus according to claim 38, wherein the fourth voice activity detector is the first voice activity detector, and wherein said determining that voice activity is present or not present in the segment includes producing said voice activity detection indication. 40. 
A non-transitory computer-readable medium that stores machine-executable instructions that when executed by one or more processors cause the one or more processors to: determine, for each of a first plurality of consecutive segments of a multichannel signal, and based on a difference between a first channel of the multichannel signal during the segment and a second channel of the multichannel signal during the segment, that voice activity is present in the segment; determine, for each of a second plurality of consecutive segments of the multichannel signal that occurs immediately after the first plurality of consecutive segments in the multichannel signal, and based on a difference between a first channel of the multichannel signal during the segment and a second channel of the multichannel signal during the segment, that voice activity is not present in the segment; detect that a transition in a voice activity state of the multichannel signal occurs during one among the second plurality of consecutive segments that is not the first segment to occur among the second plurality; and produce a voice activity detection signal that has, for each segment in the first plurality and for each segment in the second plurality, a corresponding value that indicates one among activity and lack of activity, wherein, for each of the first plurality of consecutive segments, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs before the segment in which the detected transition occurs, and based on said determining, for at least one segment of the first plurality, that voice activity is present in the segment, the corresponding value of the voice activity detection signal indicates activity, and wherein, for each of the second plurality of consecutive segments that occurs after the segment in which the detected transition occurs, and in response to said detecting that a transition in the speech activity state of the multichannel signal occurs, the corresponding value of the voice activity detection signal indicates a lack of activity. 41. The medium according to claim 40, wherein said instructions when executed by the one or more processors cause the one or more processors to calculate a time derivative of energy for each of a plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said detecting that the transition occurs during said one among the second plurality of segments is based on the calculated time derivatives of energy. 42. The medium according to claim 41, wherein said detecting that the transition occurs includes, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, producing a corresponding indication of whether the frequency component is active, and wherein said detecting that the transition occurs is based on a relation between the number of said indications that indicate that the corresponding frequency component is active and a first threshold value. 43. 
The medium according to claim 42, wherein said instructions when executed by one or more processors cause the one or more processors, for a segment that occurs prior to the first plurality of consecutive segments in the multichannel signal: to calculate a time derivative of energy for each of a plurality of different frequency components of the first channel during the segment; to produce, for each of the plurality of different frequency components, and based on the corresponding calculated time derivative of energy, a corresponding indication of whether the frequency component is active; and to determine that a transition in a voice activity state of the multichannel signal does not occur during the segment, based on a relation between (A) the number of said indications that indicate that the corresponding frequency component is active and (B) a second threshold value that is higher than said first threshold value. 44. The medium according to claim 42, wherein said instructions when executed by one or more processors cause the one or more processors, for a segment that occurs prior to the first plurality of consecutive segments in the multichannel signal: to calculate, for each of a plurality of different frequency components of the first channel during the segment, a second derivative of energy with respect to time; to produce, for each of the plurality of different frequency components, and based on the corresponding calculated second derivative of energy with respect to time, a corresponding indication of whether the frequency component is impulsive; and to determine that a transition in a voice activity state of the multichannel signal does not occur during the segment, based on a relation between the number of said indications that indicate that the corresponding frequency component is impulsive and a threshold value. 45. 
The medium according to claim 40, wherein, for each of the first plurality of consecutive segments of the audio signal, said determining that voice activity is present in the segment is based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment, and wherein, for each of the second plurality of consecutive segments of the audio signal, said determining that voice activity is not present in the segment is based on a difference between a first channel of the audio signal during the segment and a second channel of the audio signal during the segment. 46. The medium according to claim 45, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference between a level of the first channel and a level of the second channel during the segment. 47. The medium according to claim 45, wherein, for each segment of said first plurality and for each segment of said second plurality, said difference is a difference in time between an instance of a signal in the first channel during the segment and an instance of said signal in the second channel during the segment. 48. 
The medium according to claim 45, wherein, for each segment of said first plurality, said determining that voice activity is present in the segment comprises calculating, for each of a first plurality of different frequency components of the multichannel signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences, and wherein, for each segment of said second plurality, said determining that voice activity is not present in the segment comprises calculating, for each of the first plurality of different frequency components of the multichannel signal during the segment, a difference between a phase of the frequency component in the first channel and a phase of the frequency component in the second channel, wherein said difference between the first channel during the segment and the second channel during the segment is one of said calculated phase differences. 49. The medium according to claim 48, wherein said instructions when executed by one or more processors cause the one or more processors to calculate a time derivative of energy for each of a second plurality of different frequency components of the first channel during said one among the second plurality of segments, and wherein said detecting that the transition occurs during said one among the second plurality of segments is based on the calculated time derivatives of energy, and wherein a frequency band that includes the first plurality of frequency components is separate from a frequency band that includes the second plurality of frequency components. 50. 
The medium according to claim 48, wherein, for each segment of said first plurality, said determining that voice activity is present in the segment is based on a corresponding value of a coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences, and wherein, for each segment of said second plurality, said determining that voice activity is not present in the segment is based on a corresponding value of the coherency measure that indicates a degree of coherence among the directions of arrival of at least the plurality of different frequency components, wherein said value is based on information from the corresponding plurality of calculated phase differences.
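The behavior recited in claims 40 to 42 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of a simple frame-to-frame difference as the "time derivative of energy", the energy scale, and both threshold parameters are assumptions made only for illustration. A primary detector's per-segment decisions are extended with a hold state that keeps indicating activity until an offset is detected, where an offset is declared when enough frequency components show an energy drop in the same segment.

```python
def band_energy_derivatives(prev_energies, curr_energies):
    """Approximate the time derivative of energy per frequency component
    as the difference between consecutive segments (an assumption)."""
    return [curr - prev for prev, curr in zip(prev_energies, curr_energies)]


def detect_offset(prev_energies, curr_energies, drop_threshold, count_threshold):
    """Return True when at least `count_threshold` components drop in energy
    by more than `drop_threshold` (both thresholds are hypothetical)."""
    derivatives = band_energy_derivatives(prev_energies, curr_energies)
    dropping = sum(1 for d in derivatives if d < -drop_threshold)
    return dropping >= count_threshold


def vad_signal(primary_decisions, band_energies, drop_threshold, count_threshold):
    """Combine primary per-segment decisions with offset detection.

    After a run of active segments, the output keeps indicating activity
    until an offset transition is detected. Marking the transition segment
    itself as inactive is a choice of this sketch.
    """
    output = []
    holding = False
    for i, active in enumerate(primary_decisions):
        if active:
            holding = True
            output.append(True)
            continue
        if holding and i > 0 and detect_offset(
            band_energies[i - 1], band_energies[i], drop_threshold, count_threshold
        ):
            holding = False  # offset detected: stop extending activity
        output.append(holding)
    return output


# Two active segments, then four segments the primary detector marks inactive.
decisions = [True, True, False, False, False, False]
energies = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [9, 10, 9.5, 10],
    [9, 9.5, 9.5, 10],
    [2, 2, 3, 9],   # energy falls in three of four components
    [2, 2, 3, 9],
]
print(vad_signal(decisions, energies, drop_threshold=3, count_threshold=3))
# prints [True, True, True, True, False, False]
```

In this sketch the hold state has no timeout, so activity would be indicated indefinitely if no offset were ever detected; a practical detector would likely bound that period.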
Patents cited by this patent (14)
Ramabadran,Tenkasi, Distributed speech recognition with back-end voice activity detection apparatus and method.
Gupta Prabhat K. (Germantown MD) Jangi Shrirang (Germantown MD) Lamkin Allan B. (Arlington VA) Kepley ; III W. Robert (Gaithersburg MD) Morris Adrian J. (Gaithersburg MD), Voice activity detector for speech signals in variable background noise.
Disch, Sascha; Geiger, Ralf; Helmrich, Christian; Multrus, Markus; Schmidt, Konstantin, Apparatus and method for generating a frequency enhanced signal using shaping of the enhancement signal.
Disch, Sascha; Geiger, Ralf; Helmrich, Christian; Multrus, Markus; Schmidt, Konstantin, Apparatus and method for generating a frequency enhanced signal using temporal smoothing of subbands.
Disch, Sascha; Geiger, Ralf; Helmrich, Christian; Multrus, Markus; Schmidt, Konstantin, Apparatus and method for generating a frequency enhancement signal using an energy limitation operation.