IPC Classification Information
Country/Type | United States (US) Patent - Granted
International Patent Classification (IPC, 7th ed.) |
Application No. | US-0056001 (2008-03-26)
Registration No. | US-8781832 (2014-07-15)
Inventors / Address |
- Comerford, Liam D.
- Frank, David Carl
- Lewis, Burn L.
- Rachevsky, Leonid
- Viswanathan, Mahesh
Applicant / Address |
- Nuance Communications, Inc.
Agent / Address | Wolf, Greenfield & Sacks, P.C.
Citation Info | Cited by: 2 / Patents cited: 30
Abstract
Techniques are disclosed for overcoming errors in speech recognition systems. For example, a technique for processing acoustic data in accordance with a speech recognition system comprises the following steps/operations. Acoustic data is obtained in association with the speech recognition system. The acoustic data is recorded using a combination of a first buffer area and a second buffer area, such that the recording of the acoustic data using the combination of the two buffer areas at least substantially minimizes one or more truncation errors associated with operation of the speech recognition system.
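The two-buffer recording scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name, the chunk granularity, and the mic-on trigger method are all assumptions made for the example.

```python
from collections import deque

class DualBufferRecorder:
    """Sketch of the two-buffer scheme: a fixed-size circular buffer
    records continuously, and a second (linear) buffer takes over once
    the recognizer is addressed (e.g. a microphone-on event)."""

    def __init__(self, ring_capacity):
        # A deque with maxlen acts as a circular buffer: once full,
        # each append silently discards the oldest sample.
        self.ring = deque(maxlen=ring_capacity)
        self.linear = []          # second buffer, grows while addressed
        self.addressed = False

    def feed(self, sample):
        if self.addressed:
            self.linear.append(sample)
        else:
            self.ring.append(sample)

    def on_mic_on(self):
        # Indication that the system is being addressed: stop filling
        # the ring and start recording into the second buffer.
        self.addressed = True

    def combined(self):
        # Prepend the ring contents to the second buffer; the boundary
        # index is the reference location used for endpoint analysis.
        first = list(self.ring)
        return first + self.linear, len(first)

# Usage: samples 0..4 arrive before mic-on, 5..7 after.
rec = DualBufferRecorder(ring_capacity=3)
for s in range(5):
    rec.feed(s)
rec.on_mic_on()
for s in range(5, 8):
    rec.feed(s)
data, boundary = rec.combined()
# The ring kept only the 3 newest pre-trigger samples.
print(data, boundary)   # [2, 3, 4, 5, 6, 7] 3
```

Because the ring always holds the most recent pre-trigger audio, speech that began shortly before the mic-on event is preserved rather than truncated, which is the error the abstract targets.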
Representative Claims
1. A method for processing acoustic data to reduce one or more truncation errors associated with operation of a speech recognition system, the method comprising acts of: continuously recording acoustic data in a circular buffer; when an indication that the speech recognition system is being addressed is detected, starting recording of acoustic data in a second buffer that is separate from the circular buffer; obtaining combined acoustic data at least in part by prepending first acoustic data recorded in the circular buffer to a beginning of second acoustic data recorded in the second buffer; and analyzing the combined acoustic data, which comprises data from the circular buffer and data from the second buffer, to identify a likely speech endpoint in the combined acoustic data, wherein the act of analyzing comprises using a boundary between the first and second acoustic data as a reference location, and wherein the act of analyzing the combined acoustic data comprises an act of identifying, among one or more regions in the combined acoustic data likely to correspond to silence, a region of silence closest to the reference location.

2. The method of claim 1, wherein the act of obtaining combined acoustic data comprises an act of forming a composite buffer area comprising the first acoustic data prepended to the second acoustic data.

3. The method of claim 2, wherein: the composite buffer area contains, at a start of the first acoustic data prepended to the second acoustic data, oldest acoustic data in the circular buffer; acoustic data recorded in the circular buffer immediately before the indication that the speech recognition system is being addressed ends the first acoustic data; and in the composite buffer area, the acoustic data recorded in the circular buffer immediately before the indication that the speech recognition system is being addressed is contiguous in memory with acoustic data which is recorded in the second buffer immediately following the indication that the speech recognition system is being addressed.

4. The method of claim 2, wherein the act of analyzing the combined acoustic data comprises processing acoustic data in the composite buffer area to detect one or more features indicating silence.

5. The method of claim 4, wherein a location in the region of silence closest to the reference location is used as a location in the composite buffer area at which speech intended for the speech recognition system to process begins.

6. The method of claim 2, further comprising an act of decoding acoustic data in the composite buffer area into text.

7. The method of claim 2, wherein the act of forming the composite buffer area comprises: copying the first acoustic data recorded in the circular buffer to the composite buffer area.

8. The method of claim 1, wherein the region of silence closest to the reference location is in the first acoustic data if the indication that the speech recognition system is being addressed was given after speech started.

9. The method of claim 1, wherein the recording of acoustic data in the second buffer continues until an indication that the speech recognition system is no longer being addressed is detected and a feature indicating silence is detected in the acoustic data recorded in the second buffer.

10. The method of claim 1, further comprising: stopping recording of acoustic data in the circular buffer when recording of acoustic data is started in the second buffer; and stopping recording of acoustic data in the second buffer and restarting recording of acoustic data in the circular buffer, when an indication that the speech recognition system is no longer being addressed is detected and a feature indicating silence is detected in the acoustic data recorded in the second buffer.

11. The method of claim 10, wherein the indication that the speech recognition system is being addressed comprises a microphone on event, and the indication that the speech recognition system is no longer being addressed comprises a microphone off event.

12. The method of claim 1, wherein the second buffer comprises a linear buffer.

13. The method of claim 1, wherein the circular buffer and the second buffer are at least part of a single storage data structure.

14. The method of claim 1, wherein the circular buffer and the second buffer are at least part of separate storage data structures.

15. Apparatus for processing acoustic data to reduce one or more truncation errors associated with operation of a speech recognition system, comprising: at least one memory comprising a circular buffer and a second buffer that is separate from the circular buffer; and at least one processor coupled to the memory and operative to: continuously record acoustic data in the circular buffer; when an indication that the speech recognition system is being addressed is detected, start recording of acoustic data in a second buffer; obtain combined acoustic data at least in part by prepending first acoustic data recorded in the circular buffer to a beginning of second acoustic data recorded in the second buffer; and analyze the combined acoustic data, which comprises data from the circular buffer and data from the second buffer, to identify a likely speech endpoint in the combined acoustic data, wherein the act of analyzing comprises using a boundary between the first and second acoustic data as a reference location, and wherein the at least one processor is further operative to analyze the combined acoustic data at least in part by identifying, among one or more regions in the combined acoustic data likely to correspond to silence, a region of silence closest to the reference location.

16. The apparatus of claim 15, wherein prepending the first acoustic data comprises copying the acoustic data recorded in the circular buffer to a composite buffer area such that the composite buffer area comprises the first acoustic data prepended to the second acoustic data.

17. The apparatus of claim 15, wherein the region of silence closest to the reference location is in the first acoustic data if the indication that the speech recognition system is being addressed was given after speech started.

18. The apparatus of claim 15, wherein the at least one processor is further operative to: stop recording of acoustic data in the circular buffer when recording of acoustic data is started in the second buffer; and stop recording of acoustic data in the second buffer and restart recording of acoustic data in the circular buffer, when an indication that the speech recognition system is no longer being addressed is detected and a feature indicating silence is detected in the acoustic data recorded in the second buffer.

19. At least one article of manufacture for use in processing acoustic data to reduce one or more truncation errors associated with operation of a speech recognition system, comprising at least one machine readable medium having encoded thereon one or more programs which when executed implement acts of: continuously recording acoustic data in a circular buffer; when an indication that the speech recognition system is being addressed is detected, starting recording of acoustic data in a second buffer that is separate from the circular buffer; obtaining combined acoustic data at least in part by prepending first acoustic data recorded in the circular buffer to a beginning of second acoustic data recorded in the second buffer; and analyzing the combined acoustic data, which comprises data from the circular buffer and data from the second buffer, to identify a likely speech endpoint in the combined acoustic data, wherein the act of analyzing comprises using a boundary between the first and second acoustic data as a reference location, and wherein the act of analyzing the combined acoustic data comprises an act of identifying, among one or more regions in the combined acoustic data likely to correspond to silence, a region of silence closest to the reference location.

20. The at least one article of manufacture of claim 19, wherein prepending the first acoustic data comprises copying the acoustic data recorded in the circular buffer to a composite buffer area such that the composite buffer area comprises the first acoustic data prepended to the second acoustic data.

21. The at least one article of manufacture of claim 19, wherein the one or more programs further implement: stopping recording of acoustic data in the circular buffer when recording of acoustic data is started in the second buffer; and stopping recording of acoustic data in the second buffer and restarting recording of acoustic data in the circular buffer, when an indication that the speech recognition system is no longer being addressed is detected and a feature indicating silence is detected in the acoustic data recorded in the second buffer.

22. A method for processing acoustic data in accordance with a speech recognition system, the method comprising acts of: recording acoustic data in at least one recording medium; detecting, at a first time, a user-generated input event instructing the speech recognition system to start speech recognition processing, the first time corresponding to a first location of the recorded acoustic data recorded in the at least one recording medium; searching in the recorded acoustic data to identify a silence region having the shortest distance, among all silence regions in the recorded acoustic data, relative to the first location in the recorded acoustic data corresponding to the first time at which the user-generated input event was detected; and identifying a location in the identified silence region as a start location for speech recognition processing of at least a portion of the recorded acoustic data, wherein: if the recorded acoustic data is such that the identified silence region entirely follows the first location, the start location for speech recognition processing follows the first location; and if the recorded acoustic data is such that the identified silence region entirely precedes the first location, the start location for speech recognition processing precedes the first location.

23. The method of claim 22, further comprising: detecting, at a second time later than the first time, an indication to stop speech recognition processing, the second time corresponding to a second location of the recorded acoustic data; continuing to record acoustic data after the second time; and performing speech recognition processing on at least a portion of the recorded acoustic data recorded after the second time.

24. The method of claim 23, further comprising: searching for acoustic data representing silence in the acoustic data recorded after the second time; identifying a third location having acoustic data representing silence; and performing speech recognition processing on the recorded acoustic data between the second and third locations.

25. A system for processing acoustic data in accordance with a speech recognition system, the system comprising: at least one memory for storing executable instructions; and at least one processor programmed by the executable instructions to: record acoustic data in at least one recording medium; detect, at a first time, a user-generated input event instructing the speech recognition system to start speech recognition processing, the first time corresponding to a first location of the recorded acoustic data recorded in the at least one recording medium; search in the recorded acoustic data to identify a silence region having the shortest distance, among all silence regions in the recorded acoustic data, relative to the first location in the recorded acoustic data corresponding to the first time at which the user-generated input event was detected; and identify a location in the identified silence region as a start location for speech recognition processing of at least a portion of the recorded acoustic data, wherein: if the recorded acoustic data is such that the identified silence region entirely follows the first location, the start location for speech recognition processing follows the first location; and if the recorded acoustic data is such that the identified silence region entirely precedes the first location, the start location for speech recognition processing precedes the first location.

26. The system of claim 25, wherein the at least one processor is further programmed to: detect, at a second time later than the first time, an indication to stop speech recognition processing, the second time corresponding to a second location of the recorded acoustic data; continue to record acoustic data after the second time; and perform speech recognition processing on at least a portion of the recorded acoustic data recorded after the second time.

27. The system of claim 26, wherein the at least one processor is further programmed to: search for acoustic data representing silence in the acoustic data recorded after the second time; identify a third location having acoustic data representing silence; and perform speech recognition processing on the recorded acoustic data between the second and third locations.

28. At least one computer readable memory encoded with instructions that, when executed, perform a method for processing acoustic data in accordance with a speech recognition system, the method comprising acts of: recording acoustic data in at least one recording medium; detecting, at a first time, a user-generated input event instructing the speech recognition system to start speech recognition processing, the first time corresponding to a first location of the recorded acoustic data recorded in the at least one recording medium; searching in the recorded acoustic data to identify a silence region having the shortest distance, among all silence regions in the recorded acoustic data, relative to the first location in the recorded acoustic data corresponding to the first time at which the user-generated input event was detected; and identifying a location in the identified silence region as a start location for speech recognition processing of at least a portion of the recorded acoustic data, wherein: if the recorded acoustic data is such that the identified silence region entirely follows the first location, the start location for speech recognition processing follows the first location; and if the recorded acoustic data is such that the identified silence region entirely precedes the first location, the start location for speech recognition processing precedes the first location.

29. The at least one computer readable memory of claim 28, wherein the method further comprises: detecting, at a second time later than the first time, an indication to stop speech recognition processing, the second time corresponding to a second location of the recorded acoustic data; continuing to record acoustic data after the second time; and performing speech recognition processing on at least a portion of the recorded acoustic data recorded after the second time.

30. The at least one computer readable memory of claim 29, wherein the method further comprises: searching for acoustic data representing silence in the acoustic data recorded after the second time; identifying a third location having acoustic data representing silence; and performing speech recognition processing on the recorded acoustic data between the second and third locations.
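Claims 1 and 22 both hinge on finding the silence region closest to a reference location (the buffer boundary, or the point where the user's start event was detected). The search can be sketched as below; the frame-energy representation and the 0.1 threshold are illustrative assumptions, not values from the patent.

```python
def silence_regions(frames, threshold):
    """Return (start, end) index pairs of maximal runs of frames whose
    energy is below `threshold` (treated here as silence)."""
    regions, start = [], None
    for i, e in enumerate(frames):
        if e < threshold and start is None:
            start = i                      # silence run begins
        elif e >= threshold and start is not None:
            regions.append((start, i))     # silence run ends
            start = None
    if start is not None:
        regions.append((start, len(frames)))
    return regions

def nearest_silence(frames, reference, threshold=0.1):
    """Among all silence regions, pick the one closest to `reference`
    (distance 0 if the reference falls inside a region). This mirrors
    the claims' choice of a likely speech start/endpoint: the selected
    region may precede or follow the reference location."""
    def distance(region):
        start, end = region
        if start <= reference < end:
            return 0
        return min(abs(reference - start), abs(reference - (end - 1)))
    regions = silence_regions(frames, threshold)
    return min(regions, key=distance) if regions else None

# Two silence regions exist (indices 0-1 and 4-5); the second one is
# closer to the reference at index 6, so it is selected.
energies = [0.0, 0.0, 0.9, 0.8, 0.05, 0.02, 0.9, 0.7, 0.9]
print(nearest_silence(energies, reference=6))   # (4, 6)
```

Note how this covers claim 8's case: if speech began before the trigger, the nearest silence region lies in the pre-trigger (circular-buffer) portion of the data, and the chosen start location precedes the reference.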