Graphical user interface for determining speech recognition accuracy
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC 7th edition)
G10L-021/06
G10L-021/00
G10L-015/26
G10L-015/00
Application number
US-0196017
(2002-07-16)
Registration number
US-7260534
(2007-08-21)
Inventor / Address
Gandhi,Shailesh B.
Jaiswal,Peeyush
Moore,Victor S.
Toon,Gregory L.
Applicant / Address
International Business Machines Corporation
Agent / Address
Akerman Senterfitt
Citation information
Times cited: 55
Cited patents: 9
Abstract
A solution for determining the accuracy of a speech recognition system. A first graphical user interface (GUI) is provided for selecting a transaction log. The transaction log has at least one entry that specifies a speech recognition text result. A second GUI is also provided for selecting at least one audio segment corresponding to the entry. The second GUI includes an activatable icon for initiating transcription of the audio segment through a reference speech recognition engine to generate a second text result.
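As a rough illustration of the comparison the abstract implies, the sketch below scores a logged recognition result against a second text result from a reference engine. The record does not name a scoring metric; word error rate, the function name `word_error_rate`, and the sample strings are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        # Assumption: an empty reference scores 0 only if the hypothesis is empty too.
        return 0.0 if not hyp else 1.0
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample data: the text logged by the system under test and the
# second text result produced by the reference engine.
logged_result = "transfer fifty dollars to savings"
reference_result = "transfer fifteen dollars to savings"
wer = word_error_rate(reference_result, logged_result)
```

Here one of the five reference words is substituted, so `wer` comes out to 0.2. User-corrected text, where available, could be passed as the reference instead.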
Representative claims
What is claimed is:
1. A method of determining the accuracy of a speech recognition system comprising: providing a first graphical user interface (GUI) for selecting a transaction log wherein said transaction log has at least one entry, said entry specifying a speech recognition text result and a plurality of corresponding attributes comprising a first attribute specifying a sound processing filter associated with said audio segment, a second attribute specifying a configuration of a speech recognition system generating said speech recognition text result, a third attribute specifying an acoustic model on which said speech recognition text result is based, and a fourth attribute specifying a linguistic model on which said speech recognition text result is based; and providing a second GUI for selecting at least one audio segment corresponding to said entry; wherein said second GUI comprises an activatable icon for initiating transcription of said audio segment through a reference speech recognition engine to generate a second text result.
2. The method of claim 1, wherein said second GUI comprises an input portion for receiving user corrected transcribed text.
3. The method of claim 1, further comprising: providing a third GUI, wherein said third GUI comprises one or more controls to associate said audio segment with at least one condition.
4. The method of claim 3, wherein said condition specifies at least a person having generated said audio segment, a gender of said person, and ambient sounds influencing a recognizability of said audio segment.
5. The method of claim 4, wherein said ambient sounds are at least one of weather generated sound and background noise.
6. The method of claim 3, wherein said condition is stored in said transaction log and associated with said entry.
7. The method of claim 1 wherein, said second GUI is automatically presented upon a transaction log being selected.
8. The method of claim 1, further comprising the step of providing a fourth GUI, wherein said fourth GUI comprises one or more indicators to show an operational status of a software application used in determining the accuracy of a speech recognition system.
9. The method of claim 1, further comprising providing a fifth GUI displaying said text result and said second text result.
10. The method of claim 9, wherein said fifth GUI further displays manually entered text corresponding to said audio segment.
11. The method of claim 9, wherein said fifth GUI further displays data.
12. The method of claim 11 wherein said data is statistical data.
13. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of: providing a first graphical user interface (GUI) for selecting a transaction log wherein said transaction log has at least one entry, said entry specifying a speech recognition text result and a plurality of corresponding attributes comprising a first attribute specifying a sound processing filter associated with said audio segment, a second attribute specifying a configuration of a speech recognition system generating said speech recognition text result, a third attribute specifying an acoustic model on which said speech recognition text result is based, and a fourth attribute specifying a linguistic model on which said speech recognition text result is based; and providing a second GUI for selecting at least one audio segment corresponding to said entry; wherein said second GUI comprises an activatable icon for initiating transcription of said audio segment through a reference speech recognition engine to generate a second text result.
14. The machine readable storage of claim 13, wherein said second GUI comprises an input portion for receiving user corrected transcribed text.
15. The machine readable storage of claim 13, further comprising: providing a third GUI, wherein said third GUI comprises one or more controls to associate said audio segment with at least one condition.
16. The machine readable storage of claim 15, wherein said condition specifies at least a person having generated said audio segment, a gender of said person, and ambient sounds influencing a recognizability of said audio segment.
17. The machine readable storage of claim 16, wherein said ambient sounds are at least one of weather generated sound and background noise.
18. The machine readable storage of claim 15, wherein said condition is stored in said transaction log and associated with said entry.
19. The machine readable storage of claim 13, said second GUI is automatically presented upon a transaction log being selected.
20. The machine readable storage of claim 13, further comprising the step of providing a fourth GUI, wherein said fourth GUI comprises one or more indicators to show an operational status of a software application used in determining the accuracy of a speech recognition system.
21. The machine readable storage of claim 13, further comprising providing a fifth GUI displaying said text result and said second text result.
22. The machine readable storage of claim 21, wherein said fifth GUI further displays manually entered text corresponding to said audio segment.
23. The machine readable storage of claim 21, wherein said fifth GUI further displays data.
24. The machine readable storage of claim 23, wherein said data is statistical data.
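The transaction log entry recited in the claims can be pictured as a small record type. The sketch below is one possible in-memory layout, not the patent's implementation: the class names, field names, sample values, and the exact-match statistic are all hypothetical, chosen only to mirror the four attributes, the conditions, and the "statistical data" the claims mention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Condition:
    """Recording condition associated with an audio segment (hypothetical fields)."""
    speaker: str
    gender: str
    ambient_sounds: List[str] = field(default_factory=list)


@dataclass
class LogEntry:
    """One transaction log entry with the four attributes named in the claims."""
    text_result: str            # speech recognition text result
    audio_segment: str          # identifier of the corresponding audio segment
    sound_filter: str           # first attribute: sound processing filter
    engine_configuration: str   # second attribute: recognizer configuration
    acoustic_model: str         # third attribute
    linguistic_model: str       # fourth attribute
    conditions: List[Condition] = field(default_factory=list)
    reference_text: Optional[str] = None   # second text result from a reference engine
    corrected_text: Optional[str] = None   # user-corrected transcription


def exact_match_rate(entries: List[LogEntry]) -> Optional[float]:
    """Share of scored entries whose text result equals the reference text.

    A deliberately simple statistic, assumed here for illustration only.
    """
    scored = [e for e in entries if e.reference_text is not None]
    if not scored:
        return None
    matches = sum(
        e.text_result.strip().lower() == e.reference_text.strip().lower()
        for e in scored
    )
    return matches / len(scored)


# Hypothetical sample log with two scored entries and one unscored entry.
rain = Condition(speaker="caller-1", gender="female", ambient_sounds=["rain"])
entries = [
    LogEntry("check my balance", "seg-001.wav", "telephony", "default",
             "am-v1", "lm-v1", conditions=[rain], reference_text="check my balance"),
    LogEntry("pay my bell", "seg-002.wav", "telephony", "default",
             "am-v1", "lm-v1", reference_text="pay my bill"),
    LogEntry("hello", "seg-003.wav", "telephony", "default", "am-v1", "lm-v1"),
]
rate = exact_match_rate(entries)
```

Keeping the conditions on the entry itself follows claims 6 and 18, which store the condition in the transaction log alongside the entry; results could then be grouped by condition or by model attribute.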
Patents cited by this patent (9)
Doyle,Sean, Automatically improving a voice recognition system.
Young Jonathan Hood ; Parmenter David Wilsberg ; Roth Robert ; Dubach Joev ; Gadbois Gregory J. ; Van Even Stijn, Error correction in speech recognition.
Lewis James R. ; Ballard Barbara, Method and system for automatically determining whether to update a language model based upon user amendments to dictated text.
Patents citing this patent
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Adding information or functionality to a rendered document via association with an electronic counterpart.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Aggregate analysis of text captures performed by multiple users from rendered documents.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Association of a portable scanner with input/output and storage devices.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Association of a portable scanner with input/output and storage devices.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Automatically capturing information, such as capturing information using a document-aware device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J.; Daley-Watson, Christopher J., Automatically providing content associated with captured information, such as information captured in real-time.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplement information.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Determining actions involving captured information and electronic content associated with rendered documents.
Jablokov, Victor Roman; Jablokov, Igor Roditis; Terrell, II, James Richard; Paden, Scott Edward, Facilitating presentation by mobile device of additional content for a word or phrase upon utterance thereof.
Jablokov, Victor Roditis; Jablokov, Igor Roditis; Terrell, II, James Richard; White, Marc; Paden, Scott Edward, Facilitating presentation of ads relating to words of a message.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Handheld device for capturing text from both a document printed on paper and a document displayed on a dynamic display device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Identifying a document by performing spectral analysis on the contents of the document.
Marquette, Brian; Corfield, Charles; Espy, Todd, Method and systems for measuring user performance with speech-to-text conversion for dictation systems.
Marquette, Brian; Corfield, Charles; Espy, Todd, Method and systems for simplifying copying and pasting transcriptions generated from a dictation based speech-to-text system.
Marquette, Brian; Corfield, Charles; Espy, Todd, Method and systems for simplifying copying and pasting transcriptions generated from a dictation based speech-to-text system.
Jablokov, Victor Roman; Jablokov, Igor Roditis, Methods and systems for dynamically updating web service profile information by parsing transcribed message strings.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Methods and systems for initiating application processes by data capture from rendered documents.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Kushler, Clifford A.; Stafford-Fraser, James Q.; Grover, Dale L., Processing techniques for visual capture data from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Search engines and systems with handheld document data capture devices.
Zimmerman, Roger S.; Antunes, Christopher S.; Barron, Jeremy E.; Tomasulo, Sharon Lee; Fiore, Claudia W.; Johnson, Christopher E.; Khesin, Anatole; Miller, Joshua, Systems and methods for automated transcription training.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.