[Patent] Compressed speech lexicon and method and apparatus for creating and accessing the speech lexicon

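IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC 7th edition)	G06F-017/28 G06F-017/21
Application number	US-0751871 (2000-12-29)
Registration number	US-7451075 (2008-11-11)
Inventor / Address	Mohammed, Yunus
Applicant / Address	Microsoft Corporation
Agent / Address	Kelly, Joseph R.
Citation information	Times cited: 2; Patents cited: 38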

Abstract

A compressed lexicon is built by receiving a word list, which includes word-dependent data associated with each word in the word list. A word is selected from the word list. A hash value is generated based on the selected word, and the hash value identifies an address in a hash table which, in turn, is written with a location in lexicon memory that is to hold the compressed form of the selected word, and the compressed word-dependent data associated with the selected word. The word is then encoded, or compressed, as is its associated word-dependent data. This information is written at the identified location in the lexicon memory.
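
The build flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names (`build_lexicon`, `lookup`), the CRC32 hash, the linear-probing collision rule, the 2-byte length prefix, and the use of `zlib` as the encoder are all assumptions made for the example. In particular, `zlib` is only a stand-in and will not actually shrink short strings.

```python
import struct
import zlib

EMPTY = -1  # marks an unused hash-table slot (assumption)


def _slot(word: str, size: int) -> int:
    # CRC32 is a stand-in hash function; the source does not name one.
    return zlib.crc32(word.encode("utf-8")) % size


def _pack(blob: bytes) -> bytes:
    # Length-prefixed field: 2-byte big-endian length, then the bytes.
    return struct.pack(">H", len(blob)) + blob


def build_lexicon(word_list):
    """word_list: iterable of (word, word_dependent_data) string pairs."""
    entries = list(word_list)
    table = [EMPTY] * (2 * len(entries) + 1)  # hash table sized from the word count
    memory = bytearray()                      # lexicon memory
    for word, data in entries:
        slot = _slot(word, len(table))
        while table[slot] != EMPTY:           # linear probing (assumption)
            slot = (slot + 1) % len(table)
        table[slot] = len(memory)             # offset of the next available location
        memory += _pack(zlib.compress(word.encode("utf-8")))
        memory += _pack(zlib.compress(data.encode("utf-8")))
    return table, bytes(memory)


def _read(memory: bytes, pos: int):
    # Decode one length-prefixed field and return (text, next position).
    (n,) = struct.unpack_from(">H", memory, pos)
    start = pos + 2
    return zlib.decompress(memory[start:start + n]).decode("utf-8"), start + n


def lookup(table, memory, word):
    slot = _slot(word, len(table))
    while table[slot] != EMPTY:
        stored, pos = _read(memory, table[slot])
        if stored == word:                    # verify the decoded word matches
            data, _ = _read(memory, pos)
            return data
        slot = (slot + 1) % len(table)
    return None
```

Calling `build_lexicon([("read", "R IY D|verb")])` returns the hash table and the lexicon memory, and `lookup(table, memory, "read")` decodes the stored data again. Sizing the table above the word count guarantees that probing always reaches an empty slot.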

Representative claims

What is claimed is:
1. A method of building a compressed speech lexicon for use in a speech application, comprising: receiving a word list configured for use in the speech application, the word list including a plurality of words, with each word in the word list having associated word-dependent data selected from the group consisting of a pronunciation and part-of-speech; selecting one of the words from the word list; generating an index entry identifying a location in a compressed speech lexicon memory for holding the selected word; encoding the selected word and its associated word-dependent data to obtain encoded words and associated encoded word-dependent data; and writing the encoded word and its associated word-dependent data at the identified location in the speech lexicon memory.
2. The method of claim 1 and further comprising: repeating the steps of selecting, generating, encoding and writing for each word in the word list and the associated word-dependent data.
3. The method of claim 2 and further comprising: writing codebooks corresponding to the encoded words and the encoded word-dependent data in the speech lexicon memory.
4. The method of claim 1 wherein receiving the word list comprises: counting the words in the word list; allocating a hash table memory based on a number of words in the word list; and allocating a speech lexicon memory based on the number of words in the word list.
5. The method of claim 1 wherein encoding comprises: providing a word encoder to encode the words in the word list and encoding the words with the word encoder; and providing word-dependent data encoders for each type of word-dependent data in the word list and encoding the word-dependent data with the word-dependent data encoders.
6. The method of claim 5 wherein encoding further comprises: Huffman encoding the selected word and its associated word-dependent data.
7. The method of claim 1 wherein writing the encoded word and word-dependent data comprises: writing a data structure comprising: a word portion containing the encoded word; a word-dependent data portion containing the encoded word-dependent data; and wherein each word-dependent data portion has an associated last indicator portion and word-dependent data indicator portion, the last indicator portion containing an indication of a last portion of word-dependent data associated with the selected word, and the word-dependent data indicator portion containing an indication of the type of word-dependent data stored in the associated word dependent data portion.
8. The method of claim 7 wherein writing a data structure comprises writing the word portion and the word-dependent data portions as variable length portions followed by a separator.
9. The method of claim 1 wherein generating an index entry comprises: determining a next available location in the speech lexicon memory.
10. The method of claim 9 wherein generating an index entry comprises: calculating a hash value for the selected word; indexing into the hash table to an index location based on the hash value; and writing location data identifying the next available location in the speech lexicon memory into the index location in the hash table.
11. The method of claim 10 wherein writing location data comprises: writing an offset into the speech lexicon memory that corresponds to the next available location in the speech lexicon memory.
12. A method of accessing word information related to a word stored in a compressed speech lexicon, comprising: receiving the word; accessing an index to obtain a word location in the compressed speech lexicon that contains information associated with the received word including word-dependent data selected from the group consisting of a pronunciation and a part-of-speech; reading encoded word information from the word location; and decoding the word information for use in a speech application.
13. The method of claim 12 and further comprising: prior to reading the encoded word information, reading an encoded word from the word location; decoding the encoded word; and verifying that the decoded word is the same as the received word.
14. The method of claim 12 wherein decoding the word information comprises: initializing decoders associated with the word and its associated information.
15. The method of claim 12 wherein accessing an index comprises: calculating a hash value based on the received word; finding an index location in the index based on the hash value; and reading from the index location a pointer value pointing to the word location in the compressed lexicon.
16. The method of claim 12 wherein reading the encoded word information comprises: reading a plurality of fields from the word location containing variable length word information.
17. The method of claim 16 wherein reading a plurality of fields comprises: prior to reading each field, reading data type header information indicating a type of word information in an associated field.
18. The method of claim 17 wherein reading a plurality of fields comprises: reading a last field indicator indicating whether an associated one of the plurality of fields is a last field associated with the received word.
19. A compressed speech lexicon builder for building a compressed speech lexicon for use in a speech application based on a word list containing a plurality of domains, the domains including words and word-dependent data associated with each of the words, the compressed speech lexicon builder comprising: a plurality of domain encoders, one domain encoder being associated with each domain in the word list, the domain encoders being configured to compress the words and the associated word-dependent data selected from the group consisting of a pronunciation and a part-of-speech, to obtain compressed words and compressed word-dependent data; a hashing component configured to generate a hash value for each word in the word list; a hash table generator, coupled to the hashing component, configured to determine a next available location in a speech lexicon memory and write, at an address in a hash table identified by the hash value, the next available location in the speech lexicon memory; and a speech lexicon memory generator, coupled to the domain encoders and the hash table generator, configured to store in the speech lexicon memory, for use by the speech application, the compressed words and compressed word-dependent data, each compressed word and its associated compressed word-dependent data being stored at the next available location in the speech lexicon memory written in the hash table at the hash table address associated with the compressed word.
20. The compressed speech lexicon builder of claim 19 and further comprising: a codebook generator generating a codebook associated with each domain encoder.
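
The record layout recited in claims 7, 8 and 16 to 18 (a word portion followed by variable-length data portions, each carrying a type indicator and a last-field indicator) might be framed as in the sketch below. The byte values are assumptions chosen for illustration: a `0x00` separator, the high bit of a one-byte header as the last-field flag, and type codes 1 and 2. Payloads are left as plain UTF-8 rather than Huffman-coded so the example can focus on the framing alone.

```python
SEP = b"\x00"   # field separator (assumed value)
LAST = 0x80     # high bit of the header byte marks the last field (assumption)
TYPES = {"pronunciation": 1, "part_of_speech": 2}   # assumed type codes
NAMES = {code: name for name, code in TYPES.items()}


def write_record(word, fields):
    """fields: non-empty list of (type_name, text) pairs for one word."""
    out = bytearray(word.encode("utf-8") + SEP)      # word portion + separator
    for i, (kind, text) in enumerate(fields):
        header = TYPES[kind] | (LAST if i == len(fields) - 1 else 0)
        out.append(header)                           # type and last indicators
        out += text.encode("utf-8") + SEP            # data portion + separator
    return bytes(out)


def read_record(memory, pos=0):
    """Return (word, fields, position just past the record)."""
    end = memory.index(SEP, pos)
    word = memory[pos:end].decode("utf-8")
    pos = end + 1
    fields = []
    while True:
        header = memory[pos]                         # read the header before each field
        end = memory.index(SEP, pos + 1)
        text = memory[pos + 1:end].decode("utf-8")
        fields.append((NAMES[header & 0x7F], text))
        pos = end + 1
        if header & LAST:                            # stop at the last field for this word
            return word, fields, pos
```

Because every record ends at a field whose header has the last-field flag set, a reader can walk one word's data without knowing in advance how many pronunciations or part-of-speech entries it has, and the returned position marks where the next record would start.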

Patents cited by this patent (38)

Kaufman Ilia (Don Mills RI CAX) Kucera Henry (Providence RI), Apparatus and method for linguistic expression processing.
Ogawa Tomoya,JPX, Apparatus and method for retrieving dictionary based on lattice as a key.
Deligne Sabine ; Sagisaka Yoshinori,JPX ; Nakajima Hideharu,JPX, Apparatus for generating a statistical sequence model called class bi-multigram model with bigram dependencies assumed between adjacent sequences.
Braden-Harder Lisa C. (Somers NY) Kim Michelle Y. L. (Scarsdale NY) Klavans Judith L. (Hastings-on-Hudson NY) Zadrozny Wlodek W. (Mohegan Lake NY), Archiving and retrieving multimedia objects using structured indexes.
Mitchell John C.,GB2 ; Heard Alan James,GB2 ; Corbett Steven Norman,GB2 ; Daniel Nicholas John,GB2, Automated proofreading using interface linking recognized words to their audio data while text is being changed.
Lewis G. Pringle ; Robert W. Swerdlow ; Alec Wysoker, Automated translation of annotated text based on the determination of locations for inserting annotation tokens and linked ending, end-of-sentence or language tokens.
Shieber Stuart M. ; Armstrong John ; Baptista Rafael Jose ; Bentz Bryan A. ; Ganong ; III William F. ; Selesky Donald Bryant, Command parsing and rewrite system.
Morgan Greene, Jr. ; Virginia Greene ; Harry E. Newman ; Mark J. Yuhas ; Michael F. Dorety, Electronic translator for assisting communications.
Young Jonathan Hood ; Parmenter David Wilsberg ; Roth Robert ; Dubach Joev ; Gadbois Gregory J. ; Van Even Stijn, Error correction in speech recognition.
Poirier Herve,FRX ; Tarbouriech Nelly,FRX ; Harrus Gilbert,FRX, Executable for requesting a linguistic service.
Pazandak,Paul N.; Thompson,Craig, Guided natural language interface system and method.
Peres, Renana; Shimoni, Guy, Interface to a speech processing system.
Makhoul John I. ; Schwartz Richard M., Language-independent and segmentation-free optical character recognition system and method.
Brew Christopher Hardie (Edinburgh GB6), Machine translation system utilizing bilingual equivalence statements.
Nguyen John N. ; Marx Matthew T., Method and apparatus for continuous spelling speech recognition with early identification.
Dahan Jean-Guy ; Gupta Vishwa,CAX, Method and apparatus for performing speech recognition utilizing a supplementary lexicon of frequently used orthographies.
Erhart George W. ; Hartung Ronald L., Method and system for dynamic speech recognition using free-phone scoring.
Daniel M. Coffman ; Popani Gopalakrishnan ; Ganesh N. Ramaswamy ; Jan Kleindienst CZ; Chalapathy V. Neti, Method and system for multi-client access to a dialog system.
Burrows Michael, Method for parsing, indexing and searching world-wide-web pages.
Burrows Michael, Method for parsing, indexing and searching world-wide-web pages.
Koontz, Eugene, Method to compress linguistic structures.
Wical Kelly, Methods and apparatus for dynamic classification of discourse.
Abella Alicia ; Brown Michael Kenneth ; Buntschuh Bruce Melvin, Methods and apparatus object-oriented rule-based dialogue management.
Beattie Valerie L. ; Miller David R. H. ; Edmondson Shawn Eric ; Patel Yogen N. ; Talvola Geoffrey A., Multi-dialect speech recognition method and apparatus.
Johnson David Edward, Multimodal natural language interface for cross-application tasks.
Loatman Robert B. (Vienna VA) Post Stephen D. (McLean VA) Yang Chih-King (Rockville MD) Hermansen John C. (Catharpin VA), Natural language understanding system.
Wilson, Andrew T., Remote control with speech recognition.
Comerford, Liam David; Fernhout, Paul Derek; Frank, David Carl, Scalable low resource dialog manager.
Cliff Didcock GB, Shared text-to-speech resource.
Schwartz Richard M. (Sudbury MA) Nguyen Long (Medford MA), Single tree method for grammar directed, very large vocabulary speech recognizer.
Parthasarathy Sarangarajan ; Rosenberg Aaron Edward, Speaker identification with user-selected password phrases.
Galler Michael ; Junqua Jean-Claude, Speech recognition system employing multiple grammar networks.
Huang Xuedong D. ; Alleva Fileno A. ; Jiang Li ; Hwang Mei-Yuh, Speech recognition system for recognizing continuous and isolated speech.
C. Scott Baker ; Charles T. Hemphill, System and method for adding speech recognition capabilities to java.
Andrew J. Hunt ; William D. Walker ; Johan Wouters, System and method for interfacing speech recognition grammars to individual components of a computer program.
Profit, Jr., Jack H.; Brown, N. Gregg; Mezey, Peter S.; Colombo, Lianne M., System and process for voice-controlled information retrieval.
Henry C. A. Hyde-Thomson GB; Roger Liron GB, Unified messaging system with automatic language identification for text-to-speech conversion.
Sarukkai Ramesh ; Sarukkai Sekhar, Web triggered word set boosting for speech interfaces to the world wide web.

Patents citing this patent (2)

Hantler, Sidney L.; Laker, Meir M.; Lenchner, Jonathan; Milch, Daniel, Methods and apparatus for performing spelling corrections using one or more variant hash tables.
Pouzin, Dominic, Two-pass hash extraction of text strings.

