[Patent] Method and system for automatically building natural language understanding models

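IPC Classification Information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th ed.)	G10L-015/00 G10L-015/18 G10L-021/00
Application No.	UP-0324057 (2005-12-30)
Registration No.	US-7835911 (2011-01-16)
Inventor / Address	Balchandran, Rajesh; Boyer, Linda M.
Applicant / Address	Nuance Communications, Inc.
Agent / Address	Wolf, Greenfield & Sacks, P.C.
Citation Information	Cited by: 17; Patents cited: 25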

Abstract

The invention disclosed herein concerns a system (100) and method (600) for building a language model representation of an NLU application. The method (600) can include categorizing an NLU application domain (602), classifying a corpus in view of the categorization (604), and training at least one language model in view of the classification (606). The categorization produces a hierarchical tree of categories, sub-categories and end targets across one or more features for interpreting one or more natural language input requests. During development of an NLU application, a developer assigns sentences of the NLU application to categories, sub-categories or end targets across one or more features, associating each sentence with its desired interpretations. A language model builder (140) iteratively builds multiple language models for this sentence data, evaluates them against a test corpus, partitions the data based on the categorization, and rebuilds the models, so as to produce an optimal configuration of language models for interpreting and responding to language input requests of the NLU application.
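The build, evaluate and partition loop described in the abstract can be made concrete with a small Python sketch. It is not the patented implementation. A toy multinomial naive Bayes word classifier (`ToyModel`) stands in for each statistical language model. "Sub-dividing the categorization" is assumed to mean moving a partition level one step deeper, so that every category shallower than that level gets its own model. All names (`Node`, `build_optimal_configuration`, and so on) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical category-tree node; a node without children is an end target."""
    name: str
    children: list = field(default_factory=list)


class ToyModel:
    """Stand-in for a statistical language model: multinomial naive Bayes over words."""

    def __init__(self, examples):
        self.words, self.labels, self.vocab = defaultdict(Counter), Counter(), set()
        for sentence, label in examples:
            tokens = sentence.lower().split()
            self.words[label].update(tokens)
            self.labels[label] += 1
            self.vocab.update(tokens)

    def predict(self, sentence):
        total = sum(self.labels.values())

        def log_score(label):  # log prior + add-one-smoothed log likelihood
            counts = self.words[label]
            denom = sum(counts.values()) + len(self.vocab)
            return math.log(self.labels[label] / total) + sum(
                math.log((counts[w] + 1) / denom) for w in sentence.lower().split())

        return max(self.labels, key=log_score)


def index_tree(root):
    """Map each node name to (node, depth, path of names from the root)."""
    info, stack = {}, [(root, 0, [root.name])]
    while stack:
        node, depth, path = stack.pop()
        info[node.name] = (node, depth, path)
        stack.extend((c, depth + 1, path + [c.name]) for c in node.children)
    return info


def owns_model(info, name, level):
    """Assumed partitioning rule: every category shallower than `level` gets a model."""
    node, depth, _ = info[name]
    return bool(node.children) and depth < level


def label_at(info, name, target, level):
    """Label the model at `name` must output for an example whose end target is `target`."""
    child = info[target][2][info[name][1] + 1]  # next node on the path towards target
    return child if owns_model(info, child, level) else target


def build_configuration(info, train_set, level):
    models = {}
    for name in info:
        if owns_model(info, name, level):
            examples = [(s, label_at(info, name, t, level))
                        for s, t in train_set if name in info[t][2][:-1]]
            if examples:
                models[name] = ToyModel(examples)
    return models


def interpret(models, root_name, sentence):
    """Top-down cascade: stop as soon as a model returns an end target."""
    label = root_name
    while label in models:
        label = models[label].predict(sentence)
    return label


def build_optimal_configuration(root, train_set, test_set, desired_accuracy):
    info, history, level = index_tree(root), [], 1
    while True:
        models = build_configuration(info, train_set, level)
        acc = sum(interpret(models, root.name, s) == t for s, t in test_set) / len(test_set)
        history.append((level, acc))  # performance log, cf. claim 11
        if acc >= desired_accuracy:
            return level, acc, history
        if not any(n.children and d == level for n, d, _ in info.values()):
            # No further partitioning possible: keep the most accurate configuration.
            best_level, best_acc = max(history, key=lambda h: h[1])
            return best_level, best_acc, history
        level += 1


tree = Node("root", [Node("banking", [Node("balance"), Node("transfer")]), Node("help")])
train_set = [("what is my balance", "balance"), ("show account balance", "balance"),
             ("transfer money to savings", "transfer"), ("move funds now", "transfer"),
             ("i need help", "help"), ("help me please", "help")]
test_set = [("my balance please", "balance"), ("transfer funds", "transfer"), ("help", "help")]
print(build_optimal_configuration(tree, train_set, test_set, desired_accuracy=0.9))
# → (1, 1.0, [(1, 1.0)])
```

On this toy data, a single model already interprets every test sentence correctly, so the loop stops at the first configuration. If `desired_accuracy` is set above what any configuration can reach, the loop tries each partition level and then falls back to the most accurate one. Claim 20 describes the same fallback.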

Representative Claims

What is claimed is:

1. A method for building a language model configuration comprising the steps of: categorizing a natural language understanding (NLU) application to produce an application categorization having a plurality of categories; classifying a corpus of example expressions to produce a classified corpus by identifying at least one of the categories for each example expression of the example expressions; and operating at least one computer configured with a plurality of instructions that, when executed, cause the at least one computer to train at least one statistical language model using said classified corpus by: building from the classified corpus a first language model configuration comprising a first statistical language model; evaluating an interpretation accuracy of the first language model configuration using test data; determining whether the evaluated interpretation accuracy of the first language model configuration is less than a desired accuracy; when it is determined that the evaluated interpretation accuracy of the first language model configuration is at least the desired accuracy, then adopting the first language model configuration; and when it is determined that the evaluated interpretation accuracy of the first language model configuration is less than the desired accuracy, then: sub-dividing the application categorization into a plurality of sub-categories; and building a second language model configuration comprising a plurality of statistical language models corresponding to the plurality of sub-categories.

2. The method of claim 1, wherein categorizing the natural language understanding (NLU) application to produce an application categorization having a plurality of categories comprises representing the application as a hierarchical tree of categories, sub-categories and end targets in the application categorization for one or more features or types of interpretation.

3. The method of claim 2, wherein operating the at least one computer to train the statistical language model comprises causing the statistical language model to learn associations between categories, sub-categories and end targets across multiple features within said application categorization, such that a language input request is identified, using a statistical language model, with at least one action that corresponds to a target.

4. The method of claim 2, further comprising producing a visual representation of said application categorization for visually categorizing and visually classifying said natural language understanding application.

5. The method of claim 4, wherein visually classifying includes dragging and dropping a sentence of said corpus into at least one target of said visual representation.

6. The method of claim 4, further comprising automatically classifying an example sentence in category targets above a node in the visual representation when a developer classifies said example sentence in said node.

7. The method of claim 1, wherein identifying at least one of the categories for each example expression of the example expressions comprises identifying at least one target in said application categorization to provide a correct interpretation of the example expression.

8. The method of claim 1, wherein: said classifying further comprises associating an example expression with multiple targets corresponding to one or more features; and the method further comprises providing to a user of said natural language understanding application who has entered a language input request multiple pieces of information from said language model representation.

9. The method of claim 1, wherein the method further comprises interpreting a language input request at runtime using the at least one statistical language model in a defined sequence that provides the highest trained accuracy of natural language interpretation.

10. The method of claim 1, wherein: sub-dividing the application categorization into a plurality of sub-categories comprises sub-dividing the application categorization into a plurality of sub-categories at at least one branch within said application categorization; building a second language model configuration comprising a plurality of statistical language models comprises: building a statistical language model for each of said at least one branch corresponding to the plurality of sub-categories; and saving a configuration file describing a sequential interconnection of each statistical language model of the plurality of statistical language models; and the method further comprises evaluating the interpretation accuracy of the second language model configuration by passing sentences of test data through said statistical language models of the second language model configuration in a sequence described by said configuration file.

11. The method of claim 10, further comprising logging a history of a performance accuracy for each new configuration, comparing a historic performance accuracy of a previous configuration to a new performance accuracy of said new configuration, and reverting to said historic configuration if said new performance accuracy is less than said historic performance accuracy, wherein if no further partitioning is possible, then of the previous partitions, the model configuration that yielded the best performance accuracy on the test data is considered as the optimum model configuration.

12. The method of claim 1, wherein during a runtime, a user request is first submitted to a high-level language model for which the high-level language model produces a response, and if the response is an end target then the high-level language model has sufficiently interpreted the user request and no lower-level language models are employed to interpret the user request, whereas if the response of the high-level language model is a category having its own set of end targets then a lower-level language model is further accessed for interpreting the user request.

13. The method of claim 1, wherein the plurality of sub-categories is a first plurality of sub-categories and the plurality of statistical language models is a first plurality of statistical language models, and training at least one statistical language model further comprises, when it is determined that the evaluated interpretation accuracy of the first language model configuration is less than the desired accuracy: evaluating an interpretation accuracy of the second language model configuration using the test data; determining whether the evaluated interpretation accuracy of the second language model configuration is less than the desired accuracy; when it is determined that the evaluated interpretation accuracy of the second language model configuration is at least the desired accuracy, then adopting the second language model configuration; and when it is determined that the evaluated interpretation accuracy of the second language model configuration is less than the desired accuracy, then: further sub-dividing the first plurality of sub-categories of the application categorization into a second plurality of sub-categories; and building a third language model configuration comprising a second plurality of statistical language models corresponding to the second plurality of sub-categories.

14. A natural language understanding (NLU) system comprising: a computer comprising at least one processor configured to perform acts of: categorizing an NLU application to produce an application categorization having a plurality of categories; classifying an NLU database corpus of example expressions to produce a classified corpus by identifying at least one of the categories for each example expression of the example expressions; building from the classified corpus a first language model configuration comprising a first statistical language model; evaluating an interpretation accuracy of the first language model configuration using test data; determining whether the evaluated interpretation accuracy is less than a desired accuracy; when it is determined that the evaluated interpretation accuracy is at least the desired accuracy, then adopting the first language model configuration; and when it is determined that the evaluated interpretation accuracy is less than the desired accuracy, then: sub-dividing the application categorization into a plurality of sub-categories; and building a second language model configuration comprising a plurality of statistical language models corresponding to the plurality of sub-categories.

15. The NLU system of claim 14, wherein said application categorization includes at least one topic of said natural language understanding application that is presented as one of a category, sub-category and end target for one or more features.

16. The NLU system of claim 14, wherein said classifying partitions said NLU database corpus based on said application categorization.

17. The NLU system of claim 14, further comprising: a visual toolkit having a graphical user interface (GUI) for visually presenting said application categorization and classifying said NLU database corpus.

18. The NLU system of claim 14, wherein a language input request is processed through at least one language model configuration for yielding a classification result, wherein a target with the highest classification result is selected to provide information in response to the language input request.

19. The NLU system of claim 14, wherein the plurality of sub-categories is a first plurality of sub-categories and the plurality of statistical language models is a first plurality of statistical language models, and when it is determined that the evaluated interpretation accuracy of the first language model configuration is less than the desired accuracy, the computer comprising the at least one processor is further configured to perform acts of: evaluating an interpretation accuracy of the second language model configuration using the test data; determining whether the evaluated interpretation accuracy of the second language model configuration is less than the desired accuracy; when it is determined that the evaluated interpretation accuracy of the second language model configuration is at least the desired accuracy, then adopting the second language model configuration; and when it is determined that the evaluated interpretation accuracy of the second language model configuration is less than the desired accuracy, then: further sub-dividing the first plurality of sub-categories of the application categorization into a second plurality of sub-categories; and building a third language model configuration comprising a second plurality of statistical language models corresponding to the second plurality of sub-categories.

20. A natural language understanding (NLU) system comprising: a computer comprising at least one processor configured to perform acts of: categorizing an NLU application to produce an application categorization having a plurality of categories; classifying an NLU database corpus of example expressions to produce a classified corpus by identifying at least one of the categories for each example expression of the example expressions; building from the classified corpus a first language model configuration comprising a first statistical language model; evaluating an interpretation accuracy of the first language model configuration using test data; determining whether the evaluated interpretation accuracy of the first language model configuration is less than a desired accuracy; when it is determined that the evaluated interpretation accuracy of the first language model configuration is at least the desired accuracy, then adopting the first language model configuration; and when it is determined that the evaluated interpretation accuracy of the first language model configuration is less than the desired accuracy, then: sub-dividing the application categorization into a first plurality of sub-categories; building a second language model configuration comprising a first plurality of statistical language models corresponding to the first plurality of sub-categories; evaluating an interpretation accuracy of the second language model configuration using the test data; determining whether the evaluated interpretation accuracy of the second language model configuration is less than the desired accuracy; when it is determined that the evaluated interpretation accuracy of the second language model configuration is at least the desired accuracy, then adopting the second language model configuration; and when it is determined that the evaluated interpretation accuracy of the second language model configuration is less than the desired accuracy, then: determining whether the first plurality of sub-categories can be further sub-divided; when it is determined that the first plurality of sub-categories can be further sub-divided, then: further sub-dividing the first plurality of sub-categories of the application categorization into a second plurality of sub-categories; and building a third language model configuration comprising a second plurality of statistical language models corresponding to the second plurality of sub-categories; and when it is determined that the first plurality of sub-categories cannot be further sub-divided, then adopting a language model configuration selected from the group consisting of the first language model configuration and the second language model configuration, the selected language model configuration having the greatest evaluated interpretation accuracy.
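Claims 10 and 12 describe two things: a saved configuration file that records how the statistical language models are chained, and a runtime in which a request goes to the high-level model first and reaches a lower-level model only when the response is a category. The Python sketch below illustrates only that routing. The JSON layout, the `keyword_model` stand-in for trained models, and every other name are hypothetical assumptions, not the patent's file format or code.

```python
import json

# Hypothetical configuration-file layout; the claims do not specify a format.
# "hand_off" maps a category label output by a model to the lower-level model
# that refines it. Any label without an entry is treated as an end target.
CONFIG_JSON = """
{
  "entry": "top",
  "models": {
    "top": {"hand_off": {"banking": "banking"}},
    "banking": {"hand_off": {}}
  }
}
"""


def keyword_model(rules, default):
    """Hypothetical stand-in for a trained model: the first matching keyword wins."""
    def predict(sentence):
        words = sentence.lower().split()
        for keyword, label in rules:
            if keyword in words:
                return label
        return default
    return predict


MODELS = {
    "top": keyword_model([("help", "help")], default="banking"),
    "banking": keyword_model([("transfer", "transfer")], default="balance"),
}


def interpret(sentence, config, models, max_steps=10):
    """Route a request through the models in the sequence the configuration file describes."""
    name, trace = config["entry"], []
    for _ in range(max_steps):  # guards against a cyclic configuration file
        label = models[name](sentence)
        trace.append((name, label))
        next_model = config["models"][name]["hand_off"].get(label)
        if next_model is None:  # end target: no lower-level model is consulted
            return label, trace
        name = next_model
    raise ValueError("configuration did not reach an end target")


config = json.loads(CONFIG_JSON)
print(interpret("please help", config, MODELS))  # → ('help', [('top', 'help')])
print(interpret("transfer money now", config, MODELS))
# → ('transfer', [('top', 'banking'), ('banking', 'transfer')])
```

Keeping the chaining in data rather than code means the training loop can rewrite the file each time it re-partitions the categorization, without changing the runtime.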

Patents Cited by This Patent (25)

Ramaswamy, Ganesh N.; Printz, Harry W.; Gopalakrishnan, Ponani S., Apparatus and method for building domain-specific language models.
Sproat, Richard William, Compilation of weighted finite-state transducers from decision trees.
Epstein, Mark E., Creating a hierarchical tree of language models for a dialog system based on prompt and dialog context.
Huang, Xuedong D.; Mahajan, Milind V.; Wang, Ye Yi; Mou, Xiaolong, Creating a language model for a language processing system.
Monaco, Peter C., Creating and editing grammars for speech recognition graphically.
Strong, Robert D., Dynamic categories for a speech recognition system.
Lavi, Ofer; Auerbach, Gadiel; Persky, Eldad, Dynamic natural language understanding.
Wang, Ye Yi; Acero, Alejandro, Grammar authoring system.
Yoshii, Hiroto; Arai, Tsunekazu; Takasu, Eiji, Information processing method, information processing apparatus, and storage medium.
Mahajan, Milind V.; Huang, Xuedong D., Information retrieval and speech recognition based on language models.
Veprek, Peter; Applebaum, Ted H.; Pearson, Steven; Kuhn, Roland, Intermediary speech processor in network environments transforming customized speech parameters.
Attwater, David J.; Edgington, Michael D.; Durston, Peter J., Learning of dialogue states and language model of spoken information system.
Ramaswamy, Ganesh N.; Kleindienst, Jan, Method and system for hierarchical natural language understanding.
Kanevsky, Dimitri; Yashchm, Emmanuel, Method and system for off-line detection of textual topical changes and topic identification via likelihood based methods for improved language modeling.
Dori, Dov, Modeling system.
Hedin, Erik B.; Jonsson, Gregor I.; Olsson, Lars E.; Sanamrad, Mohammad A.; Westling, Sven O. G., Natural language analyzing apparatus and method.
Dahlgren, Kathleen; Stabler, Edward, Natural language understanding system.
Thelen, Eric; Besling, Stefan; Ullrich, Meinhard, Speech recognition system having parallel large vocabulary recognition engines.
Wang, Hai-Feng; Huang, Chang-Ning; Lee, Kai-Fu; Di, Shuo; Gao, Jianfeng; Cai, Dong-Feng; Chien, Lee-Feng, System and iterative method for lexicon, segmentation and language model joint optimization.
Kendall, Daythal Lee; Wadsworth, Dennis Lee; Bouzid, Ahmed Tewfik; Dahl, Deborah Anna; Hua, Hua, System and method for creating a language grammar using a spreadsheet or table interface.
Marx, Matthew T.; Carter, Jerry K.; Phillips, Michael S.; Holthouse, Mark A.; Seabury, Stephen D.; Elizondo-Cecenas, Jose L.; Phaneuf, Brett D., System and method for developing interactive speech applications.
Ponceleon, Dulce Beatriz; Srinivasan, Savitha, System and method for the automatic discovery of salient segments in speech transcripts.
Miller, Edward S.; Blake II, James F.; Herold, Keith C.; Bergman, Michael D.; Danielson, Kyle N.; Auckland, Alexandra L., System and method for tuning and testing in a speech recognition system.
Baker, Janet M.; Gillick, Laurence S.; Baker, James K.; Yamron, Jonathan P., Systems and methods for word recognition.
McCarthy, Daniel J.; Natarajan, Premkumar, Unsupervised training in natural language call routing.

Patents Citing This Patent (17)

Sarikaya, Ruhi; Xu, Puyang; Rochette, Alexandre; Celikyilmaz, Asli, Contextual language understanding for multi-turn language tasks.
Sarikaya, Ruhi; Xu, Puyang; Rochette, Alexandre; Celikyilmaz, Asli, Contextual language understanding for multi-turn language tasks.
Chang, Walter W.; Welch, Michael J., Custom language models for audio content.
Burges, Chris J. C.; Pastusiak, Andrzej, Hierarchical models for language modeling.
Balchandran, Rajesh; Boyer, Linda M.; Purdy, Gregory, Information extraction in a natural language understanding system.
Balchandran, Rajesh; Boyer, Linda M.; Purdy, Gregory, Information extraction in a natural language understanding system.
Levit, Michael; Parthasarathy, Sarangarajan; Stolcke, Andreas, Language model optimization for in-domain application.
Mauro, David Andrew; Gandrabur, Simona, Method and system for facilitating communications for a user transaction.
Balchandran, Rajesh; Boyer, Linda M.; Purdy, Gregory, Reclassification of training data to improve classifier accuracy.
Biadsy, Fadi; Moreno Mengibar, Pedro J.; Nakajima, Kaisuke; Bikel, Daniel Martin, Sampling training data for an automatic speech recognition system based on a benchmark classification distribution.
Bangalore, Srinivas; Bell, Robert; Caseiro, Diamantino Antonio; Gilbert, Mazin; Haffner, Patrick, System and method for rapid customization of speech recognition models.
Bangalore, Srinivas; Bell, Robert; Caseiro, Diamantino Antonio; Gilbert, Mazin; Haffner, Patrick, System and method for rapid customization of speech recognition models.
Lundberg, Sonja Petrovic; Aili, Eric; Wieweg, Andreas; Jonsson, Rebecca; Hjelm, David, System and methods for semiautomatic generation and tuning of natural language interaction applications.
Lundberg, Sonja Petrovic; Aili, Eric; Wieweg, Andreas; Jonsson, Rebecca; Hjelm, David, System and methods for semiautomatic generation and tuning of natural language interaction applications.
Estes, Timothy Wayne; Gardner, James Johnson; Russell, Matthew; Michalak, Phillip Daniel, Systems and methods for construction, maintenance, and improvement of knowledge representations.
Levit, Michael; Parthasarathy, Sarangarajan; Stolcke, Andreas; Chang, Shuangyu, Token-level interpolation for class-based language models.
Kannan, Vishwac Sena; Uzelac, Aleksandar; Hwang, Daniel J., Updating language understanding classifier models for a digital personal assistant based on crowd-sourcing.
