Method and system for extending keyword searching to syntactically and semantically annotated data
IPC classification information
Country/Type: United States (US) Patent, granted
International Patent Classification (IPC 7th edition): G06F-017/27, G06F-007/00, G06F-017/30
Application number: US-0401421 (2009-03-10)
Registration number: US-8131540 (2012-03-06)
Inventors: Marchisio, Giovanni B.; Koperski, Krzysztof; Liang, Jisheng; Nguyen, Thien; Tusk, Carsten; Dhillon, Navdeep S.; Pochman, Lubos; Brown, Matthew E.
Applicant: Evri, Inc.
Agent: Lowe Graham Jones PLLC
Citation information
Times cited: 35
Cited patents: 47
Abstract
Methods and systems for extending keyword searching techniques to syntactically and semantically annotated data are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set as an enhanced document index with document terms as well as information pertaining to the grammatical roles of the terms and ontological and other semantic information. In one embodiment, the enhanced document index is a form of term-clause index, that indexes terms and syntactic and semantic annotations at the clause level. The enhanced document index permits the use of a traditional keyword search engine to process relationship queries as well as to process standard document level keyword searches. In one embodiment, the SQE comprises a Query Processor, a Data Set Preprocessor, a Keyword Search Engine, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface or an application programming interface.
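The abstract's central idea is to index grammatical roles and semantic annotations alongside ordinary terms at the clause level, so that a plain keyword engine can answer relationship queries. The Python sketch below illustrates that idea under stated assumptions. It assumes clauses have already been parsed, which is the ENLP's job and is not modeled here. It also encodes annotations as extra index terms using a hypothetical `role:term` and `role-type:entity_type` convention. The names `Clause`, `annotated_terms`, and `build_index`, and the sample data, are illustrative and are not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Clause:
    """One pre-parsed clause (hypothetical); parsing itself is out of scope."""
    clause_id: str
    subject: str
    verb: str  # assumed already lemmatized, e.g. "invent"
    obj: str = ""
    # Entity type per grammatical role, e.g. {"subj": "person", "obj": "thing"}.
    entity_types: dict = field(default_factory=dict)


def annotated_terms(clause: Clause) -> set:
    """Return plain terms plus role and entity-type annotations as extra terms."""
    terms = set()
    for role, word in (("subj", clause.subject), ("verb", clause.verb), ("obj", clause.obj)):
        if not word:
            continue
        word = word.lower()
        terms.add(word)              # plain keyword, for ordinary keyword searches
        terms.add(f"{role}:{word}")  # grammatical-role annotation (assumed encoding)
        etype = clause.entity_types.get(role)
        if etype:
            terms.add(f"{role}-type:{etype}")  # entity-type annotation for that role
    return terms


def build_index(clauses):
    """Build a term-clause inverted index: term -> set of clause ids."""
    index = defaultdict(set)
    for clause in clauses:
        for term in annotated_terms(clause):
            index[term].add(clause.clause_id)
    return index


clauses = [
    Clause("d1.c1", "Edison", "invent", "phonograph", {"subj": "person", "obj": "thing"}),
    Clause("d2.c1", "Edison", "visit", "Paris", {"subj": "person", "obj": "location"}),
    Clause("d3.c1", "Bell", "invent", "telephone", {"subj": "person", "obj": "thing"}),
]
index = build_index(clauses)

print(sorted(index["subj:edison"]))                            # ['d1.c1', 'd2.c1']
print(sorted(index["verb:invent"] & index["obj-type:thing"]))  # ['d1.c1', 'd3.c1']
print(sorted(index["paris"]))                                  # ['d2.c1']
```

The annotations are stored as ordinary additional terms, so the same inverted index serves both kinds of lookup. It answers the plain keyword `paris`, and it also answers the relationship-style conjunction `verb:invent` AND `obj-type:thing`.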
Representative claims
1. A method in a computer system for performing a relationship search of a corpus of documents, each document having at least one sentence, comprising:
receiving a relationship search query that designates a desired grammatical relationship between a first entity and at least one of a second entity or an action;
transforming the search query into a Boolean expression;
under control of the computer system, automatically determining a set of data objects that match the Boolean expression using a keyword-style search of a data structure that indexes terms of the documents in a memory of the computer system by including, for at least some of a plurality of terms, grammatical relationship information that specifies that the corresponding term is a subject, object, or modifier of another term, and including for at least one of the plurality of terms having the included grammatical relationship information, semantic information that specifies an entity type that identifies the term as a type of person, location, or thing;
when the received relationship search query designates a desired grammatical relationship between the first entity and any action, returning an indication of a plurality of matching objects in the corpus that encompass the first entity along with an indication of the corresponding action encompassed by the matching objects; and
otherwise, returning an indication of a plurality of matching objects in the corpus that encompass the desired grammatical relationship.

2. The method of claim 1 wherein the automatically determining the set of data objects determines objects that are at least one of clauses, sentences, paragraphs, or documents.

3. The method of claim 1 wherein the data structure stores the grammatical relationship information and semantic information as additional terms of the documents.

4. The method of claim 1 wherein the designated at least one second entity or the action indicates a desire to match any second entity.

5. The method of claim 4, each sentence of each document comprising at least one clause, wherein the any second entity is any term used as a subject of a clause of a sentence.

6. The method of claim 4, each sentence of each document comprising at least one clause, wherein the any second entity is any term used as an object of a clause of a sentence.

7. The method of claim 1 wherein the first entity is any term that matches a specified entity type or ontology path specification.

8. The method of claim 1 wherein the designated at least one second entity or the action is a verb and wherein the returning the indication of the plurality of matching objects that encompasses the desired relationship returns indications to objects that contain similar verbs to the designated verb.

9. The method of claim 1 wherein the designated at least one second entity or the action indicates a desire to match any action and a desire to match any second entity.

10. The method of claim 1 wherein the receiving the relationship search query that designates the desired grammatical relationship between the first entity and at least one of the second entity or the action specifies at least one of a prepositional constraint, a document keyword constraint, or a document metadata constraint.

11. The method of claim 1 wherein the relationship search query includes a Boolean operation.

12. The method of claim 1 wherein the relationship search query includes an operator that specifies at least one of a proximity, a range, a wildcard, a weighted search based upon frequency, or a weighted keyword search operation.

13. The method of claim 1 wherein the relationship search query includes a designation of at least one entity type or a path specification in a classification system.

14. The method of claim 1 wherein the relationship search query includes a wildcard specification in the designation of the desired grammatical relationship.

15. The method of claim 1 wherein the transforming the search query to generate a Boolean expression incorporates transformational grammar rules to generate related grammatical relationships to search for.

16. A computer-readable memory medium containing instructions that control a computer processor to search a corpus of documents, each document having at least one sentence, by performing a method comprising:
receiving a relationship search query that designates a desired grammatical relationship between a first entity and at least one of a second entity or an action;
transforming the search query into a Boolean expression;
determining a set of data objects that match the Boolean expression using a keyword-style search of a data structure that indexes terms of the documents by including, for at least some of a plurality of the terms, grammatical relationship information that specifies that the corresponding term is a subject, object, or modifier of another term, and including for at least one of the plurality of terms having the included grammatical relationship information, semantic information that specifies an entity type that identifies the term as a type of person, location, or thing;
when the received relationship search query designates a desired grammatical relationship between the first entity and any action, returning an indication of a plurality of matching objects in the corpus that encompasses the first entity along with an indication of the corresponding action encompassed by the matching objects; and
otherwise, returning an indication of a plurality of matching objects in the corpus that encompass the desired relationship.

17. The memory medium of claim 16 wherein the determined data objects are at least one of clauses, sentences, paragraphs, or documents.

18. The memory medium of claim 16 wherein the data structure stores the grammatical relationship information and the semantic information as additional terms of the documents.

19. The memory medium of claim 16 wherein the designated at least one second entity or the action indicates a desire to match any second entity.

20. The memory medium of claim 16 wherein the first entity is any term that matches a specified entity type or ontology path specification.

21. The memory medium of claim 16 wherein the designated at least one second entity or the action is a verb and the returning the indication of the plurality of matching objects that encompass the desired relationship returns indications to objects that contain similar verbs to the designated verb.

22. The memory medium of claim 16 wherein the designated desired grammatical relationship specifies at least one of a prepositional constraint, a document keyword constraint, or a document metadata constraint.

23. The memory medium of claim 16 wherein the search query includes an operator that specifies at least one of a proximity, a range, a wildcard, a weighted search based upon frequency, or a weighted keyword search operation.

24. The memory medium of claim 16 wherein the data structure is a reverse index of terms that indexes at least one of documents, sentences, or clauses.

25. A relationship search engine that searches a corpus of documents, each document having at least one sentence, comprising:
a memory;
a data structure that is configured to index and store in the memory terms of the documents along with annotations that include relationship information, each annotation associated with at least one term, wherein the relationship information stored with at least a corresponding one of the terms specifies an entity type that identifies the corresponding term as a type of person, place, or thing;
a keyword search engine that is configured, when executed on a computer processor, to perform pattern matches of an input string against the data structure and return an indication of a plurality of matching objects of the corpus; and
a query processor that is configured, when executed on a computer processor, to receive a relationship search query that is indicative of at least one syntactically or semantically annotated term;
transform the relationship search query into at least one Boolean expression;
invoke the keyword search engine to determine objects that match the at least one Boolean expression by pattern matching the at least one annotated term indicated by the search query to the data structure, such that each matching object encompasses the relationship specified by the relationship search;
when the received relationship search query designates a desired grammatical relationship between a first entity and any action, return an indication of a plurality of matching objects in the corpus that encompasses the first entity along with an indication of the corresponding action encompassed by the matching objects; and
otherwise, return an indication of a plurality of matching objects in the corpus that encompass the desired relationship.

26. The relationship search engine of claim 25 wherein the returned indications indicate at least one of clauses, sentences, paragraphs, or documents.

27. The relationship search engine of claim 25 wherein the data structure stores the relationship information as additional terms of the documents.

28. The relationship search engine of claim 25 wherein the annotations that include relationship information denote a grammatical role of each associated term.

29. The relationship search engine of claim 25 wherein the annotations denote semantic tags associated with the terms.

30. The relationship search engine of claim 25 wherein the data structure is a reverse index of terms that indexes at least one of documents, sentences, or clauses.

31. The relationship search engine of claim 30 wherein the reverse index of terms comprises a plurality of reverse indices of terms.

32. The relationship search engine of claim 25 wherein the data structure is at least one of a term-document matrix, a term-sentence matrix, or a term-clause matrix.

33. The relationship search engine of claim 25, the data structure further configured to store and index the terms of the documents with the annotations across a plurality of storage repositories, and wherein the keyword search engine is further configured to perform pattern match searches of the input string against each storage repository that contains a portion of the index and merge the results of the pattern match searches to return an indication to each matching object in the corpus that encompasses the desired relationship.

34. The relationship search engine of claim 33 wherein the pattern match searches of the input string against each storage repository that contains the portion of the index are performed using parallel processing techniques.
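Claim 1 describes a query path with three steps. The query is transformed into a Boolean expression, the expression is evaluated by a keyword-style search over the annotated index, and the action is reported whenever the query leaves it open. The Python sketch below is a simplified illustration of that path under assumed conventions. The hard-coded `INDEX`, the `*` and `[type]` query syntax, and the functions `to_term`, `to_boolean`, and `search` are all hypothetical. The Boolean expression is limited to a conjunction. The richer features of the dependent claims are not modeled: OR and proximity operators, similar-verb matching, transformational grammar expansion, and partitioned indices.

```python
from __future__ import annotations

# Hypothetical term-clause index: annotated term -> clause ids. Annotations use
# an assumed "role:term" / "role-type:entity_type" encoding.
INDEX: dict[str, set[str]] = {
    "subj:edison": {"d1.c1", "d2.c1"},
    "subj:bell": {"d3.c1"},
    "subj-type:person": {"d1.c1", "d2.c1", "d3.c1"},
    "verb:invent": {"d1.c1", "d3.c1"},
    "verb:visit": {"d2.c1"},
    "obj:phonograph": {"d1.c1"},
    "obj:paris": {"d2.c1"},
    "obj:telephone": {"d3.c1"},
    "obj-type:thing": {"d1.c1", "d3.c1"},
    "obj-type:location": {"d2.c1"},
}
# Action (verb) of each clause, reported when the query leaves the action open.
CLAUSE_VERB = {"d1.c1": "invent", "d2.c1": "visit", "d3.c1": "invent"}


def to_term(role: str, value: str) -> str | None:
    """Map one query slot to an index term; '*' leaves the slot unconstrained."""
    if value == "*":
        return None
    if value.startswith("[") and value.endswith("]"):  # entity type, e.g. "[person]"
        return f"{role}-type:{value[1:-1].lower()}"
    return f"{role}:{value.lower()}"


def to_boolean(subject: str, verb: str, obj: str) -> list[str]:
    """Transform a subject-action-object query into a conjunction (AND) of terms."""
    slots = (("subj", subject), ("verb", verb), ("obj", obj))
    terms = (to_term(role, value) for role, value in slots)
    return [t for t in terms if t is not None]


def search(subject: str, verb: str = "*", obj: str = "*"):
    """Evaluate the conjunction by intersecting the postings lists of its terms."""
    terms = to_boolean(subject, verb, obj)
    if not terms:
        raise ValueError("at least one slot must be constrained")
    matches = set.intersection(*(INDEX.get(t, set()) for t in terms))
    if verb == "*":
        # "Any action" query: return each matching clause together with its action.
        return sorted((cid, CLAUSE_VERB[cid]) for cid in matches)
    return sorted(matches)


print(" AND ".join(to_boolean("Edison", "*", "[thing]")))  # subj:edison AND obj-type:thing
print(search("Edison"))                     # [('d1.c1', 'invent'), ('d2.c1', 'visit')]
print(search("[person]", "invent"))         # ['d1.c1', 'd3.c1']
print(search("Edison", "*", "[location]"))  # [('d2.c1', 'visit')]
```

In this sketch, `to_boolean` stands in for the query-side transformation. The postings-list intersection inside `search` stands in for the keyword search engine. Keeping the two apart loosely mirrors the Query Processor and Keyword Search Engine components named in the abstract.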
Patents cited by this patent (47)
Braden-Harder, Lisa; Corston, Simon H.; Dolan, William B.; Vanderwende, Lucy H., Apparatus and methods for an information retrieval system that employs natural language processing of search results to.
Deerwester, Scott C.; Dumais, Susan T.; Furnas, George W.; Harshman, Richard A.; Landauer, Thomas K.; Lochbaum, Karen E., Computer information retrieval using latent semantic structure.
Vaithyanathan, Shivakumar; Adler, Mark R.; Hill, Christopher G., Computer method and apparatus for clustering documents and automatic generation of cluster keywords.
Lamberti, Donna M.; Prager, John M.; Nappari, Mark A., Constrained natural language interface for a computer that employs a browse function.
Appelt, Douglas E.; Arnold, James Frederick; Bear, John S.; Hobbs, Jerry Robert; Israel, David J.; Kameyama, Megumi; Martin, David L.; Myers, Karen Louise; Ravichandran, Gopalan; Stickel, Mark Edward, Information retrieval by natural language querying.
Messerly, John J.; Heidorn, George E.; Richardson, Stephen D.; Dolan, William B.; Jensen, Karen, Information retrieval utilizing semantic representation of text and based on constrained expansion of query words.
Marchisio, Giovanni B.; Koperski, Krzysztof; Liang, Jisheng; Murua, Alejandro; Nguyen, Thien; Tusk, Carsten; Dhillon, Navdeep S.; Pochman, Lubos, Method and system for enhanced data searching.
Black, Jr., James E.; Zernik, Uri, Method for natural language data processing using morphological and part-of-speech information.
Dumais, Susan T.; Heckerman, David; Horvitz, Eric; Platt, John Carlton; Sahami, Mehran, Methods and apparatus for classifying text and for building a text classifier.
Bowman, Dwayne E.; Ortega, Ruben E.; Hamrick, Michael L.; Spiegel, Joel R.; Kohn, Timothy R., Refining search queries by the suggestion of correlated terms from prior searches.
Arnold, James F.; Israel, David J.; Tyson, W. Mabry; Bear, John S.; Voss, Loren L., System, method and article of manufacture for concept based information searching.
Liddy, Elizabeth D.; Paik, Woojin; McKenna, Mary E.; Weiner, Michael L.; Yu, Edmund S.; Diamond, Theodore G.; Balakrishnan, Bhaskaran; Snyder, David L., User interface and other enhancements for natural language information retrieval system and method.
Caid, William Robert; Carleton, Joel Lawrence, Visualization of information using graphical representations of context vector based relationships and attributes.
Beller, Charles E.; Dubyak, William G., Candidate answer generation for explanatory questions directed to underlying reasoning regarding the existence of a fact.