Method and system of ranking and clustering for document indexing and retrieval
IPC classification information
Country/Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-007/00
Application number
US-0724170
(2003-12-01)
Registration number
US-7496561
(2009-02-24)
Inventors
Caudill, Maureen
Tseng, Jason Chun Ming
Wang, Lei
Applicant
Science Applications International Corporation
Attorney
Banner & Witcoff, Ltd.
Citation information
Times cited: 22
Cited patents: 62
Abstract
A relevancy ranking and clustering method and system that determines the relevance of a document relative to a user's query using a similarity comparison process. Input queries are parsed into one or more query predicate structures using an ontological parser. The ontological parser parses a set of known documents to generate one or more document predicate structures. A comparison of each query predicate structure with each document predicate structure is performed to determine a matching degree, represented by a real number. A multilevel modifier strategy is implemented to assign different relevance values to the different parts of each predicate structure match to calculate the predicate structure's matching degree. The relevance of a document to a user's query is determined by calculating a similarity coefficient, based on the structures of each pair of query predicates and document predicates. Documents are autonomously clustered using a self-organizing neural network that provides a coordinate system that makes judgments in a non-subjective fashion.
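The abstract describes the matching step only in outline. Each query predicate structure is compared with each document predicate structure to produce a real-valued matching degree. Different parts of a match carry different relevance values, and a similarity coefficient aggregates the pairs.

The Python sketch below illustrates that flow under stated assumptions. The following are hypothetical choices for illustration, not the patent's actual formulas:

- the two-level weighting (`PREDICATE_WEIGHT`, `ARGUMENT_WEIGHT`);
- position-by-position argument matching;
- averaging each query predicate's best match in `similarity_coefficient`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PredicateStructure:
    """A predicate (e.g. a verb or preposition) and its ordered arguments."""
    predicate: str
    arguments: tuple[str, ...] = ()


# Hypothetical modifier levels: relevance credited to each part of a match.
PREDICATE_WEIGHT = 0.5
ARGUMENT_WEIGHT = 0.5


def matching_degree(query: PredicateStructure, doc: PredicateStructure) -> float:
    """Real-valued match score in [0, 1] for one query/document predicate pair."""
    score = PREDICATE_WEIGHT if query.predicate == doc.predicate else 0.0
    if query.arguments:
        # Credit query arguments that appear at the same position in the document structure.
        hits = sum(1 for qa, da in zip(query.arguments, doc.arguments) if qa == da)
        score += ARGUMENT_WEIGHT * hits / len(query.arguments)
    return score


def similarity_coefficient(
    query_preds: Sequence[PredicateStructure], doc_preds: Sequence[PredicateStructure]
) -> float:
    """Average, over query predicates, of the best matching degree found in the document."""
    if not query_preds or not doc_preds:
        return 0.0
    best = [max(matching_degree(q, d) for d in doc_preds) for q in query_preds]
    return sum(best) / len(best)


query = PredicateStructure("acquire", ("company", "startup"))
document = [
    PredicateStructure("acquire", ("company", "patent")),
    PredicateStructure("sell", ("company", "startup")),
]
print(similarity_coefficient([query], document))  # 0.75
```

Raising `PREDICATE_WEIGHT` relative to `ARGUMENT_WEIGHT` ranks documents that share the query's verb above documents that merely share its nouns. That is the kind of trade-off a multilevel modifier strategy exposes.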
Representative claims
What is claimed is:
1. One or more computer readable media storing computer executable instructions to perform a method for vectorizing a set of document predicate structures, the method comprising: identifying at least one predicate and argument in said set of document predicate structures by a predicate key that is an integer representation; estimating conceptual nearness of two of said document predicate structures in said set of document predicate structures by subtracting corresponding ones of said predicate keys; and outputting at least one document based upon the estimated conceptual nearness.
2. The computer readable media of claim 1, the method further comprising constructing multi-dimensional vectors using said integer representation.
3. The computer readable media of claim 2, the method further comprising normalizing said multi-dimensional vectors.
4. The computer readable media of claim 3, the method further comprising identifying at least one query predicate structure by a second predicate key that is a second integer representation, and constructing second multi-dimensional vectors, for said at least one query predicate structure, using said second integer representation.
5. The computer readable media of claim 1, the method further comprising identifying at least one query predicate structure by a second predicate key that is a second integer representation, and constructing second multi-dimensional vectors, for said at least one query predicate structure, using said second integer representation.
6. The computer readable media of claim 1, wherein said set of document predicate structures are representations of logical relationships between words in a sentence.
7. The computer readable media of claim 1, wherein each of said document predicate structures in said set includes a predicate and a set of arguments, wherein the predicate is one of a verb and a preposition.
8. One or more computer readable media storing computer executable instructions to perform a method for vectorizing a set of document predicate structures, the method comprising: identifying at least one predicate in said set of document predicate structures by a predicate key that is an integer representation; estimating conceptual nearness of two of said document predicate structures in said set of document predicate structures by subtracting corresponding ones of said predicate keys; and outputting at least one document based upon the estimated conceptual nearness.
9. The computer readable media of claim 8, the method further comprising constructing multi-dimensional vectors using said integer representation.
10. The computer readable media of claim 9, the method further comprising normalizing said multi-dimensional vectors.
11. The computer readable media of claim 10, the method further comprising identifying at least one query predicate structure by a second predicate key that is a second integer representation, and constructing second multi-dimensional vectors, for said at least one query predicate structure, using said second integer representation.
12. The computer readable media of claim 8, the method further comprising identifying at least one query predicate structure by a second predicate key that is a second integer representation, and constructing second multi-dimensional vectors, for said at least one query predicate structure, using said second integer representation.
13. The computer readable media of claim 8, wherein said set of document predicate structures are representations of logical relationships between words in a sentence.
14. One or more computer readable media storing computer executable instructions to perform a method for constructing multi-dimensional vector representations for each document of a set of documents, the method comprising: determining each predicate structure of one or more predicate structures M in each document of the set of documents, said M predicate structures including a predicate and at least one argument; identifying the predicate and the at least one argument in each of said M predicate structures by a predicate key that is an integer representation; determining a fixed number of arguments q for vector construction; constructing an N-dimensional vector representation of each document based upon the predicate and q arguments; and outputting at least one document of the set of documents based upon the constructed N-dimensional vector representation of the at least one document, wherein any predicate structure of said M predicate structures that includes less than q arguments fills unfilled argument positions with a numerical zero.
15. The computer readable media of claim 14, wherein any predicate structure of said M predicate structures that includes more than q arguments omits remaining arguments after q argument positions are filled.
16. The computer readable media of claim 15, wherein conceptual nearness of two of said N-dimensional vector representations is estimated by subtracting corresponding ones of said predicate keys.
17. The computer readable media of claim 15, the method further comprising normalizing said N-dimensional vector representations.
18. The computer readable media of claim 14, wherein conceptual nearness of two of said N-dimensional vector representations is estimated by subtracting corresponding ones of said predicate keys.
19. The computer readable media of claim 14, the method further comprising normalizing said N-dimensional vector representations.
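Claims 1, 8, and 14 through 19 describe three steps:

- representing predicates and arguments as integer keys;
- estimating conceptual nearness by subtracting corresponding keys;
- building fixed-width vectors from a predicate plus q arguments, zero-filling short structures and truncating long ones.

The Python sketch below illustrates those steps. The claims do not say how keys are assigned, how N relates to M and q, or which norm is used. The following are therefore assumptions for illustration:

- the key table `PREDICATE_KEYS` and its values;
- the concatenation that makes N = M(q + 1);
- the absolute value and summation in `vector_nearness`;
- Euclidean normalization.

```python
import math
from typing import Sequence

# Hypothetical integer keys from an ontology, assigned so that related concepts
# get numerically close keys (an assumption; the claims only say keys are integers).
PREDICATE_KEYS = {
    "buy": 1010, "purchase": 1012, "sell": 1050,
    "company": 2100, "firm": 2102, "startup": 2110, "patent": 3400,
}


def conceptual_nearness(key_a: int, key_b: int) -> int:
    """Distance between two keys by subtraction; smaller means conceptually nearer."""
    return abs(key_a - key_b)


def structure_to_keys(predicate: str, arguments: Sequence[str], q: int) -> list[int]:
    """Predicate key followed by exactly q argument keys."""
    arg_keys = [PREDICATE_KEYS[a] for a in arguments[:q]]  # omit arguments beyond q
    arg_keys += [0] * (q - len(arg_keys))                  # fill unfilled positions with 0
    return [PREDICATE_KEYS[predicate]] + arg_keys


def document_vector(structures: Sequence[tuple[str, Sequence[str]]], q: int) -> list[int]:
    """Concatenate the M per-structure key lists into one N = M * (q + 1) vector."""
    vec: list[int] = []
    for predicate, arguments in structures:
        vec.extend(structure_to_keys(predicate, arguments, q))
    return vec


def vector_nearness(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences of corresponding keys (summing is an assumption)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension N")
    return sum(abs(x - y) for x, y in zip(a, b))


def normalize(vec: Sequence[float]) -> list[float]:
    """Scale to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


doc = [("buy", ("company", "startup", "patent")), ("sell", ("firm",))]
print(document_vector(doc, q=2))  # [1010, 2100, 2110, 1050, 2102, 0]
print(conceptual_nearness(PREDICATE_KEYS["buy"], PREDICATE_KEYS["purchase"]))  # 2
```

In this sketch N grows with M. Documents with different numbers of predicate structures would therefore need padding or truncation to a common M before `vector_nearness` can compare them; the sketch raises an error rather than guessing.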
Patents cited by this patent (62)
Chang, Shih-Chio; Chow, Anita; Du, Min-Wen, Adaptive ranking system for information retrieval.
Braden-Harder, Lisa; Corston, Simon H.; Dolan, William B.; Vanderwende, Lucy H., Apparatus and methods for an information retrieval system that employs natural language processing of search results to.
Pant, Sangam; Andre, David L.; Watson, Gray; Green, Richard M.; Schiegg, Michael J., Computer system with user-controlled relevance ranking of search results.
Ogawa, Yasushi, Document retrieval system involving ranking of documents in accordance with a degree to which the documents fulfill a re.
Messerly, John J.; Heidorn, George E.; Richardson, Stephen D.; Dolan, William B.; Jensen, Karen, Information retrieval utilizing semantic representation of text.
Hazlehurst, Brian L.; Burke, Scott M.; Nybakken, Kristopher E., Intelligent query system for automatically indexing information in a database and automatically categorizing users.
Anglea, Billy W.; Cox, Robert Charles, Method and apparatus for character preprocessing which translates textual description into numeric form for input to a n.
Katz, Boris; Winston, Patrick H., Method and apparatus for generating and utlizing annotations to facilitate computer text retrieval.
White, Brian F.; Bretan, Ivan P.; Sanamrad, Mohammad A., Method and apparatus for paraphrasing information contained in logical forms.
Katz, Boris; Winston, Patrick H., Method and apparatus for utilizing annotations to facilitate computer retrieval of database material.
Black, Jr., James E.; Zernik, Uri, Method for natural language data processing using morphological and part-of-speech information.
Agrawal, Rakesh; Chakrabarti, Soumen; Dom, Byron Edward; Raghavan, Prabhakar, Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values.
Hedin, Erik B.; Jonsson, Gregor I.; Olsson, Lars E.; Sanamrad, Mohammad A.; Westling, Sven O. G., Natural language analyzing apparatus and method.
Liddy, Elizabeth D.; Paik, Woojin; Yu, Edmund Szu-li, Natural language processing system for semantic vector representation which accounts for lexical ambiguity.
Loatman, Robert B.; Post, Stephen D.; Yang, Chih-King; Hermansen, John C., Natural language understanding system.
Kaplan, Craig A.; Chen, James R.; Fallside, David C.; Fenwick, Justine R.; Forcier, Mitchell D.; Wolff, Gregory J., System for adjusting hypertext links with weighed user goals and activities.
Herz, Frederick S. M.; Eisner, Jason M.; Ungar, Lyle H., System for generation of object profiles for a system for customized electronic identification of desirable objects.
Kanaegami, Atsushi; Koike, Kazuhiro; Taki, Hirokazu; Ohgashi, Hitoshi, Text search system for locating on the basis of keyword matching and keyword relationship matching.
Liddy, Elizabeth D.; Paik, Woojin; McKenna, Mary E.; Weiner, Michael L.; Yu, Edmund S.; Diamond, Theodore G.; Balakrishnan, Bhaskaran; Snyder, David L., User interface and other enhancements for natural language information retrieval system and method.
Cormode, Graham; Korn, Philip Russell; Muthukrishnan, Shanmugavelayutham; Srivastava, Divesh, System and method for generating statistical descriptors for a data stream.