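IPC classification information
Country / Type | United States (US) Patent, granted
International Patent Classification (IPC, 7th edition) |
Application number | US-0365828 (2003-02-13)
Inventor / Address |
Applicant / Address | Infoglide Software Corporation
Agent / Address | Taylor Russell & Russell, P.C.
Citation information | Times cited: 60; cited patents: 10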
Abstract
The invention provides a system and method for defining a schema and sending a query to a Similarity Search Engine to determine a quantitative assessment of the similarity of attributes between an anchor record and one or more target records. The Similarity Search Engine makes a similarity assessment in a single pass through the target records having multiple relationship characteristics. The Similarity Search Engine is a server configuration that comprises a Gateway for command and response routing, a Virtual Document Manager for document generation, a Search Manager for document scoring, and a Relational Database Management System for providing data persistence, data retrieval and access to User Defined Functions. The Similarity Search Engine uses a unique command syntax based on the Extensible Markup Language to implement functions necessary for similarity searching and scoring.
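As a rough illustration of an XML-based command, the sketch below builds a query command carrying the three attributes named in claim 12 (command operation, name identification, and associated schema document identification) plus an anchor document populated with search criteria values. The tag names, attribute names, and sample values (QUERY, op, name, schema, ANCHOR, the schema and field names) are hypothetical; this record does not give the actual command syntax.

```python
import xml.etree.ElementTree as ET


def build_query_command(name, schema, anchor_values):
    """Build a hypothetical XML query command as a string.

    All tag and attribute names here are illustrative assumptions,
    not the syntax defined by the patent.
    """
    # Command attributes: operation, name, and associated schema document.
    query = ET.Element("QUERY", {"op": "execute", "name": name, "schema": schema})
    # Anchor document: one child element per search criterion.
    anchor = ET.SubElement(query, "ANCHOR")
    for field, value in anchor_values.items():
        ET.SubElement(anchor, field).text = value
    return ET.tostring(query, encoding="unicode")


# Hypothetical usage: search for records similar to "Jon Smith".
command = build_query_command(
    "find-similar-people",
    "people-schema",
    {"FirstName": "Jon", "LastName": "Smith"},
)
print(command)
```

Representing the command as a hierarchical document keeps the anchor values, schema reference, and command attributes in one self-describing message, which is the property the abstract attributes to the XML-based syntax.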
Representative claims
1. A method for performing similarity searching, comprising the steps of: receiving a request instruction from a client for initiating a similarity search; generating one or more query commands from the request instruction, each query command designating an anchor document and at least one search document; executing each query command, including: computing a normalized document similarity score having a value of between 0.00 and 1.00 for each search document in each query command for indicating a degree of similarity between the anchor document and each search document; creating a result dataset containing the computed normalized document similarity scores for each search document; and sending a response including the result dataset to the client.
2. The method of claim 1, wherein the step of generating one or more query commands further comprises identifying a schema document for defining structure of search terms, mapping of datasets providing target search values to relational database locations, and designating measures, choices and weight to be used in a similarity search.
3. The method of claim 1, wherein the step of computing a normalized document similarity score comprises: computing attribute token similarity scores having values of between 0.00 and 1.00 for the corresponding leaf nodes of the anchor document and a search document using designated measure algorithms; multiplying each token similarity score by a designated weighting factor; aggregating the token similarity scores using designated choice algorithms for determining a document similarity score having a value of between 0.00 and 1.00 for the search document.
4. The method of claim 3, wherein: the step of computing attribute token similarity scores further comprises computing attribute token similarity scores in a relational database management system; the step of multiplying each token similarity score further comprises multiplying each token similarity score in a similarity search engine; and the step of aggregating the token similarity scores further comprises aggregating the token similarity scores in the similarity search engine.
5. The method of claim 3, wherein the step of computing attribute token similarity scores having values of between 0.00 and 1.00 further comprises computing attribute token similarity scores having values of between 0.00 and 1.00, whereby an attribute token similarity value of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and values between 0.00 and 1.00 represent degrees of similarity matching.
6. The method of claim 3, further comprising selecting measure algorithms from the group consisting of name equivalents, foreign name equivalents, textual, sound coding, string difference, numeric, numbered difference, ranges, numeric combinations, range combinations, fuzzy, date oriented, date to range, date difference, and date combination.
7. The method of claim 3, further comprising selecting choice algorithms from the group consisting of single best, greedy sum, overall sum, greedy minimum, overall minimum, and overall maximum.
8. The method of claim 1, wherein the step of generating one or more query commands comprises: populating an anchor document with search criteria values; identifying documents to be searched; defining semantics for overriding parameters specified in an associated schema document; defining a structure to be used by the result dataset; and imposing restrictions on the result dataset.
9. The method of claim 8, wherein the step of defining semantics comprises: designating overriding measures for determining attribute token similarity scores; designating overriding choice algorithms for aggregating token similarity scores into document similarity scores; and designating overriding weights to be applied to token similarity scores.
10. The method of claim 8, wherein the step of imposing restrictions is selected from the group consisting of defining a range of similarity indicia scores to be selected and defining percentiles of similarity indicia scores to be selected.
11. The method of claim 1, wherein the step of computing a normalized document similarity score further comprises computing a normalized document similarity score having a value of between 0.00 and 1.00, whereby a normalized similarity indicia value of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and values between 0.00 and 1.00 represent degrees of similarity matching.
12. The method of claim 1, wherein the step of generating one or more query commands further comprises generating one or more query commands whereby each query command includes attributes of command operation, name identification, and associated schema document identification.
13. The method of claim 1, further comprising: receiving a schema instruction from a client; generating a schema command document comprising the steps of: defining a structure of target search terms in one or more search documents; creating a mapping of database record locations to the target search terms; listing semantic elements for defining measures, weights and choices to be used in similarity searches; and storing the schema command document into a database management system.
14. The method of claim 1, further comprising the step of representing documents and commands as hierarchical XML documents.
15. The method of claim 1, wherein the step of sending a response to the client further comprises sending a response including an error message and a warning message to the client.
16. The method of claim 1, wherein the step of sending a response to the client further comprises sending a response to the client containing the result datasets, whereby each result dataset includes at least one normalized document similarity score, at least one search document name, a path to the search documents having a returned score, and at least one designated schema.
17. The method of claim 1, further comprising: receiving a statistics instruction from a client; generating a statistics command from the statistics instruction, comprising the steps of: identifying a statistics definition to be used for generating statistics; populating an anchor document with search criteria values; identifying documents to be searched; delineating semantics for overriding measures, parsers and choices defined in a semantics clause in an associated schema document; defining a structure to be used by a result dataset; imposing restrictions to be applied to the result dataset; identifying a schema to be used for the basis of generating statistics; designating a name for the target statistics table for storing results; executing the statistics command for generating a statistics schema with statistics table, mappings and measures; and storing the statistics schema in a database management system.
18. The method of claim 1, further comprising the step of executing a batch command comprising executing a plurality of commands in sequence for collecting results of several related operations.
19. A computer-readable medium containing instructions for controlling a computer system to implement the method of claim 1.
20. A system for performing similarity searching, comprising: a gateway for receiving a request instruction from a client for initiating a similarity search; the gateway for generating one or more query commands from the request instruction, each query command designating an anchor document and at least one search document; a search manager for executing each query command, including: means for computing a normalized document similarity score having a value of between 0.00 and 1.00 for each search document in each query command for indicating a degree of similarity between the anchor document and each search document; means for creating a result dataset containing the computed normalized document similarity scores for each search document; and the gateway for sending a response including the result dataset to the client.
21. The system of claim 20, wherein the means for computing a normalized similarity score comprises: a relational database management system for computing attribute token similarity scores having values of between 0.00 and 1.00 for the corresponding leaf nodes of the anchor document and a search document using designated measure algorithms; and the search manager for multiplying each token similarity score by a designated weighting factor and aggregating the token similarity scores using designated choice algorithms for determining a document similarity score having a value of between 0.00 and 1.00 for the search document.
22. The system of claim 21, wherein the relational database management system includes means for computing an attribute token similarity score having a value of between 0.00 and 1.00, whereby a token similarity indicia value of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and values between 0.00 and 1.00 represent degrees of similarity matching.
23. The system of claim 21, wherein the measure algorithms are selected from the group consisting of name equivalents, foreign name equivalents, textual, sound coding, string difference, numeric, numbered difference, ranges, numeric combinations, range combinations, fuzzy, date oriented, date to range, date difference, and date combination.
24. The system of claim 21, wherein the choice algorithms are selected from the group consisting of single best, greedy sum, overall sum, greedy minimum, overall minimum, and overall maximum.
25. The system of claim 20, wherein: each one or more query commands further comprises a measure designation; and the database management system further comprises designated measure algorithms for computing a token similarity score.
26. The system of claim 20, wherein each query command comprises: an anchor document populated with search criteria values; at least one search document; designated measure algorithms for determining token similarity scores; designated choice algorithms for aggregating token similarity scores into document similarity scores; designated weights for weighting token similarity scores; restrictions to be applied to a result dataset document; and a structure to be used by the result dataset.
27. The system of claim 20, wherein the computed document similarity scores have a value of between 0.00 and 1.00, whereby a normalized similarity indicia value of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and values between 0.00 and 1.00 represent degrees of similarity matching.
28. The system of claim 20, wherein each query command includes attributes of command operation, name identification, and associated schema document identification for providing a mapping of search documents to database management system locations.
29. The system of claim 20, further comprising: the gateway for receiving a schema instruction from a client; a virtual document manager for generating a schema command document; the schema command document comprising: a structure of target search terms in one or more search documents; a mapping of database record locations to the target search terms; semantic elements for defining measures, weights, and choices for use in searches; and a relational database management system for storing the schema command document.
30. The system of claim 20, wherein each result dataset includes at least one normalized document similarity score, at least one search document name, a path to the search documents having a returned score and at least one designated schema.
31. The system of claim 20, wherein each result dataset includes an error message and a warning message to the client.
32. The system of claim 20, further comprising: the gateway for receiving a statistics instruction from a client and for generating a statistics command from the statistics instruction; the search manager for identifying a statistics definition to be used for generating statistics, populating an anchor document with search criteria values, identifying documents to be searched, delineating semantics for overriding measures, weights and choices defined in a semantics clause in an associated schema document, defining a structure to be used by a result dataset, imposing restrictions to be applied to the result dataset, identifying a schema to be used for the basis of generating statistics, designating a name for the target statistics table for storing results; and a statistics processing module for executing the statistics command for generating a statistics schema with statistics table, mappings and measures, and storing the statistics schema in a database management system.
33. The system of claim 20, further comprising the gateway for receiving a batch command from a client for executing a plurality of commands in sequence for collecting results of several related operations.
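The scoring steps of claims 1 and 3 (per-attribute token scores in the range 0.00 to 1.00, multiplied by weights, then aggregated into a normalized document score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `difflib.SequenceMatcher` stands in for a "string difference" measure, a weighted mean stands in for the choice algorithm (the record does not define how "greedy sum" and the other choices work), and the field names, weights, and sample documents are invented.

```python
from difflib import SequenceMatcher


def string_measure(a, b):
    """Stand-in measure: token similarity in [0.0, 1.0] (assumed, not the patent's)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def document_score(anchor, target, weights):
    """Weight each token score, then normalize by total weight to stay in [0.0, 1.0]."""
    total_weight = sum(weights[field] for field in anchor)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        weights[field] * string_measure(anchor[field], target.get(field, ""))
        for field in anchor
    )
    return weighted / total_weight


def search(anchor, targets, weights):
    """Score every search document against the anchor; return results best first."""
    results = [
        (name, round(document_score(anchor, doc, weights), 2))
        for name, doc in targets.items()
    ]
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical data: one anchor document and three search documents.
anchor = {"first": "Jon", "last": "Smith"}
weights = {"first": 1.0, "last": 2.0}
targets = {
    "doc1": {"first": "Jon", "last": "Smith"},
    "doc2": {"first": "John", "last": "Smyth"},
    "doc3": {"first": "Qq", "last": "Zzz"},
}

for name, score in search(anchor, targets, weights):
    print(f"{name}: {score:.2f}")
```

Dividing by the total weight is what keeps the document score within 0.00 to 1.00 regardless of the weights chosen, so an exact match scores 1.00 and a document sharing no characters with the anchor scores 0.00.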