[Patent] Similarity search engine for use with relational databases

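IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC 7th edition)	G06F-017/30
Application number	US-0365828 (2003-02-13)
Inventor / Address	Ripley, John R.
Applicant / Address	Infoglide Software Corporation
Agent / Address	Taylor Russell & Russell, P.C.
Citation information	Times cited: 60; Patents cited: 10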

Abstract

The invention provides a system and method for defining a schema and sending a query to a Similarity Search Engine to determine a quantitative assessment of the similarity of attributes between an anchor record and one or more target records. The Similarity Search Engine makes a similarity assessment in a single pass through the target records having multiple relationship characteristics. The Similarity Search Engine is a server configuration that comprises a Gateway for command and response routing, a Virtual Document Manager for document generation, a Search Manager for document scoring, and a Relational Database Management System for providing data persistence, data retrieval and access to User Defined Functions. The Similarity Search Engine uses a unique command syntax based on the Extensible Markup Language to implement functions necessary for similarity searching and scoring.
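
The abstract describes an XML-based command syntax but this record does not reproduce it. The following is a minimal Python sketch of what a query command could look like, assuming only what the claims state: a command carries an operation, a name, and an associated schema, and it designates an anchor document populated with search criteria values. Every element and attribute name here (`QUERY`, `op`, `name`, `schema`, `ANCHOR`, `FirstName`, `LastName`) is hypothetical and is not taken from the patent.

```python
import xml.etree.ElementTree as ET


def build_query_command(name, schema, anchor_values):
    """Build a hypothetical XML query command (all tag names are assumptions)."""
    # The command element carries operation, name, and schema attributes.
    query = ET.Element("QUERY", {"op": "execute", "name": name, "schema": schema})
    # The anchor document holds the search criteria values as leaf nodes.
    anchor = ET.SubElement(query, "ANCHOR")
    for field, value in anchor_values.items():
        ET.SubElement(anchor, field).text = value
    return query


command = build_query_command(
    "find-similar-people",
    "people",
    {"FirstName": "John", "LastName": "Smith"},
)
xml_text = ET.tostring(command, encoding="unicode")
print(xml_text)
```

In this sketch the Gateway would be the component that receives such a document from the client and routes it onward; that routing is not modeled here.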

Representative claims

1. A method for performing similarity searching, comprising the steps of: receiving a request instruction from a client for initiating a similarity search; generating one or more query commands from the request instruction, each query command designating an anchor document and at least one search document; executing each query command, including: computing a normalized document similarity score having a value of between 0.00 and 1.00 for each search document in each query command for indicating a degree of similarity between the anchor document and each search document; creating a result dataset containing the computed normalized document similarity scores for each search document; and sending a response including the result dataset to the client.
2. The method of claim 1, wherein the step of generating one or more query commands further comprises identifying a schema document for defining structure of search terms, mapping of datasets providing target search values to relational database locations, and designating measures, choices and weight to be used in a similarity search.
3. The method of claim 1, wherein the step of computing a normalized document similarity score comprises: computing attribute token similarity scores having values of between 0.00 and 1.00 for the corresponding leaf nodes of the anchor document and a search document using designated measure algorithms; multiplying each token similarity score by a designated weighting factor; aggregating the token similarity scores using designated choice algorithms for determining a document similarity score having a value of between 0.00 and 1.00 for the search document.
4. The method of claim 3, wherein: the step of computing attribute token similarity scores further comprises computing attribute token similarity scores in a relational database management system; the step of multiplying each token similarity score further comprises multiplying each token similarity score in a similarity search engine; and the step of aggregating the token similarity scores further comprises aggregating the token similarity scores in the similarity search engine.
5. The method of claim 3, wherein the step of computing attribute token similarity scores having values of between 0.00 and 1.00 further comprises computing attribute token similarity scores having values of between 0.00 and 1.00, whereby an attribute token similarity value of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and values between 0.00 and 1.00 represent degrees of similarity matching.
6. The method of claim 3, further comprising selecting measure algorithms from the group consisting of name equivalents, foreign name equivalents, textual, sound coding, string difference, numeric, numbered difference, ranges, numeric combinations, range combinations, fuzzy, date oriented, date to range, date difference, and date combination.
7. The method of claim 3, further comprising selecting choice algorithms from the group consisting of single best, greedy sum, overall sum, greedy minimum, overall minimum, and overall maximum.
8. The method of claim 1, wherein the step of generating one or more query commands comprises: populating an anchor document with search criteria values; identifying documents to be searched; defining semantics for overriding parameters specified in an associated schema document; defining a structure to be used by the result dataset; and imposing restrictions on the result dataset.
9. The method of claim 8, wherein the step of defining semantics comprises: designating overriding measures for determining attribute token similarity scores; designating overriding choice algorithms for aggregating token similarity scores into document similarity scores; and designating overriding weights to be applied to token similarity scores.
10. The method of claim 8, wherein the step of imposing restrictions is selected from the group consisting of defining a range of similarity indicia scores to be selected and defining percentiles of similarity indicia scores to be selected.
11. The method of claim 1, wherein the step of computing a normalized document similarity score further comprises computing a normalized document similarity score having a value of between 0.00 and 1.00, whereby a normalized similarity indicia value of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and values between 0.00 and 1.00 represent degrees of similarity matching.
12. The method of claim 1, wherein the step of generating one or more query commands further comprises generating one or more query commands whereby each query command includes attributes of command operation, name identification, and associated schema document identification.
13. The method of claim 1, further comprising: receiving a schema instruction from a client; generating a schema command document comprising the steps of: defining a structure of target search terms in one or more search documents; creating a mapping of database record locations to the target search terms; listing semantic elements for defining measures, weights and choices to be used in similarity searches; and storing the schema command document into a database management system.
14. The method of claim 1, further comprising the step of representing documents and commands as hierarchical XML documents.
15. The method of claim 1, wherein the step of sending a response to the client further comprises sending a response including an error message and a warning message to the client.
16. The method of claim 1, wherein the step of sending a response to the client further comprises sending a response to the client containing the result datasets, whereby each result dataset includes at least one normalized document similarity score, at least one search document name, a path to the search documents having a returned score, and at least one designated schema.
17. The method of claim 1, further comprising: receiving a statistics instruction from a client; generating a statistics command from the statistics instruction, comprising the steps of: identifying a statistics definition to be used for generating statistics; populating an anchor document with search criteria values; identifying documents to be searched; delineating semantics for overriding measures, parsers and choices defined in a semantics clause in an associated schema document; defining a structure to be used by a result dataset; imposing restrictions to be applied to the result dataset; identifying a schema to be used for the basis of generating statistics; designating a name for the target statistics table for storing results; executing the statistics command for generating a statistics schema with statistics table, mappings and measures; and storing the statistics schema in a database management system.
18. The method of claim 1, further comprising the step of executing a batch command comprising executing a plurality of commands in sequence for collecting results of several related operations.
19. A computer-readable medium containing instructions for controlling a computer system to implement the method of claim 1.
20. A system for performing similarity searching, comprising: a gateway for receiving a request instruction from a client for initiating a similarity search; the gateway for generating one or more query commands from the request instruction, each query command designating an anchor document and at least one search document; a search manager for executing each query command, including: means for computing a normalized document similarity score having a value of between 0.00 and 1.00 for each search document in each query command for indicating a degree of similarity between the anchor document and each search document; means for creating a result dataset containing the computed normalized document similarity scores for each search document; and the gateway for sending a response including the result dataset to the client.
21. The system of claim 20, wherein the means for computing a normalized similarity score comprises: a relational database management system for computing attribute token similarity scores having values of between 0.00 and 1.00 for the corresponding leaf nodes of the anchor document and a search document using designated measure algorithms; and the search manager for multiplying each token similarity score by a designated weighting factor and aggregating the token similarity scores using designated choice algorithms for determining a document similarity score having a value of between 0.00 and 1.00 for the search document.
22. The system of claim 21, wherein the relational database management system includes means for computing an attribute token similarity score having a value of between 0.00 and 1.00, whereby a token similarity indicia value of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and values between 0.00 and 1.00 represent degrees of similarity matching.
23. The system of claim 21, wherein the measure algorithms are selected from the group consisting of name equivalents, foreign name equivalents, textual, sound coding, string difference, numeric, numbered difference, ranges, numeric combinations, range combinations, fuzzy, date oriented, date to range, date difference, and date combination.
24. The system of claim 21, wherein the choice algorithms are selected from the group consisting of single best, greedy sum, overall sum, greedy minimum, overall minimum, and overall maximum.
25. The system of claim 20, wherein: each one or more query commands further comprises a measure designation; and the database management system further comprises designated measure algorithms for computing a token similarity score.
26. The system of claim 20, wherein each query command comprises: an anchor document populated with search criteria values; at least one search document; designated measure algorithms for determining token similarity scores; designated choice algorithms for aggregating token similarity scores into document similarity scores; designated weights for weighting token similarity scores; restrictions to be applied to a result dataset document; and a structure to be used by the result dataset.
27. The system of claim 20, wherein the computed document similarity scores have a value of between 0.00 and 1.00, whereby a normalized similarity indicia value of 0.00 represents no similarity matching, a value of 1.00 represents exact similarity matching, and values between 0.00 and 1.00 represent degrees of similarity matching.
28. The system of claim 20, wherein each query command includes attributes of command operation, name identification, and associated schema document identification for providing a mapping of search documents to database management system locations.
29. The system of claim 20, further comprising: the gateway for receiving a schema instruction from a client; a virtual document manager for generating a schema command document; the schema command document comprising: a structure of target search terms in one or more search documents; a mapping of database record locations to the target search terms; semantic elements for defining measures, weights, and choices for use in searches; and a relational database management system for storing the schema command document.
30. The system of claim 20, wherein each result dataset includes at least one normalized document similarity score, at least one search document name, a path to the search documents having a returned score and at least one designated schema.
31. The system of claim 20, wherein each result dataset includes an error message and a warning message to the client.
32. The system of claim 20, further comprising: the gateway for receiving a statistics instruction from a client and for generating a statistics command from the statistics instruction; the search manager for identifying a statistics definition to be used for generating statistics, populating an anchor document with search criteria values, identifying documents to be searched, delineating semantics for overriding measures, weights and choices defined in a semantics clause in an associated schema document, defining a structure to be used by a result dataset, imposing restrictions to be applied to the result dataset, identifying a schema to be used for the basis of generating statistics, designating a name for the target statistics table for storing results; and a statistics processing module for executing the statistics command for generating a statistics schema with statistics table, mappings and measures, and storing the statistics schema in a database management system.
33. The system of claim 20, further comprising the gateway for receiving a batch command from a client for executing a plurality of commands in sequence for collecting results of several related operations.
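
The scoring steps recited in the claims (a per-attribute token score between 0.00 and 1.00, multiplication by a weight, aggregation into a document score between 0.00 and 1.00, and a score-range restriction on the result dataset) can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not the patented implementation: the measure is a stand-in based on the standard library's `difflib.SequenceMatcher`, the aggregation is a plain weighted average chosen so the result stays normalized, and none of the named measure or choice algorithms from the claims is reproduced. All function names, field names, and weights are hypothetical.

```python
from difflib import SequenceMatcher


def string_measure(anchor_value, target_value):
    """Stand-in measure: returns a token similarity score in [0.0, 1.0]."""
    return SequenceMatcher(None, anchor_value.lower(), target_value.lower()).ratio()


def document_score(anchor, target, weights, measure=string_measure):
    """Weight each attribute score, then aggregate into one score in [0.0, 1.0].

    Assumption: aggregation is a weighted average, not a claimed choice algorithm.
    """
    total_weight = sum(weights[attr] for attr in anchor)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        weights[attr] * measure(anchor[attr], target.get(attr, ""))
        for attr in anchor
    )
    return weighted / total_weight


def search(anchor, targets, weights, min_score=0.0, max_score=1.0):
    """Score every target once and keep those inside the requested score range."""
    results = []
    for name, target in targets.items():
        score = document_score(anchor, target, weights)
        if min_score <= score <= max_score:
            results.append((name, round(score, 2)))
    # Highest similarity first.
    return sorted(results, key=lambda item: item[1], reverse=True)


anchor = {"first": "John", "last": "Smith"}
targets = {
    "doc1": {"first": "John", "last": "Smith"},
    "doc2": {"first": "Jon", "last": "Smyth"},
    "doc3": {"first": "Alice", "last": "Wong"},
}
weights = {"first": 1.0, "last": 2.0}
results = search(anchor, targets, weights, min_score=0.5)
print(results)
```

Dividing by the total weight is one simple way to keep the document score in the 0.00 to 1.00 range regardless of the weights chosen; the claims do not say how normalization is achieved.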

Patents cited by this patent (10)

Nishioka, Shingo; Iwayama, Makoto; Ono, Kazuhiro; Takano, Akihiko; Niwa, Yoshiki; Yamaguchi, Atsuko, Document retrieval assisting method and system for the same and document retrieval service using the same.
Shingo Nishioka JP; Makoto Iwayama JP; Kazuhiro Ono JP; Akihiko Takano JP; Yoshiki Niwa JP; Atsuko Yamaguchi JP, Document retrieval assisting method and system for the same and document retrieval service using the same.
Barber Ronald J. (San Jose CA) Beitel Bradley J. (Woodside CA) Equitz William R. (Palo Alto CA) Niblack Carlton W. (San Jose CA) Petkovic Dragutin (Saratoga CA) Work Thomas R. (San Francisco CA) Yank, Image query system and method.
Kubota Rie, JPX, Information search method, information search device, and storage medium for storing an information search program.
Snyder David L. ; Calistri-Yeh Randall J., Management and analysis of document information text.
Bjornson, Robert D.; Carriero, Nicholas J.; Sherman, Andrew H.; Weston, Stephen B.; Wing, James E., Method and apparatus for high-performance sequence comparison.
Jain Ramesh ; Horowitz Bradley ; Fuller Charles E. ; Gupta Amarnath ; Bach Jeffrey R. ; Shu Chiao-fe, Similarity engine for content-based retrieval of images.
Wheeler, David B.; Clay, Matthew J., System and method for performing similarity searching.
Jain Ramesh ; Horowitz Bradley ; Fuller Charles E. ; Gupta Amarnath ; Bach Jeffrey R. ; Shu Chiao-fe, Threshold-based comparison.
Niblack Carlton Wayne ; Petkovic Dragutin, Video query system and method.

Patents citing this patent (60)

Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Adaptive pattern recognition based controller apparatus and method and human-interface therefore.
Ting, Edison Lao; Truong, Tuong Chanh, Apparatus and method for skipping XML index scans with common ancestors of a previously failed predicate.
Charlet, Kyle J.; Hembry, Douglas M. F.; Holtz, Christopher M.; Wiedenmann, Carol M., Apparatus, system, and method for defining a metadata schema to facilitate passing data between an extensible markup language document and a hierarchical database.
Fontoura, Marcus Felipe; Neumann, Andreas; Rajagopalan, Sridhar; Shekita, Eugene J.; Zien, Jason Yeong, Architecture for an indexer.
Andreev, Alexander E.; Bolotov, Anatoli A.; Radovanovic, Nikola, Built-in functional tester for search engines.
Petriuc, Mihai, Click distance determination.
Zlatanov, Teodore Zlatkov; Furlong, Christopher, Computer systems and methods for platform independent presentation design.
Shuster, Gary Stephen, Computer-implemented search using result matching.
Shuster, Gary Stephen, Computer-implemented search using result matching.
Irle, Klaus; Lu, Liwei; Kindsvogel, Uwe; Janssen, Tatjana, Converting object structures for search engines.
Wanker, William Paul, Customizable electronic commerce comparison system and method.
Ozzie, Raymond E.; Ozzie, Jack E.; Moromisato, George P.; Suthar, Paresh S.; Narayanan, Raman; Augustine, Matthew S., Data synchronization and sharing relationships.
Hirose, Atsuhito; Kawakami, Toshihiro; Yamamoto, Akihiro, Database system and method for searching database.
Tankovich, Vladimir; Meyerzon, Dmitriy; Poznanski, Victor, Detection of junk in search result ranking.
Wason, James R., Document handling in a web application.
Wason, James R., Document handling in a web application.
Wason, James R., Document handling in a web application.
Tankovich, Vladimir; Meyerzon, Dmitriy; Taylor, Michael James, Document length as a static relevance feature for ranking search results.
Chakrabarti, Kaushik; Ganti, Venkatesh; Xin, Dong, Efficient evaluation of object finder queries.
Meyerzon, Dmitriy; Shnitko, Yauhen; Burges, Chris J. C.; Taylor, Michael James, Enterprise relevancy ranking using a neural network.
Robertson, Stephen; Zaragoza, Hugo; Taylor, Michael; Larimore, Stefan Isbein; Petriuc, Mihai, Field weighting in text searching.
Long, Thomas Edwin, Generic product finder system and method.
Kurisu, Toshiharu; Tsuge, Yuki; Hashida, Naoki; Masuda, Kyoko, Information-processing device, server device, interaction system, and program.
Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Internet appliance system and method.
Wanker, William Paul, Machine implemented methods of ranking merchants.
Gundersen, Matthew A.; Hixson, Stephen C., Mapping web sites based on significance of contact and category.
Ozcan, Fatma; Ting, Edison Lao, Method and apparatus for XML query evaluation using early-outs and multiple passes.
Singh, Jaswinder Pal; Wang, Randolph, Method and apparatus for searching network resources.
Zhang, David Chen; Chung, Jack Vinh, Method and system for improving performance of counting hits in a search.
Charlet, Kyle Jeffrey; Hembry, Douglas Michael Frederick; Holtz, Christopher M.; Wiedenmann, Carol M., Method for defining a metadata schema to facilitate passing data between an extensible markup language document and a hierarchical database.
Fontoura, Marcus F.; Neumann, Andreas; Qi, Runping; Shekita, Eugene J., Method, system, and program for handling redirects in a search engine.
Balmin, Andrey; Ozcan, Fatma; Tran, Tam Minh Dai, Optimization of extensible markup language path language (XPATH) expressions in a database management system configured to accept extensible markup language (XML) queries.
Fontoura, Marcus Felipe; Kraft, Reiner; Leung, Tony Kai-Chi; McPherson, Jr., John A.; Neumann, Andreas; Qi, Runping; Rajagopalan, Sridhar; Shekita, Eugene J.; Zien, Jason Yeong, Pipelined architecture for global analysis and index building.
Obata, Kenji; Meyerzon, Dmitriy, Proxy server using a statistical model.
Meyerzon, Dmitriy; Zaragoza, Hugo, Ranking search results using biased click distance.
Meyerzon, Dmitriy; Li, Hang, Ranking search results using feature extraction.
Meyerzon, Dmitriy; Zaragoza, Hugo, Ranking search results using language types.
Poznanski, Victor; Wang, Oivind; Holm, Fredrik; Bodd, Nicolai; Tankovich, Vladimir; Meyerzon, Dmitriy, Re-ranking search results.
Vincent, III, Winchel Todd, Schema framework and a method and apparatus for normalizing schema.
Vincent, III, Winchel Todd, Schema framework and method and apparatus for normalizing schema.
Tankovich, Vladimir; Li, Hang; Meyerzon, Dmitriy; Xu, Jun, Search results ranking using editing distance and document information.
Andreev, Alexander E.; Bolotov, Anatoli A., Sequential tester for longest prefix search engines.
Andreev, Alexander E.; Bolotov, Anatoli A., Sequential tester for longest prefix search engines.
Kaiser, Martin; Dehn, Rene; Anzuinelli, Gisella Dominguez; Gross, Rene, Software and method for utilizing a common database layout.
Dehn, Rene; Kaiser, Martin, Software and method for utilizing a generic database query.
Bolivar, Alvaro, Suggested item category systems and methods.
Bolivar, Alvaro, Suggested item category systems and methods.
Tindal, Glen D., System and method for configuring a network device.
Meyerzon, Dmitriy; Zaragoza, Hugo, System and method for ranking search results using click distance.
Merrigan, Chadd Creighton; Peltonen, Kyle G.; Meyerzon, Dmitriy; Lee, David J., System and method for scoping searches using index keys.
Chen, Chia-Hsun; Bolognese, Luca; Lombardi, Vincenzo; Gazitt, Omri; Pizzo, Michael J.; Zhengnan Zhu, Jason, System and method providing diffgram format.
Chen, Chia Hsun; Bolognese, Luca; Lombardi, Vincenzo; Gazitt, Omri; Pizzo, Michael J.; Zhu, Jason Zhengnan, System and method providing diffgram format.
Dehn, Rene; Kaiser, Martin; Anzuinelli, Gisella Dominguez, System and method utilizing a generic update module with recursive calls.
Kraft, Reiner; Neumann, Andreas, System and program for handling anchor text.
Vincent, III, Winchel Todd, System for creating and editing mark up language forms and documents.
Vincent, III, Winchel Todd, System for creating and editing mark up language forms and documents.
Vincent, III, Winchel Todd, System for normalizing and archiving schemas.
Vincent, III, Winchel Todd, System for viewing and indexing mark up language messages, forms and documents.
Vincent, III, Winchel Todd, System for viewing and indexing mark up language messages, forms and documents.
Leitner, Stephen; Manthey, Kevin W.; Burgess, Mark; Canfield, Samuel, Systems and methods for intelligent parallel searching.

