[Patent] Machine-learned approach to determining document relevance for search over large electronic collections of documents

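IPC Classification Information
Country / Type	United States (US) Patent, Registered
International Patent Classification (IPC, 7th edition)	G06F-015/18 G06F-017/00
Application number	US-0754159 (2004-01-09)
Registration number	US-7287012 (2007-10-23)
Inventors / Address	Corston, Simon H.; Chandrasekar, Raman; Chen, Harr
Applicant / Address	Microsoft Corporation
Agent / Address	Amin, Turocy & Calvin, LLP
Citation information	Times cited: 22; Patents cited: 42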

Abstract

The present invention relates to a system and methodology that applies automated learning procedures for determining document relevance and assisting information retrieval activities. A system is provided that facilitates a machine-learned approach to determine document relevance. The system includes a storage component that receives a set of human selected items to be employed as positive test cases of highly relevant documents. A training component trains at least one classifier with the human selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, wherein the other items can be selected by a statistical search, for example. Also, the trained classifier can be employed to aid an individual in identifying and selecting new positive cases or utilized to filter or re-rank results from a statistical-based search.
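
The training and re-ranking flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the whitespace tokenizer, the `NaiveBayesRelevance` class, the `rerank` helper, and the sample documents are all hypothetical. Naive Bayes is used here because the claims list it as one admissible learning technique. Note that the model never sees a query, which is what makes it query-independent.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain lowercase whitespace tokens; the patent does not fix a feature set.
    return text.lower().split()


class NaiveBayesRelevance:
    """Two-class multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, positives, negatives):
        # Label 1 = human-selected positive cases, label 0 = other items used as negatives.
        self.counts = {1: Counter(), 0: Counter()}
        for label, docs in ((1, positives), (0, negatives)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[1]) | set(self.counts[0])
        self.totals = {c: sum(self.counts[c].values()) for c in (0, 1)}
        n = len(positives) + len(negatives)
        self.log_prior = {1: math.log(len(positives) / n),
                          0: math.log(len(negatives) / n)}
        return self

    def prob_positive(self, doc):
        scores = {}
        for c in (0, 1):
            score = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.counts[c][tok] + 1)
                                      / (self.totals[c] + len(self.vocab)))
            scores[c] = score
        # Convert the two log scores into P(positive | doc).
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        return exp[1] / (exp[0] + exp[1])


def rerank(model, results):
    # Order results from a statistical search by the probability of being a positive case.
    return sorted(results, key=model.prob_positive, reverse=True)


# Hypothetical data: editor-selected positives and negatives drawn from a statistical search.
positives = ["official download page for the product installer",
             "product support home with official troubleshooting guide"]
negatives = ["forum thread someone mentioned the product once",
             "random blog post unrelated rant"]
model = NaiveBayesRelevance().fit(positives, negatives)

results = ["blog rant about a forum thread",
           "official product support and download page"]
ranked = rerank(model, results)
for doc in ranked:
    print(round(model.prob_positive(doc), 3), doc)
```

The same `prob_positive` score could also be used to filter results by dropping items below a cutoff, or to surface likely new positive cases for an editor to review.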

Representative Claims

What is claimed is:

1. A computer-implemented system that facilitates a machine-learned approach to determine document relevance, comprising: a storage component that receives a set of human or machine selected items to be employed as positive test cases; and a training component that trains at least one classifier with the human or machine selected items as positive test cases and one or more other items as negative test cases in order to provide a query-independent model, the trained classifier is employed to filter documents obtained from statistical-based or probabilistic-based searches.
2. The system of claim 1, the negative test cases selected by a statistical search.
3. The system of claim 1, the trained classifier is employed to aid an individual in selecting new positive cases.
4. The system of claim 1, outputs of the filter are ranked such that positive cases are ranked before negative cases.
5. The system of claim 1, the outputs are ranked according to a probability they are a positive case.
6. The system of claim 1, the storage component includes logs of relevant sites of interest for users, documents, or data items.
7. The system of claim 6, the storage component includes information for a centralized store or from divergent sources such as web sites, document collections, encyclopedias, local data sources and remote data sources.
8. The system of claim 1, the classifier is employed to automatically analyze data in the storage component in order to assist one or more tools that can interact with a user interface.
9. The system of claim 8, the tools include at least one of an administrative tool, an editing tool, and a ranking tool.
10. The system of claim 8, the tools are employed in at least one of an online and an offline manner.
11. The system of claim 1, the classifiers are trained according to positive and negative test data in order to determine an item's relevance such as from documents or links that suggest other sites of useful information.
12. The system of claim 11, further comprising a set of manually selected documents or items to train a machine-learned classifier.
13. The system of claim 11, the classifier is applied to new terms to identify best bet or relevant documents.
14. The system of claim 11, further comprising bootstrapping new models over various training iterations to facilitate a growing model of learned expressions that are employed for more accurate information retrieval activities.
15. The system of claim 14, further comprising best bets that are hand-selected by an editor.
16. The system of claim 15, further comprising a component to maximize a likelihood of displaying types of documents or items that users are likely to think are interesting enough to view or retrieve.
17. The system of claim 1, the classifier includes at least one of the following learning techniques: Support Vector Machines (SVM), a Naive Bayes, a Bayes Net, a decision tree, similarity-based, a vector-based, a Hidden Markov Model, or other learning technique.
18. The system of claim 1, further comprising a component to perform post-processing of information to determine a document or site's relevance to a user or administrator.
19. The system of claim 18, the post-processing includes ranking in accordance with predetermined probability thresholds, items having a higher probability of being relevant are presented before items of lower probability.
20. The system of claim 18, further comprising explicit annotations that are added to displayed items to indicate a document or site's relevance or importance.
21. A computer readable medium having computer readable instructions stored thereon for implementing the training component and the storage component of claim 1.
22. A computer-based information retrieval system, comprising: means for determining a training set for data terms; means for automatically classifying the training set; means for determining new items from the classified training set; and means for presenting the new items in accordance with an information retrieval request.
23. The system of claim 22, further comprising means for testing the classified training set.
24. A computer-implemented method to facilitate automated information retrieval, comprising: processing n queries from a data log, n being an integer; identifying relevant candidates from the n queries; and training classifiers to identify other relevant candidates for subsequent search activities.
25. The method of claim 24, further comprising forwarding an analysis to an editor that determines whether or not a piece of information is desirable to be presented for a given query or topic.
26. The method of claim 24, further comprising extracting relevant candidates from a list of potential documents or sites and automatically placing the best bets before other statistical rankings.
27. The method of claim 24, further comprising re-ranking results by a probability that a document is relevant, respective documents are downloaded, and terms are extracted and looked-up for terms appearing in the document.
28. The method of claim 24, further comprising determining at least one category to be classified.
29. The method of claim 28, further comprising employing a subset of a training data set to test the classified categories.
30. A computer readable medium having a data structure stored thereon, comprising: a first data field related to a training data set for a relevance category; a second data field that relates to a new set of data items pertaining to the relevance category; and a third data field that relates to a probability ranking for the new set of data items.
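
As a rough illustration of the three-field data structure in claim 30, combined with the threshold-based presentation order of claim 19, a record might look like the sketch below. The class name `RelevanceRecord`, its field names, the example URLs, and the probability values are all assumptions made for illustration; the claims do not prescribe any particular layout or storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RelevanceRecord:
    """Hypothetical container mirroring the three data fields of claim 30."""
    category: str                                   # the relevance category, e.g. "best_bet"
    training_set: List[str]                         # first field: training data for the category
    new_items: List[str] = field(default_factory=list)             # second field: new items
    probabilities: Dict[str, float] = field(default_factory=dict)  # third field: probability ranking

    def add_scored_item(self, item: str, probability: float) -> None:
        # Store a newly classified item together with its probability of being relevant.
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        self.new_items.append(item)
        self.probabilities[item] = probability

    def ranked(self, threshold: float = 0.0) -> List[str]:
        # Keep items at or above the threshold, highest probability first.
        kept = [i for i in self.new_items if self.probabilities[i] >= threshold]
        return sorted(kept, key=lambda i: self.probabilities[i], reverse=True)


# Hypothetical usage with made-up items and scores.
record = RelevanceRecord(category="best_bet", training_set=["docs.example/home"])
record.add_scored_item("docs.example/faq", 0.72)
record.add_scored_item("docs.example/setup", 0.91)
record.add_scored_item("blog.example/rant", 0.12)
print(record.ranked(threshold=0.5))
```

Keeping the probabilities alongside the items lets the same record serve both filtering (a nonzero threshold) and plain re-ranking (the default threshold of zero).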

Patents Cited by This Patent (42)

Braden-Harder Lisa ; Corston Simon H. ; Dolan William B. ; Vanderwende Lucy H., Apparatus and methods for an information retrieval system that employs natural language processing of search results to.
Lee Shih-Jong J. ; Wilhelm Paul S. ; Bannister Wendy R. ; Kuan Chih-Chau L. ; Oh Seho ; Meyer Michael G., Apparatus for the identification of free-lying cells.
Lee Shih-Jong J. ; Wilhelm Paul S. ; Bannister Wendy R. ; Kuan Chih-Chau L. ; Oh Seho ; Meyer Michael G., Apparatus for the identification of free-lying cells.
Lee Shih-Jong J. ; Wilhelm Paul S. ; Bannister Wendy R. ; Kuan Chih-Chau L. ; Oh Seho ; Meyer Michael G., Apparatus for the identification of free-lying cells.
Neal, Michael Renn; Wilmsen, James Michael; Beall, Christopher Wade, Automated classification of items using cascade searches.
Bolle,Rudolf M.; Haas,Norman; Oles,Frank J.; Zhang,Tong, Business method and apparatus for employing induced multimedia classifiers based on unified representation of features reflecting disparate modalities.
Vena John, Combined sewer overflow and storm water diverter screen.
Summerlin, Thomas A.; Shinkle, Timothy; Stalters, Russell E., Computer readable electronic records automated classification system.
Hoffberg Steven M. ; Hoffberg-Borghesani Linda I., Ergonomic man-machine interface incorporating adaptive pattern recognition based control system.
Hoffberg Steven M. ; Hoffberg-Borghesani Linda I., Human factored interface incorporating adaptive pattern recognition based controller apparatus.
Mazur Richard A. ; Csulits Frank M. ; Mennie Douglas U., Intelligent document handling system.
Komissarchik Edward ; Arlazarov Vladimir,RUX ; Bogdanov Dimitri ; Finkelstein Yuri ; Ivanov Andrey ; Kaminsky Jacob,RUX ; Komissarchik Julia ; Krivnova Olga,RUX ; Kronrod Mikhail ; Malkovsky Mikhail, Knowledge-based speech recognition system and methods having frame length computed based upon estimated pitch period of.
Graupe Daniel, Large memory storage and retrieval (LAMSTAR) network.
Edlund, Stefan B.; Emens, Michael L.; Kraft, Reiner; Myllymaki, Jussi; Teng, Shanghua, Metadata search results ranking system.
Amado Carlos Armando (444 Brickell Avenue #51-111 Miami FL 33131-2400), Method and apparatus for applying if-then-else rules to data sets in a relational data base and generating from the resu.
Bolle, Rudolf M.; Haas, Norman; Oles, Frank J.; Zhang, Tong, Method and apparatus for inducing classifiers for multimedia based on unified representation of features reflecting disparate modalities.
Smith Mary V. ; Frost Mark D., Method and system for diagnosing and reporting failure of a vehicle emission test.
Smith Mary V. ; Frost Mark D., Method and system for diagnosing and reporting failure of a vehicle emission test.
Errico James H. ; Murdock Michael C., Method and system for lexical processing.
Forman,George H.; Fawcett,Tom E.; Suermondt,Henri J., Method and system for measuring the quality of a hierarchy.
Chandrasekar, Raman; Steinkraus, David W., Method and system for performing phrase/word clustering and cluster merging.
Errico James H. ; Murdock Michael C. ; Wang Shay-Ping T., Method and system for velocity-based handwriting recognition.
Errico James H. ; Labun Nicholas M. ; Loda John J. ; Murdock Michael C. ; Wang Shay-Ping T., Method and system using meta-classes and polynomial discriminant functions for handwriting recognition.
Hong,Se June; Hosking,Jonathan R.; Natarajan,Ramesh, Method for ensemble predictive modeling by multiplicative adjustment of class probability: APM (adjusted probability model).
Barry G. Becker ; Ron Kohavi ; Daniel A. Sommerfield ; Joel D. Tesler, Method system and computer program product for visualizing an evidence classifier.
Barry Glenn Becker ; Roger A. Crawfis, Method, system and computer program product for visually approximating scattered data using color to represent values of a categorical variable.
Tesler Joel D., Method, system, and computer program product for mapping between an overview and a partial hierarchy.
Becker Barry G., Method, system, and computer program product for visualizing a data structure.
Kohavi Ron ; Tesler Joel D., Method, system, and computer program product for visualizing a decision-tree classifier.
Becker Barry G. ; Kohavi Ron ; Sommerfield Daniel A. ; Tesler Joel D., Method, system, and computer program product for visualizing an evidence classifier.
Tesler Joel D., Method, system, and computer program product for visualizing data using partial hierarchies.
Bokser Mindy ; Pon Leonard ; Yang Jun ; Choy Kenneth, Pattern recognition employing arbitrary segmentation and compound probabilistic evaluation.
Ito Satoshi (Kanagawa JPX) Ohata Toyoharu (Kanagawa JPX) Ishibashi Akira (Kanagawa JPX) Nakayama Norikazu (Kanagawa JPX), Semiconductor laser.
Kadtke James B. ; Kremliovsky Michael N., Signal and pattern detection or classification by estimation of continuous dynamical models.
Kadar Ivan ; Schellhammer Scott A., System and method for functional recognition of emitters.
Chandrasekar, Raman; Finger, II, James Charles; Salas, Sally K.; Watson, Eric Benjamin, System and method for performing a search and a browse on a query.
Chandrasekar,Raman; Finger, II,James C.; Watson,Eric B., System and method for query refinement to enable improved searching based on identifying and utilizing popular concepts related to users' queries.
Corston, Simon H.; Dolan, William B.; Vanderwende, Lucy H.; Braden-Harder, Lisa, System for processing textual inputs using natural language processing techniques.
Reis James J. (La Palma CA) Luk Anthony L. (Rancho Palos Verdes CA) Lucero Antonio B. (Anaheim CA) Garber David D. (Cypress CA), Target acquisition and tracking system.
Horvitz Eric ; Heckerman David E. ; Dumais Susan T. ; Sahami Mehran ; Platt John C., Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set.
Zhilyaev Maxim, Test classification system and method.
Catlett Jason A. (Green Brook NJ) Gale William Arthur (Maplewood NJ) Lewis David Dolan (Summit NJ), Training apparatus and method.

Patents Citing This Patent (22)

Yih, Wen-tau; Meek, Christopher A., Consistent phrase relevance measures.
Jiang, Daxin; Li, Hang, Context-aware query suggestion by mining log data.
Chandrasekar, Raman; Slawson, Dean A., Creating business value by embedding domain tuned search on web-sites.
Chandrasekar, Raman; Slawson, Dean A., Creating business value by embedding domain tuned search on web-sites.
Svore, Krysta M.; Abib, Elbio Renato Torres; Burges, Christopher J. C.; Middha, Bhuvan, Identification of sample data items for re-judging.
Jing, Feng; Zhang, Lei; Ma, Wei-Ying, Identifying sight for a location.
Liu, Chao; Wang, Yi-Min, Learning a ranker to rank entities with automatically derived domain-specific preferences.
Jin, Huaxing; Zheng, Wei; Huang, Peng; Yang, Xu; Lin, Feng; Feng, Jiong; Zhang, Qin, Method and apparatus of ordering search results.
Jin, Huaxing; Zheng, Wei; Huang, Peng; Yang, Xu; Lin, Feng; Feng, Jiong; Zhang, Qin, Method and apparatus of ordering search results.
Jin, Huaxing; Zheng, Wei; Huang, Peng; Yang, Xu; Lin, Feng; Feng, Jiong; Zhang, Qin, Method and apparatus of ordering search results.
Knepper, Margaret M.; Fox, Kevin Lee; Frieder, Ophir, Method for re-ranking documents retrieved from a document database.
Lai, Tzu-Chien (Reggie); Low, Biam Chee, Optimization filters for user generated content searches.
Zeng, Hua Jun; He, Qicai; Liu, Guimei; Chen, Zheng; Zhang, Benyu; Ma, Wei Ying, Query-based snippet clustering for search result grouping.
Jing, Feng; Zhang, Lei; Ma, Wei-Ying, Ranking content based on relevance and quality.
Rao, Arjun Kumar; Kumar, Karthik; Dhakshinamoorthy, Nagadhilipan, System and method for generating a report in real-time from a resource management system.
Kemp, Richard Douglas; Grenet, Philippe, System and method for topical document searching.
Kemp, Richard Douglas; Grenet, Philippe, System and method for topical document searching.
Shulman, Stuart William; Hoy, Mark James, System and method of classifier ranking for incorporation into enhanced machine learning.
Ravid, Yiftach, System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith.
Jing, Feng; Zhang, Lei; Li, Ming Jing; Ma, Wei-Ying; Deng, Kefeng, User interface for viewing clusters of images.
Jing, Feng; Zhang, Lei; Li, Ming Jing; Ma, Wei-Ying; Deng, Kefeng, User interface for viewing clusters of images.
Jing, Feng; Zhang, Lei; Li, Ming Jing; Ma, Wei-Ying; Deng, Kefeng, User interface for viewing clusters of images.
