Method for ranking hypertext search results by analysis of hyperlinks from expert documents and keyword scope
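IPC classification information
Country/Type
United States (US) Patent
Granted
International Patent Classification (IPC, 7th edition)
G06F-017/30
Application number
US-0418418
(1999-10-15)
Patent number
US-7346604
(2008-03-18)
Inventors / Address
Bharat, Krishna A.
Mihaila, George A.
Applicant / Address
Hewlett Packard Development Company, L.P.
Citation information
Times cited: 61
Cited patents: 10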
Abstract
A computer-implemented method and system for determining search results for a search query for hypertext documents. The hypertext documents are reviewed to determine expert documents. When a query is received, the expert documents are ranked in accordance with the query. Then the target documents of the ranked expert documents are ranked to determine the search result set.
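The three steps in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The `Page` type, the `min_hosts` threshold, and the use of query-term counts as the expert score are all hypothetical stand-ins. The claims' level score and fullness factor, key-phrase scoping, and affiliation tests are not modeled. "Non-affiliated hosts" is simplified here to "distinct hosts other than the page's own".

```python
from collections import defaultdict
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Page:
    url: str
    text: str                                   # text matched against the query (assumption)
    outlinks: list[str] = field(default_factory=list)


def host_of(url: str) -> str:
    return urlparse(url).hostname or ""


def form_experts(pages: list[Page], min_hosts: int = 3) -> list[Page]:
    """Query-independent step: keep pages with at least `min_hosts` outlinks that reach
    at least `min_hosts` distinct hosts other than the page's own (simplified)."""
    experts = []
    for page in pages:
        hosts = {host_of(u) for u in page.outlinks} - {host_of(page.url)}
        if len(page.outlinks) >= min_hosts and len(hosts) >= min_hosts:
            experts.append(page)
    return experts


def rank_experts(experts: list[Page], terms: list[str], top_k: int = 100):
    """Hypothetical expert score: total query-term occurrences, every term required."""
    scored = []
    for expert in experts:
        words = expert.text.lower().split()
        counts = [words.count(t.lower()) for t in terms]
        if all(counts):
            scored.append((expert, float(sum(counts))))
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_k]


def rank_targets(ranked_experts) -> list[tuple[str, float]]:
    """Target score = sum of the scores of the ranked experts that link to it."""
    totals: dict[str, float] = defaultdict(float)
    for expert, score in ranked_experts:
        for target in set(expert.outlinks):
            totals[target] += score
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def search(experts: list[Page], terms: list[str]) -> list[str]:
    """Per-query steps: rank experts, rank their targets, return target URLs."""
    return [url for url, _ in rank_targets(rank_experts(experts, terms))]
```

Because `form_experts` never looks at the query, it can run once at crawl or indexing time. Only `rank_experts` and `rank_targets` then need to run per query.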
Representative claims
What is claimed is:
1. A computer-implemented method for searching a large number of hypertext documents in accordance with a search query, comprising: forming a set of expert documents from the set of all hypertext documents crawled without reference to the search query; ranking the expert documents in accordance with the search query; ranking target documents pointed to by the ranked expert documents; and returning a results list based on the ranked target documents.
2. The computer-implemented method of claim 1, wherein the hypertext documents are pages in the world wide web.
3. The computer-implemented method of claim 1, wherein the hypertext documents are sites in the world wide web.
4. The computer-implemented method of claim 1, wherein the hypertext documents are documents in a hypertext database.
5. The computer-implemented method of claim 1, wherein an expert reverse index is constructed in memory for keywords appearing in the expert documents, the expert reverse index identifying the location of the keywords in the expert documents.
6. The computer-implemented method of claim 5, wherein a keyword of an expert document is included in the expert reverse index if the keyword is part of a key phrase that qualifies at least one URL in the expert document.
7. The computer-implemented method of claim 6, wherein a key phrase qualifies a URL if the URL is within the scope of the key phrase in the expert document.
8. The computer-implemented method of claim 6, wherein a key phrase in an HTML title qualifies all URLs in the entire document.
9. The computer-implemented method of claim 6, wherein a key phrase in an HTML heading qualifies all URLs in that portion of the document before a next HTML heading in the document of greater or equal importance.
10. The computer-implemented method of claim 6, wherein a key phrase in an HTML anchor qualifies the URLs in the anchor.
11. The computer-implemented method of claim 1, wherein forming a set of expert documents includes: determining a document having at least a predetermined number of outlinks to be an expert document if the document also points to at least the predetermined number of targets on distinct non-affiliated hosts.
12. The computer-implemented method of claim 11, wherein expert documents additionally must point to documents that share the same broad classification.
13. The computer-implemented method of claim 1, wherein ranking target documents pointed to by the expert documents includes: determining a plurality of edge scores for each target document, where an edge score is determined for edges between the expert documents and the target document; determining a target score in accordance with the edge scores of the target document; ranking the target documents in accordance with the target scores.
14. The computer-implemented method of claim 13, further including: determining an edge score only for those links to the target document from a predetermined number of top-ranked expert documents.
15. The computer-implemented method of claim 13, further including selecting target documents to be ranked that are linked to by at least two mutually non-affiliated selected expert documents, where the selected target also is not affiliated with the expert documents.
16. The computer-implemented method of claim 13, where an edge score between an expert document and a target document ES(E,T) is determined as follows, where ExpertScore reflects the rankings of the expert documents:
   a) find #occurrences of each keyword in all keyphrases of expert document E;
   b) if the #occurrences for any keyword in E is 0: ES(E,T)=0, else ES(E,T)=ExpertScore(E)*sum of #occurrences for all keywords.
17. The computer-implemented method of claim 13, wherein, if two affiliated experts have edges to the same target, the edge having a lower edge score is discarded and is not used to determine the target score.
18. The computer-implemented method of claim 17, wherein two hypertext documents are affiliated if at least one of the following is true: 1) they share the same rightmost non-generic suffix and 2) they have an IP address in common.
19. The computer-implemented method of claim 1, wherein ranking the expert documents in accordance with the search query comprises: determining a level score for each of the expert documents; determining a fullness factor for each key phrase on each of the expert documents; and determining an expert score for each expert document in accordance with the level score of the expert document and the fullness factors for the key phrases of the expert document.
20. The computer-implemented method of claim 1, [wherein] forming a set of expert documents occurs before a search query is received.
21. An apparatus that searches a large number of hypertext documents in accordance with a search query, comprising: a processor and a memory, the processor executing instructions stored in the memory, the instruction comprising: a software portion configured to form a set of expert documents from the set of all documents crawled without reference to the search query; a software portion configured to rank the expert documents in accordance with the search query; a software portion configured to rank target documents pointed to by the ranked expert documents; and a software portion configured to return a results list based on the ranked target documents.
22. A computer program product, comprising: a computer readable medium having computer readable instructions stored therein to search a large number of hypertext documents in accordance with a search query, including: computer readable program code devices for causing a computer to form a set of expert documents from the set of all documents crawled without reference to the search query; computer readable program code devices for causing a computer to rank the expert documents in accordance with the search query; computer readable program code devices for causing a computer to rank target documents pointed to by the ranked expert documents; and computer readable program code devices for causing a computer to return a results list based on the ranked target documents.
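The target-ranking details in claims 13 and 15 to 18 can be sketched in Python as follows. Each edge is scored with claim 16's rule. Edges from experts affiliated with the target are dropped (claim 15). Among affiliated experts pointing to the same target, only the higher-scoring edge is kept (claim 17). Affiliation means a shared rightmost non-generic suffix or a shared IP address (claim 18). Surviving edge scores are summed into the target score. Summing is an assumption, since claim 13 only says the target score is determined "in accordance with" the edge scores. Also hypothetical: the `Host` type, the small `GENERIC_SUFFIXES` set, lowercase whitespace tokenization, and passing in only the key phrases that qualify the E→T link. As worded, claim 16 counts keywords in all key phrases of E.

```python
from collections import defaultdict
from typing import NamedTuple

# Hypothetical, deliberately small set of generic suffixes (assumption).
GENERIC_SUFFIXES = {"com", "org", "net", "edu", "gov", "co", "ac", "uk", "de", "jp"}


class Host(NamedTuple):
    name: str          # e.g. "www.hp.com"
    ips: frozenset     # IP addresses the host resolves to


def non_generic_suffix(hostname: str) -> str:
    """Rightmost hostname label that is not a generic suffix (www.hp.com -> hp)."""
    for label in reversed(hostname.lower().split(".")):
        if label not in GENERIC_SUFFIXES:
            return label
    return hostname.lower()


def affiliated(a: Host, b: Host) -> bool:
    """Claim 18: same rightmost non-generic suffix, or an IP address in common."""
    return non_generic_suffix(a.name) == non_generic_suffix(b.name) or bool(a.ips & b.ips)


def edge_score(expert_score: float, phrases: list[str], keywords: list[str]) -> float:
    """Claim 16: 0 if any query keyword is missing, else ExpertScore(E) * total occurrences."""
    tokens = [w for phrase in phrases for w in phrase.lower().split()]
    counts = [tokens.count(k.lower()) for k in keywords]
    if any(c == 0 for c in counts):
        return 0.0
    return expert_score * sum(counts)


def rank_targets(edges, hosts: dict[str, Host], keywords: list[str]):
    """edges: (expert_id, expert_score, target_id, qualifying_phrases) tuples."""
    candidates = defaultdict(list)
    for expert, e_score, target, phrases in edges:
        s = edge_score(e_score, phrases, keywords)
        # Claim 15: the target must not be affiliated with the expert.
        if s > 0 and not affiliated(hosts[expert], hosts[target]):
            candidates[target].append((expert, s))

    ranked = []
    for target, scored in candidates.items():
        kept = []
        # Claim 17: among affiliated experts, keep only the higher-scoring edge.
        for expert, s in sorted(scored, key=lambda pair: -pair[1]):
            if not any(affiliated(hosts[expert], hosts[k]) for k, _ in kept):
                kept.append((expert, s))
        # Claim 15: require at least two mutually non-affiliated experts.
        if len(kept) >= 2:
            ranked.append((target, sum(s for _, s in kept)))
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```

For example, suppose experts on `www.alpha.com` and `news.alpha.com` both link to a target. Only the higher-scoring of those two edges survives, so the target still needs a second expert on an unrelated host to be ranked at all.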
Patents cited by this patent (10)
Marchisio, Giovanni B., Internet navigation using soft hyperlinks.
Schuetze, Hinrich; Pitkow, James E.; Pirolli, Peter L.; Chi, Ed H.; Li, Jun, System and method for providing recommendations based on multi-modal user clusters.
Adar, Eytan; Breuel, Thomas M.; Cass, Todd A.; Pitkow, James E.; Schuetze, Hinrich, System and method for searching and recommending documents in a collection using share bookmarks.
Schuetze, Hinrich; Pirolli, Peter L.; Pitkow, James E.; Chi, Ed H.; Li, Jun, System and method for visually representing the contents of a multiple data object cluster.
Wong, Sandy; Huynh, Yet L.; Natarajan, Ramakrishnan; Kim, Joon Young; Thogersen, Michael D.; Yao, Tong, Calculating a downloading priority for the uniform resource locator in response to the domain density score, the anchor text score, the URL string score, the category need score, and the link proximity score for targeted web crawling.
Laugier, Alexandre; Raymond, Stephanie, Method of ranking a set of electronic documents of the type possibly containing hypertext links to other electronic documents.
Freishtat, Gregg; Hufford, Steve; McFall, Dodge; Wilson, Jackson; Hyman, Tanya; Rijsinghani, Vikas; Kaib, Paul, Systems and methods to facilitate selling of products and services.
Freishtat, Gregg; Hufford, Steve; McFall, Dodge; Wilson, Jackson; Hyman, Tanya; Rijsinghani, Vikas; Kaib, Paul, Systems and methods to facilitate selling of products and services.