IPC Classification Information
Country / Type |
United States(US) Patent
Granted
|
International Patent Classification (IPC 7th edition) |
|
Application No. |
US-0179476
(2002-06-24)
|
Inventor / Address |
- Jiang,Dongming
- Krishnamurthy,Arvind
- Singh,Jaswinder Pal
- Wang,Randolph
|
Applicant / Address |
|
Agent / Address |
Wilson Sonsini Goodrich &
|
Citation Information |
Times cited: 257
Cited patents: 20 |
Abstract
The present invention pertains to the field of computer software. More specifically, the present invention relates to dynamic discovery of documents or information through a focused crawler or search engine.
Representative Claim
The invention claimed is:

1. A method of focused crawling, comprising: accessing a query input; crawling a plurality of documents continually, the documents including links to each other, and the crawling at least partly guided by a crawl metric, wherein the crawl metric quantifies priority for crawling links emanating from a certain document within the crawling, the crawl metric at least partly determined by a first mechanism, the first mechanism including a first combination, the first combination including a first plurality of one or more procedures, the first plurality of one or more procedures including evaluating relevance of documents using a link structure of the crawled documents, wherein the evaluating relevance of documents using a link structure of the crawled documents is performed repeatedly and continually, and wherein the evaluating relevance of documents using a link structure of the crawled documents includes: accessing a first plurality of documents from a database of a plurality of received documents, the plurality of received documents including crawled documents, the first plurality of documents to be ranked, generating a graph of the first plurality of documents, assigning weights to a plurality of nodes of the graph, wherein nodes of the graph represent the documents and edges represent links between the documents, finding an assignment of weights to one or more nodes of the graph, by propagating weights through the graph, the assignment of weight to a node based at least in part on calculating a weighted sum of weights propagated from neighboring nodes, and generating a ranked list of at least the first plurality of documents, the ranked list at least partly generated from the graph; and returning target documents, the target documents being relevant to the query input, the target documents found from the plurality of crawled documents, the target documents returned at least partly based on a search metric, the search metric quantifying relevance or importance of a document to the query input, the search metric at least partly determined by a second mechanism, the second mechanism including a second combination, the second combination being different from the first combination, the second combination including a second plurality of one or more procedures, the second plurality of procedures including evaluating relevance of documents using a template, the template including a plurality of one or more template portions, at least one of the template portions including a second plurality of one or more hierarchical levels.

2. The method of claim 1, wherein relevance includes importance.

3. The method of claim 1, wherein at least one of the first mechanism and the second mechanism includes: associating a weight to each of the evaluated relevances of the procedures; and combining the evaluated relevances and the weights of the evaluated relevances.

4. The method of claim 1, wherein one or more of: 1) the first plurality of one or more hierarchical levels and 2) the second plurality of one or more hierarchical levels, includes at least one or more heading levels and one or more content levels.

5. The method of claim 1, wherein evaluating relevance includes evaluating relevance of at least a first document and one or more of a first plurality of one or more referring documents and a second plurality of one or more referring documents, each of the first plurality of one or more referring documents referring to the first document directly, and each of the second plurality of referring documents referring to the first document indirectly through one or more documents.

6. The method of claim 1, wherein the procedure, of the first plurality of one or more procedures, of evaluating relevance of documents using a link structure of the crawled documents, further comprises: expanding the graph with a second plurality of one or more documents from the database, wherein a third plurality includes a union of the first plurality of documents and the second plurality of documents, and the third plurality of documents is smaller than the plurality of received documents.

7. The method of claim 1, wherein the procedure, of the first plurality of one or more procedures, of evaluating relevance of documents using a link structure of the crawled documents, further comprises: expanding the graph with a second plurality of one or more documents from the database, such that a third plurality includes a union of the first plurality of documents and the second plurality of documents, and the third plurality of documents is smaller than the plurality of received documents, the second plurality including one or more of: 1) one or more documents connected within a first specified number of links in a forward direction from one or more documents of the first plurality of documents, the forward direction being forward from the first plurality of documents, and 2) one or more documents connected within a second specified number of links in a backward direction from one or more documents of the first plurality of documents, the backward direction being backward from the first plurality of documents.

8. The method of claim 1, wherein the procedure, of the first plurality of one or more procedures, of evaluating relevance of documents using a link structure of the crawled documents, further comprises: expanding the graph with a second plurality of one or more documents from the database, such that a third plurality includes a union of the first plurality of documents and the second plurality of documents, and the third plurality of documents is smaller than the plurality of received documents, the second plurality including one or more of: 1) all documents connected within a first specified number of links in a forward direction from one or more documents of the first plurality of documents, the forward direction being forward from the first plurality of documents, and 2) all documents connected within a second specified number of links in a backward direction from one or more documents of the first plurality of documents, the backward direction being backward from the first plurality of documents.

9. The method of claim 1, wherein the first plurality of documents includes recently received documents of the plurality of received documents.

10. The method of claim 1, wherein the procedure, of the first plurality of one or more procedures, of evaluating relevance of documents using a link structure of the crawled documents, further comprises: shrinking the graph by removing one or more nodes of the graph.

11. The method of claim 1, wherein the procedure, of the first plurality of one or more procedures, of evaluating relevance of documents using a link structure of the crawled documents, further comprises: shrinking the graph by combining one or more sets of one or more nodes of the graph.

12. The method of claim 11, wherein the combining is based on common characteristics of the nodes or relationships between the nodes.

13. The method of claim 1, wherein the propagating weights through the graph occurs up to a limited node distance.

14. The method of claim 1, wherein weights assigned to a document include at least one of relevance of the document to the query input and importance of the document independent of the query input.

15. The method of claim 1, wherein the second plurality of procedures further includes one or more of: 1) evaluating relevance of documents using logical expressions of keywords and phrases, 2) evaluating relevance of documents using a link structure of the crawled documents, and 3) evaluating relevance based on freshness of documents.

16. A method, comprising: performing a plurality of focused crawls, wherein each of the plurality of focused crawls comprises: accessing a query input; crawling a plurality of documents, the documents including links to each other, and the crawling at least partly guided by a crawl metric, the crawl metric at least partly determined by a first mechanism, the first mechanism including a first combination, the first combination including evaluating relevance of documents using a link structure of the crawled documents wherein the evaluating relevance of documents using a link structure of the crawled documents is performed repeatedly and continually, and wherein the evaluating relevance of documents using a link structure of the crawled documents includes: accessing a first plurality of documents from a database of a plurality of received documents, the plurality of received documents including crawled documents, the first plurality of documents to be ranked, generating a graph of the first plurality of documents, assigning weights to a plurality of nodes of the graph wherein nodes of the graph represent the documents and edges represent links between the documents, finding an assignment of weights to one or more nodes of the graph, by propagating weights through the graph, the assignment of weight to a node based at least in part on calculating a weighted sum of weights propagated from neighboring nodes, and generating a ranked list of at least the first plurality of documents, the ranked list at least partly generated from the graph; and returning target documents, the target documents being relevant to the query input, the target documents found from the plurality of crawled documents, the target documents returned at least partly based on a search metric, the search metric quantifying relevance or importance of a document to the query input, the search metric at least partly determined by a second mechanism, the second mechanism including a second combination, the second combination being different from the first combination, the second combination including one or more of 1) evaluating relevance of documents using logical expressions of keywords and phrases, 2) evaluating relevance of documents using a template including a plurality of one or more template portions, at least one of the template portions including a plurality of one or more hierarchical levels, 3) evaluating relevance of documents using a link structure of the crawled documents, and 4) evaluating relevance based on freshness of documents, wherein the method is performed on at least one of 1) a first processor and 2) one or more of a first plurality of one or more processors.

17. The method of claim 16, wherein relevance includes importance.

18. The method of claim 16, wherein evaluating relevance of documents includes evaluating relevance of at least a first document and a second document, the second document referring to the first document.

19. The method of claim 16, wherein evaluating relevance includes evaluating relevance of at least a first document and one or more of a first plurality of one or more referring documents and a second plurality of one or more referring documents, each of the first plurality of one or more referring documents referring to the first document directly, and each of the second plurality of referring documents referring to the first document indirectly through one or more documents.
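
Illustrative sketch (not part of the patent text): the link-structure step recited in claims 1 and 16 builds a graph of documents, propagates weights along links so that each node's weight is based on a weighted sum of its neighbors' weights, and then produces a ranked list. The Python below is one possible reading of that step under stated assumptions, not the patented implementation. The function names `propagate_weights` and `ranked_list`, the uniform `edge_weight` coefficient, and the fixed number of `rounds` (one simple way to limit propagation to a node distance, as in claim 13) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def propagate_weights(links, initial, edge_weight=0.5, rounds=3):
    """Propagate document weights along links (hypothetical sketch).

    links:       dict mapping a document id to the ids it links to
    initial:     dict mapping a document id to its starting weight
    edge_weight: assumed uniform coefficient applied to each incoming link
    rounds:      number of propagation passes; weight travels at most
                 this many links from where it started
    """
    # Nodes are every document seen as a source, a target, or a seed.
    nodes = set(links) | {t for ts in links.values() for t in ts} | set(initial)
    weights = {n: float(initial.get(n, 0.0)) for n in nodes}

    # Index incoming links so each node can sum over its referrers.
    incoming = defaultdict(list)
    for src, targets in links.items():
        for dst in targets:
            incoming[dst].append(src)

    for _ in range(rounds):
        updated = {}
        for n in nodes:
            # Weighted sum of the weights propagated from neighboring nodes.
            propagated = sum(edge_weight * weights[src] for src in incoming[n])
            updated[n] = float(initial.get(n, 0.0)) + propagated
        weights = updated
    return weights


def ranked_list(weights):
    """Return document ids sorted by descending weight, ties broken by id."""
    return sorted(weights, key=lambda doc: (-weights[doc], doc))
```

For example, calling `propagate_weights({"a": ["c"], "b": ["c"], "c": ["d"]}, {"a": 1.0, "b": 1.0}, rounds=2)` lets document `c` accumulate weight from its two referrers and pass part of it on to `d`. A crawler could use such a ranking to prioritize which outgoing links to follow next, although the claims leave the exact combination open.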