Adaptive web crawling using a statistical model

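IPC classification information
Country / Type: United States (US) Patent, Registered
International Patent Classification (IPC 7th edition)
  • G06F-007/00
  • G06F-015/16
  • G06F-017/00
Application number: US-0022054 (2004-12-22)
Registration number: US-7328401 (2008-02-05)
Inventors / Address
  • Obata,Kenji C
  • Meyerzon,Dmitriy
Applicant / Address
  • Microsoft Corporation
Agent / Address
    Christensen O'Connor Johnson Kindness PLLC
Citation information: Times cited: 69; Patents cited: 18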

Abstract

A computer based system and method of retrieving information pertaining to documents on a computer network is disclosed. The method includes selecting a set of documents to be accessed during a Web crawl by utilizing a statistical model to determine which previously retrieved documents are most like

Representative claim

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows: 1. A computer-implemented method for selectively accessing a document during a current crawl of a server computer, the document being identified by a document address specification, the d

Patents cited by this patent (18)

  1. Peterson, Leonard J.; Freedman, Steven J.; Partovi, Hadi; Endres, Raymond E.; D'Souza, David J.; Ellerman, Erik Castedo; Jiggins, Julian P., Client-side system for scheduling delivery of web content and locally managing the web content.
  2. Eichstaedt Matthias ; Ford Daniel Alexander ; Lehman Tobin Jon ; Lu Qi ; Teng Shang-Hua, Collaborative team crawling: Large scale information gathering over the internet.
  3. Narendran Balakrishnan ; Rangarajan Sampath ; Yajnik Shalini, Data distribution techniques for load-balanced fault-tolerant web access.
  4. Houser Peter B. (Poway CA) Adler James M. (Ocean Beach CA), Electronic document verification system and method.
  5. Douglass R. Judd ; Paul Gauthier ; J. Eric Baldeschwieler, Method and apparatus for retrieving documents based on information other than document content.
  6. Katariya, Sanjeev; Jones, William P., Method and system for calculating phrase-document importance.
  7. Meyerzon, Dmitriy; Shoroff, Srikanth; Terek, F. Soner; Norin, Scott, Method and system for detecting duplicate documents in web crawls.
  8. Sanu Sankrant ; Meyerzon Dmitriy, Method of web crawling utilizing address mapping.
  9. Meyerzon, Dmitriy; Sanu, Sankrant, Method of web crawling utilizing crawl numbers.
  10. Pirolli Peter L. ; Pitkow James E., Prefetching and caching documents according to probability ranked need S list.
  11. Marc Alexander Najork ; Clark Allan Heydon, System and method for associating an extensible set of data with documents downloaded by a web crawler.
  12. Soumen Chakrabarti ; Byron Edward Dom ; Martin Henk van den Berg, System and method for focussed web crawling.
  13. Douglas M. Dillon, System and method for multicasting multimedia content.
  14. Sundaresan, Neelakantan; Yi, Jeonghee, System and method for the automatic mining of new relationships.
  15. Monier Louis M., System for adding a new entry to a web page table upon receiving a web page including a link to another web page not having a corresponding entry in the web page table.
  16. Liddy Elizabeth D. ; Yu Edmund Szu-Li, System for retrieving multimedia information from the internet using multiple evolving intelligent agents.
  17. Najork Marc Alexander ; Heydon Clark Allan ; Wiener Janet Lynn, Web crawler system using plurality of parallel priority level queues having distinct associated download priority levels for prioritizing document downloading and maintaining document freshness.
  18. Wiener, Janet L.; Stata, Raymond P.; Burrows, Michael, Web page connectivity server.

Patents citing this patent (69)

  1. Sun, Walter; Li, Yipeng; Zhang, Xiao; Ahmed, Junaid, Adaptive crawl rates based on publication frequency.
  2. Kumar, Mani; Kothari, Pankaj; Sahni, Saurabh, Adaptive weighted crawling of user activity feeds.
  3. Milner, Marius C., Automatic proxy setting modification.
  4. Milner, Marius C., Automatic proxy setting modification.
  5. Petriuc, Mihai, Click distance determination.
  6. Patterson, Anna Lynn, Detecting spam documents in a phrase based information retrieval system.
  7. Tankovich, Vladimir; Meyerzon, Dmitriy; Poznanski, Victor, Detection of junk in search result ranking.
  8. Kumar, Mani; Kothari, Pankaj; Sahni, Saurabh, Determining related keywords based on lifestream feeds.
  9. Kumar, Mani; Kothari, Pankaj; Sahni, Saurabh, Determining related keywords based on lifestream feeds.
  10. Tankovich, Vladimir; Meyerzon, Dmitriy; Taylor, Michael James, Document length as a static relevance feature for ranking search results.
  11. Meyerzon, Dmitriy; Shnitko, Yauhen; Burges, Chris J. C.; Taylor, Michael James, Enterprise relevancy ranking using a neural network.
  12. Liu, Jie; Nath, Suman; Lin, Xiaozhu, Executing a fast crawl over a computer-executable application.
  13. Robertson, Stephen; Zaragoza, Hugo; Taylor, Michael; Larimore, Stefan Isbein; Petriuc, Mihai, Field weighting in text searching.
  14. Kenig, Batya; Radchenko, Constantin; Shapiro, Eitan, Incremental crawling of multiple content providers using aggregation.
  15. Kenig, Batya; Radchenko, Constantin; Shapiro, Eitan, Incremental crawling of multiple content providers using aggregation.
  16. Cao, Pei; Eiron, Nadav; Mazumdar, Soham; Patterson, Anna L.; Power, Russell; Zunger, Yonatan, Index server architecture using tiered and sharded phrase posting lists.
  17. Cao, Pei; Eiron, Nadav; Mazumdar, Soham; Patterson, Anna L.; Power, Russell; Zunger, Yonatan, Index server architecture using tiered and sharded phrase posting lists.
  18. Cao, Pei; Eiron, Nadav; Mazumdar, Soham; Patterson, Anna L.; Power, Russell; Zunger, Yonatan, Index server architecture using tiered and sharded phrase posting lists.
  19. Cao, Pei; Eiron, Nadav; Mazumdar, Soham; Patterson, Anna L.; Power, Russell; Zunger, Yonatan, Index server architecture using tiered and sharded phrase posting lists.
  20. Cao, Pei; Eiron, Nadav; Mazumdar, Soham; Patterson, Anna L.; Power, Russell; Zunger, Yonatan, Index server architecture using tiered and sharded phrase posting lists.
  21. Cao, Pei; Eiron, Nadav; Mazumdar, Soham; Patterson, Anna; Power, Russell; Zunger, Yonatan, Index server architecture using tiered and sharded phrase posting lists.
  22. Fontoura, Marcus; Meredith, Daniel N.; Rohde, Douglas Lee Taylor; Palekar, Mahesh S.; Shankar, Asim; Baylor, Denis Murray; Rasscevskis, Zigmars; Csomai, Andras, Indexing system.
  23. Fontoura, Marcus; Meredith, Daniel N.; Rohde, Douglas Lee Taylor; Palekar, Mahesh S.; Shankar, Asim; Baylor, Denis Murray; Rasscevskis, Zigmars; Csomai, Andras, Indexing system.
  24. Patterson, Anna L, Information retrieval system for archiving multiple document versions.
  25. Patterson, Anna L, Information retrieval system for archiving multiple document versions.
  26. Patterson, Anna L., Information retrieval system for archiving multiple document versions.
  27. Patterson, Anna Lynn, Information retrieval system for archiving multiple document versions.
  28. Patterson, Anna L., Integrated external related phrase information into a phrase-based indexing information retrieval system.
  29. Patterson, Anna Lynn, Integrating external related phrase information into a phrase-based indexing information retrieval system.
  30. Alpert, Jesse L.; Tammana, Praveen K.; Kurzion, Yair, Managing URLs.
  31. Alpert, Jesse L.; Tammana, Praveen K.; Kurzion, Yair, Managing URLs.
  32. Alpert, Jesse L., Managing items in crawl schedule.
  33. Dengler, Patrick M.; Krishnan, Arvind K.; Singh, Jagdish; Sanchez, Lawrence M.; Shankar, Sai; Chittamuru, Satish Kumar; Pekic, Zoltan; Mondal, Nabarun; Kumar, Namendra; i Dalfó, Ricard Roma, Metadata driven user interface.
  34. Villadsen, Peter; Chen, Zhaoqi; Gottumukkala, Ramakanthachary S.; Calderon, Marcos, Metadata-based eventing supporting operations on data.
  35. Boyan, Justin; McDonald, Glenn; Benthall, Margaret; Molnar, Ray, Methods and systems to train models to extract and integrate information from data sources.
  36. Morris, Robert P., Methods, systems, and computer program products for characterizing links to resources not activated.
  37. Patterson, Anna L., Multiple index based information retrieval system.
  38. Patterson, Anna L., Multiple index based information retrieval system.
  39. Patterson, Anna Lynn, Multiple index based information retrieval system.
  40. Patterson, Anna L., Phase-based personalization of searches in an information retrieval system.
  41. Mazumdar, Soham; Przebinda, Viktor; Zunger, Yonatan, Phrase extraction using subphrase scoring.
  42. Mazumdar, Soham; Przebinda, Viktor; Zunger, Yonatan, Phrase extraction using subphrase scoring.
  43. Patterson, Anna L., Phrase-based detection of duplicate documents in an information retrieval system.
  44. Patterson, Anna L., Phrase-based detection of duplicate documents in an information retrieval system.
  45. Patterson, Anna Lynn, Phrase-based detection of duplicate documents in an information retrieval system.
  46. Patterson, Anna L., Phrase-based searching in an information retrieval system.
  47. Patterson, Anna L., Phrase-based searching in an information retrieval system.
  48. Kirshenbaum, Evan R.; Suermondt, Henri J.; Lillibridge, Mark David; Yuasa, Kei; Eshghi, Kave; Forman, George, Policy applicability determination.
  49. Fredricksen, Eric Russell; Feng, Hanping; Kataru, Naga Sridhar; Harik, Georges, Prioritized preloading of documents to client.
  50. Obata, Kenji; Meyerzon, Dmitriy, Proxy server using a statistical model.
  51. Cao, Pei; Mazumdar, Soham, Query phrasification.
  52. Cao, Pei; Mazumdar, Sohem, Query phrasification.
  53. Meyerzon, Dmitriy; Zaragoza, Hugo, Ranking search results using biased click distance.
  54. Meyerzon, Dmitriy; Li, Hang, Ranking search results using feature extraction.
  55. Meyerzon, Dmitriy; Zaragoza, Hugo, Ranking search results using language types.
  56. Poznanski, Victor; Wang, Oivind; Holm, Fredrik; Bodd, Nicolai; Tankovich, Vladimir; Meyerzon, Dmitriy, Re-ranking search results.
  57. Blum, Stephen; Greene, Todd, Real-time distribution of messages via a network with multi-region replication in a hosted service environment.
  58. Fredricksen, Eric Russell; Feng, Hanping; Kataru, Naga Sridhar; Harik, Georges, Refreshing cached documents and storing differential document content.
  59. Auerbach, David B.; Alpert, Jesse L., Scheduling a recrawl.
  60. Tankovich, Vladimir; Li, Hang; Meyerzon, Dmitriy; Xu, Jun, Search results ranking using editing distance and document information.
  61. Meyerzon, Dmitriy; Zaragoza, Hugo, System and method for ranking search results using click distance.
  62. Merrigan, Chadd Creighton; Peltonen, Kyle G.; Meyerzon, Dmitriy; Lee, David J., System and method for scoping searches using index keys.
  63. Fredricksen, Eric Russell; Schneider, Fritz John; Dean, Jeffrey Adgate; Ghemawat, Sanjay; Provos, Niels; Harik, Georges, System and method of accessing a document efficiently through multi-tier web caching.
  64. Fredricksen, Eric Russell; Schneider, Fritz John; Dean, Jeffrey Adgate; Ghemawat, Sanjay; Provos, Niels; Harik, Georges, System and method of accessing a document efficiently through multi-tier web caching.
  65. Fredrickson, Eric Russell; Feng, Hanping; Kataru, Naga Sridhar; Harik, Georges, System and method of accessing a document efficiently through multi-tier web caching.
  66. Bar Yossef, Ziv; Kanungo, Tapas; Krauthgamer, Robert, System, method, and service for using a focused random walk to produce samples on a topic from a collection of hyper-linked pages.
  67. Eriksen, Bjorn Marius Aamodt; Laraki, Othman, Systems and methods for cache optimization.
  68. Eriksen, Bjorn Marius Aamodt; Rennie, Jeffrey Glenn; Laraki, Othman, Systems and methods for client authentication.
  69. Eriksen, Bjorn Marius Aamodt; Rennie, Jeffrey Glen; Laraki, Othman, Systems and methods for client cache awareness.