A computer-based system and method of determining whether to re-fetch a previously retrieved document across a computer network is disclosed. The method uses a statistical model to determine whether the previously retrieved document has likely changed since it was last accessed. The statistical model continuously improves its accuracy by training internal probability distributions to reflect the actual experience with change rate patterns of the documents accessed. The decision of whether to access the document is based on the probability of change compared against a desired synchronization level, random selections, maximum limits on the amount of time since the document was last accessed, and other criteria. Once the decision to access is made, the document is checked for changes, and this information is used to train the statistical model.
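The access decision described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `should_access`, its parameters, and in particular the way the synchronization level is blended with the probability of change are assumptions made for illustration, since the text does not give a formula.

```python
import random


def should_access(p_changed, sync_level, seconds_since_access,
                  max_age_seconds, rng=random.random):
    """Decide whether to re-fetch a previously retrieved document.

    p_changed            -- estimated probability that the document changed
    sync_level           -- assumed to lie in [0, 1]; 1 means no tolerance
                            for stale copies (hypothetical interpretation)
    seconds_since_access -- time since the document was last accessed
    max_age_seconds      -- hard limit after which access is forced
    rng                  -- returns a float in [0, 1); injectable for tests
    """
    # Maximum limit on the time since the last access: always re-check.
    if seconds_since_access >= max_age_seconds:
        return True
    # Assumed blending: interpolate between p_changed and certainty.
    bias = p_changed + sync_level * (1.0 - p_changed)
    # Biased random selection (a weighted coin flip).
    return rng() < bias


# Example call with made-up numbers.
decision = should_access(p_changed=0.3, sync_level=0.5,
                         seconds_since_access=3600,
                         max_age_seconds=7 * 24 * 3600)
```

Passing `rng` as a parameter keeps the random selection testable; a production crawler would likely tune or replace the blending rule.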
Representative claim
1. A computer-implemented method for selectively accessing a document in response to a current retrieval request, the document being identified by a document address specification, the document having been retrieved during a previous retrieval request, the method comprising: determining whether to access the document during the current retrieval request by identifying with the aid of a statistical model whether the document is likely to have changed since a previous retrieval request; and accessing the document if the determination produces an instruction indicative that the document at the document address specification should be accessed during the current retrieval request, wherein determining whether to access the document during the current retrieval request comprises computing a probability that the document is likely to have changed since a previous retrieval request, and further wherein computing the probability that the document is likely to have changed since a previous retrieval request comprises: selecting an active probability indicative of a proportion of documents in a plurality of documents that are changing at various change rates, the plurality of documents including the document, training the active probability to reflect an experience with the document during a plurality of previous document retrieval requests, and using the trained active probability to compute the probability that the document has changed since a previous retrieval request.
2. The method of claim 1, further comprising: selecting the probability that the document has changed since the previous document retrieval request as the active probability in the current retrieval request; and computing the probability that the document is likely to have changed since a previous retrieval request for the current retrieval request.
3. The method of claim 1, wherein training the active probability includes multiplying the active probability indicative of a change in the document by a training probability calculated using a statistical model.
4. The method of claim 1, wherein determining whether to access the document during the current retrieval request with the aid of a statistical model further comprises: training a document probability distribution corresponding to the document address specification to reflect an experience with the document during a plurality of previous document retrieval requests, the document probability distribution including a plurality of probabilities; determining from the document probability distribution a probability that the document has changed; and making a determination of whether to access the document in a current document retrieval request based on the probability that the document has changed.
5. The method of claim 4, further comprising: calculating, based on the experience with the document during a plurality of previous document retrieval requests, a discrete random variable distribution that includes a plurality of training probabilities; multiplying each probability in the document probability distribution by a corresponding training probability from the discrete random variable distribution.
6. The method of claim 5, wherein the training probabilities are calculated using a Poisson process, the Poisson process including a Poisson equation (e^(-r*dt)) and a complementary Poisson equation (1 - e^(-r*dt)).
7. The method of claim 6, wherein the experience with the document during the plurality of previous document retrieval requests is derived from historical information associated with the document address specification.
8. A computer-readable medium having computer-executable instructions for retrieving one document in a plurality of documents from a remote server, which when executed comprise: maintaining historical information representing prior changes to the one document at the remote server; initiating a document retrieval request procedure for retrieving particular documents in the plurality of documents; determining whether to access the one document from the remote server based on an analysis of the historical information representing prior changes to the one document at the remote server; and if the determination to access the one document is positive, identifying the one document for retrieval during the document retrieval procedure, wherein determining whether to retrieve the document further comprises: computing a probability that the one document has changed since the one document was last retrieved from the remote server, and further wherein computing the probability that the one document has changed comprises: beginning with a probability that a pre-defined proportion of documents in the plurality of documents has changed, and training the probability that the pre-defined proportion of documents has changed using the historical information associated with the one document to achieve the probability that the one document has changed since the one document was last retrieved.
9. The computer-readable medium of claim 8, further comprising making a random decision to retrieve the one document wherein the random decision is biased by the probability that the one document has changed.
10. The computer-readable medium of claim 9, wherein the random decision is further biased by a synchronization level configured to influence the random decision based on a predetermined degree of tolerance for not retrieving the one document if the document is likely to have changed.
11. The computer-readable medium of claim 9, wherein the random decision is made by a software routine adapted to simulate a flip of a coin.
12. The computer-readable medium of claim 8, wherein: the historical information representing prior changes to the one document comprises for the one document, a change count representing the number of times the one document has been modified, an access count representing the number of times the one document has been accessed, a first access time representing the time the one document was first accessed, and a last access time representing the time the one document was last accessed; and wherein the step of training the probability comprises creating a timeline using the historical information, the timeline having representations thereon of no change intervals, change intervals, and no change chunk intervals.
13. The computer-readable medium of claim 12, wherein the step of training the probability further comprises: training the document probability distribution for each no change interval; training the document probability distribution for each change interval; and training the document probability distribution for each no change chunk interval.
14. The computer-readable medium of claim 8, wherein: the historical information representing prior changes to the one document includes a hash value associated with the one document, the hash value being a representation of the one document; and wherein the analysis includes a comparison of the hash value included in the historical information with another hash value calculated from information retrieved from the one document stored on the remote server.
15. The computer-readable medium of claim 14, wherein if the hash value included in the historical information does not match the other hash value associated with the one document stored on the remote server, updating the historical information to identify that the one document changed.
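The training step recited in the claims (multiplying each probability in the document probability distribution by a Poisson-based training probability) and the hash comparison can be sketched in Python as follows. This is an illustrative reading, not the patented implementation. The names `fingerprint`, `train`, and `probability_changed`, the candidate change rates, the renormalization after multiplying, and the choice of SHA-256 are all assumptions; the claims specify only the two Poisson expressions and the use of a hash value.

```python
import hashlib
import math


def fingerprint(body: bytes) -> str:
    """Hash value representing a document (SHA-256 is an assumed choice)."""
    return hashlib.sha256(body).hexdigest()


def train(distribution, rates, dt, changed):
    """Update a distribution over candidate change rates for one interval.

    distribution -- probabilities, one per candidate change rate
    rates        -- candidate change rates r (changes per unit time)
    dt           -- length of the observed interval
    changed      -- True for a change interval, False for a no-change one
    """
    # Training probabilities: e^(-r*dt) if no change was observed,
    # 1 - e^(-r*dt) if a change was observed.
    training = [
        (1.0 - math.exp(-r * dt)) if changed else math.exp(-r * dt)
        for r in rates
    ]
    weighted = [p * t for p, t in zip(distribution, training)]
    total = sum(weighted)
    if total == 0.0:
        # Degenerate case: keep the previous distribution unchanged.
        return list(distribution)
    # Renormalize so the result sums to 1 (an assumed step).
    return [w / total for w in weighted]


def probability_changed(distribution, rates, dt):
    """Probability that the document changed within dt since last access."""
    return sum(p * (1.0 - math.exp(-r * dt))
               for p, r in zip(distribution, rates))


# Example with made-up data: three candidate rates and a uniform start.
rates = [0.01, 0.1, 1.0]
prior = [1.0 / 3.0] * 3
stored_hash = fingerprint(b"old body")
changed = fingerprint(b"new body") != stored_hash  # hash mismatch = change
posterior = train(prior, rates, dt=2.0, changed=changed)
p_next = probability_changed(posterior, rates, dt=2.0)
```

Each observed interval on the document's timeline would be fed through `train` in turn, so the distribution gradually concentrates on the change rates that best explain the document's history.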