Ranking documents based on user behavior and/or feature data
IPC classification information
Country / Type
United States(US) Patent
Registered
International Patent Classification (IPC, 7th edition)
G06F-007/00
G06F-017/30
Application number
UP-0869057
(2004-06-17)
Registration number
US-7716225
(2010-06-03)
Inventor / Address
Dean, Jeffrey A.
Anderson, Corin
Battle, Alexis
Applicant / Address
Google Inc.
Agent / Address
Harrity & Harrity, LLP
Citation information
Times cited: 96
Cited patents: 33
Abstract
A system generates a model based on feature data relating to different features of a link from a linking document to a linked document and user behavior data relating to navigational actions associated with the link. The system also assigns a rank to a document based on the model.
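As a rough illustration of the kind of model the abstract describes, the Python sketch below trains a small classifier that maps a link's feature vector to a probability that a user will select the link. The record does not say which learning method is used; logistic regression, the three feature names, the training rows, and the function names `train_click_model` and `click_probability` are all assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_click_model(examples, epochs=500, lr=0.5):
    """Fit logistic-regression weights by stochastic gradient ascent.

    examples: list of (feature_vector, selected) pairs, where selected is
    1 if users followed the link and 0 if they did not.
    Returns a weight list whose last entry is the bias term.
    """
    n = len(examples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            err = y - p
            for i in range(n):
                w[i] += lr * err * x[i]
            w[-1] += lr * err
    return w


def click_probability(w, x):
    """Probability that a link with feature vector x is selected."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])


# Hypothetical feature vector layout (an assumption, not from the record):
# [scaled anchor font size, link near top of page (0/1), same host (0/1)]
training = [
    ([1.0, 1.0, 0.0], 1),  # large, prominent link that users selected
    ([0.9, 1.0, 1.0], 1),
    ([0.2, 0.0, 1.0], 0),  # small footer-style link that users ignored
    ([0.1, 0.0, 0.0], 0),
]

weights = train_click_model(training)
p_prominent = click_probability(weights, [0.8, 1.0, 0.0])
p_footer = click_probability(weights, [0.1, 0.0, 1.0])
print(p_prominent > p_footer)
```

The learned weights play the role of the "rules" in the claims only loosely: here they are numeric coefficients, whereas the claims leave the form of the rules open.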
Representative claim
What is claimed is:

1. A method performed by one or more server devices, comprising:
storing, in a memory associated with the one or more server devices, feature data associated with a plurality of first links, within a plurality of first source documents, that point to a plurality of first target documents, the feature data, for one of the plurality of first links, including one or more features of one of the plurality of first source documents that contains the one of the plurality of links, one or more features of one of the plurality of first target documents that is pointed to by the one of the plurality of links, and one or more features of the one of the plurality of first links;
storing, in a memory associated with the one or more server devices, user behavior data relating to user navigational activity with regard to the plurality of first source documents accessed by one or more users and the plurality of first links within the plurality of first source documents selected by the one or more users;
training, using one or more processors of the one or more server devices and based on the feature data and the user behavior data, a model that identifies a probability that a particular link, with particular feature data, will be selected by a user, where training the model includes: analyzing the feature data associated with each of the plurality of first links that was selected by the one or more users and the feature data associated with each of the plurality of first links that was not selected by the one or more users to generate rules for the model;
identifying, by one or more processors associated with the one or more server devices, a plurality of second links, within a plurality of second source documents, that point to a plurality of second target documents;
determining, using one or more processors associated with the one or more server devices, feature data associated with each of the plurality of second links, the feature data, associated with one of the plurality of second links, including one or more features of the one of the plurality of second links, one or more features of one of the plurality of second source documents that contains the one of the plurality of second links, and one or more features of the one of the plurality of second target documents that is pointed to by the one of the plurality of second links;
determining, using the model and based on the feature data, a probability that each of the plurality of second links will be selected by a user, where the determining includes: inputting, into the model, the feature data associated with the one of the plurality of second links, and outputting, by the model, the probability that the one of the plurality of second links will be selected by a user;
calculating, using one or more processors associated with the one or more server devices, a rank for a particular target document of the plurality of second target documents based on the probability associated with one or more of the plurality of second links that point to the particular target document; and
ordering the particular target document, with regard to at least one other document, based on the rank for the particular target document.

2. The method of claim 1, further comprising: obtaining data relating to the user navigational activity of the one or more users from client devices used by the one or more users.

3. The method of claim 1, where the user behavior data corresponds to a single user.

4. The method of claim 1, where the user behavior data corresponds to a class of users.

5. The method of claim 1, where the features associated with one of the plurality of first source documents include at least one of an entire address of the one of the plurality of first source documents, a portion of the address of the one of the plurality of first source documents, information regarding a web site associated with the one of the plurality of first source documents, a number of links in the one of the plurality of first source documents, presence of words in the one of the plurality of first source documents, presence of words in a heading of the one of the plurality of first source documents, a topical cluster with which the one of the plurality of first source documents is associated, or a degree to which a topical cluster associated with the one of the plurality of first source documents matches a topical cluster associated with a link.

6. The method of claim 1, where the features associated with one of the plurality of first links include at least one of a font size of anchor text associated with the one of the plurality of first links, a position of the one of the plurality of first links within one of the plurality of first source documents, a position of the one of the plurality of first links in a list, a font color associated with the one of the plurality of first links, attributes of the one of the plurality of first links, a number of words in the anchor text associated with the one of the plurality of first links, actual words in the anchor text associated with the one of the plurality of first links, a determination of commerciality of the anchor text associated with the one of the plurality of first links, a type of the one of the plurality of first links, a context of words before or after the one of the plurality of first links, a topical cluster with which the anchor text of the one of the plurality of first links is associated, whether the one of the plurality of first links leads to a first target document on a same host or domain as one of the plurality of first source documents containing the one of the plurality of first links, or whether an address associated with the one of the plurality of first links embeds another address.

7. The method of claim 1, where the features associated with one of the plurality of first target documents include at least one of an entire address of the one of the plurality of first target documents, a portion of the address of the one of the plurality of first target documents, information regarding a web site associated with the one of the plurality of first target documents, whether the address of the one of the plurality of first target documents is on a same host as an address of a first source document that links to the one of the plurality of first target documents, whether the address of the one of the plurality of first target documents is associated with a same domain as the address of the first source document, words in the address of the one of the plurality of first target documents, or a length of the address of the one of the plurality of first target documents.

8. The method of claim 1, further comprising: generating a feature vector for each one of the plurality of first links based on the feature data associated with the one of the plurality of first links.

9. The method of claim 8, where analyzing the feature data associated with the plurality of first links and the instances where each of the plurality of the first links were selected by the one or more users and the instances where each of the plurality of first links were not selected by the one or more users includes: generating the rules for the model based on the instances where each of the plurality of the first links were selected by the one or more users and the instances where each of the plurality of first links were not selected by the one or more users and the feature vectors.

10. The method of claim 1, where the rules for the model comprise: a general rule applicable to a group of documents, and a specific rule applicable to a particular document.

11. The method of claim 1, further comprising: periodically updating the rules for the model based on changes in the user behavior data.

12. A method performed by one or more server devices, comprising:
storing, in one or more memories associated with the one or more server devices, feature data associated with a plurality of first links within a plurality of first source documents that point to a plurality of first target documents, the feature data including features of the first source documents, features of the first target documents, and features of the first links;
storing, in one or more memories associated with the one or more server devices, user behavior data relating to user navigational activity with regard to the first links within the first source documents selected by one or more users;
training, using one or more processors associated with the one or more server devices and based on the feature data associated with the feature data associated with the first links and the user behavior data relating to the first links, a model that identifies a probability that a particular link will be selected by a user, where training the model includes: analyzing the feature data associated with the first links that were selected by the one or more users and the feature data associated with the first links that were not selected by the one or more users to generate rules for the model;
identifying a plurality of second links within a plurality of second source documents that point to a plurality of second target documents;
determining feature data associated with the second links, the feature data associated with the second links including features of the second source documents, features of the second target documents, and features of the second links;
determining, using the model, a probability that each of the second links will be selected using only the feature data associated with the second link as input to the model;
assigning a weight to each of the second links based on the probability that the second link will be selected;
assigning a rank to one of the second target documents based on the weights assigned to the second links that point to the one of the second target documents; and
ordering the one of the second target documents, with regard to at least one other document, based on the rank assigned to the one of the second target documents.

13. The method of claim 12, further comprising: periodically updating the rules for the model based on changes to the user behavior data.

14. The method of claim 12, where the user behavior data corresponds to a single user.

15. The method of claim 12, where the user behavior data corresponds to a plurality of users.

16. One or more server devices, comprising:
means for storing, in a memory, feature data associated with a plurality of links within source documents that point to target documents, the feature data including data associated with features of the source documents, data associated with features of the links, and data associated with features of the target documents, the data associated with the features of one of the source documents including at least one of an entire address of the source document, a portion of the address of the source document, information regarding a web site associated with the source document, a number of links in the source document, presence of words in the source document, presence of words in a heading of the source document, a topical cluster with which the source document is associated, or a degree to which a topical cluster associated with the source document matches a topical cluster associated with a link, the data associated with the features of one of the links including at least one of a font size of anchor text associated with the link, a position of the link within a source document, a position of the link in a list, a font color associated with the link, attributes of the link, a number of words in the anchor text associated with the link, actual words in the anchor text associated with the link, a determination of commerciality of the anchor text associated with the link, a type of the link, a context of words before or after the link, a topical cluster with which the anchor text of the link is associated, whether the link leads to a target document on a same host or domain, or whether an address associated with the link embeds another address, and the data associated with the features of one of the target documents including at least one of an entire address of the target document, a portion of the address of the target document, information regarding a web site associated with the target document, whether the address of the target document is on a same host as an address of a source document that links to the target document, whether the address of the target document is associated with a same domain as the address of the source document, words in the address of the target document, or a length of the address of the target document;
means for storing, in a memory, user behavior data relating to user navigational activity with regard to the source documents accessed by one or more users and the links within the source documents selected by the one or more users and the links within the source documents that were not selected by the one or more users;
means for training, based on the feature data and instances where the links were selected by the one or more users and instances where the links were not selected by the one or more users, a model that identifies a probability that a link, with particular feature data, will be selected by a user, where the means for training includes: means for analyzing the feature data associated with the links that were selected by the one or more users and the feature data associated with the links that were not selected by the one or more users to generate rules for the model;
means for identifying a particular link within a first document that points to a second document;
means for determining the feature data associated with the particular link;
means for determining, based on inputting the feature data into the model, a probability that the particular link will be selected by a user;
means for assigning a weight to the particular link based on the probability that the particular link will be selected;
means for assigning a rank to the second document based on the weight assigned to the particular link; and
means for ordering the second document, with respect to at least one other document, based on the assigned rank.

17. The one or more server devices of claim 16, where the data associated with the features of the one of the source documents includes at least two of: the entire address of the source document, the portion of the address of the source document, the information regarding a web site associated with the source document, the number of links in the source document, the presence of words in the source document, the presence of words in a heading of the source document, the topical cluster with which the source document is associated, or the degree to which a topical cluster associated with the source document matches a topical cluster associated with a link.

18. The one or more server devices of claim 16, where the data associated with the features of the one of the links includes at least two of: the font size of anchor text associated with the link, the position of the link within a source document, the position of the link in a list, the font color associated with the link, the attributes of the link, the number of words in the anchor text associated with the link, the actual words in the anchor text associated with the link, the determination of commerciality of the anchor text associated with the link, the type of the link, the context of words before or after the link, the topical cluster with which the anchor text of the link is associated, whether the link leads to a target document on a same host or domain, or whether an address associated with the link embeds another address.

19. The one or more server devices of claim 6, where the data associated with the features of the one of the target documents includes at least two of: the entire address of the target document, the portion of the address of the target document, the information regarding a web site associated with the target document, whether the address of the target document is on a same host as an address of a source document that links to the target document, whether the address of the target document is associated with a same domain as the address of the source document, the words in the address of the target document, or the length of the address of the target document.
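
The last steps of claims 12 and 16 (assign each link a weight from its selection probability, rank a target document from the weights of the links that point to it, then order the documents) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the claims do not say how weights are combined, so using the probability itself as the weight and summing the weights per target are illustrative choices, and the function name `rank_targets`, the example addresses, and the probabilities are hypothetical.

```python
from collections import defaultdict


def rank_targets(links):
    """Order target documents by the combined weight of their inbound links.

    links: iterable of (source, target, selection_probability) tuples.
    Assumption: a link's weight equals its selection probability, and a
    target's rank score is the sum of the weights of links pointing to it.
    Returns a list of (target, score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for _source, target, probability in links:
        scores[target] += probability
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical links with model-assigned selection probabilities.
links = [
    ("a.example/index", "b.example/guide", 0.6),
    ("c.example/blog", "b.example/guide", 0.3),
    ("a.example/index", "d.example/terms", 0.05),
    ("c.example/blog", "d.example/terms", 0.05),
    ("c.example/blog", "e.example/news", 0.4),
]

ranking = rank_targets(links)
for target, score in ranking:
    print(f"{target}: {score:.2f}")
```

Under this scheme a document reached through two rarely selected links (`d.example/terms`) scores below a document reached through a single frequently selected link (`e.example/news`), which is the contrast with counting links equally.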
Patents cited in this patent (33)
Mao, Jianchang; Abrol, Mani; Mukherjee, Rajat; Tourn, Michel; Raghavan, Prabhakar, Apparatus and method for adaptively ranking search results.
Zeng, Hua Jun; Xue, Gui Rong; Chen, Zheng; Ma, Wei Ying, Implicit links search enhancement system and method for search engines using implicit links generated by mining user access patterns.
Adriaans Pieter Willem,NLX ; Knobbe Arno Jan,NLX ; Gathier Marc,NLX, System and method for generating performance models of complex information technology systems.
Imig, Scott K.; Apacible, Johnson T.; Bala, Aravind; Bailey, Peter R.; Geetha, Gayathri Ravichandran; Rounthwaite, Robert L.; Yang, Hung-chih, Auto-detection of historical search context.
Mackinlay, Jock Douglas; Stolte, Christopher Richard; Hanrahan, Patrick, Computer systems and methods for automatically viewing multidimensional databases.
Walkingshaw, Andrew David; Aleksandrovsky, Boris Lev; van Hoff, Arthur Anthonie; Breunig, Markus, Generating an implied object graph based on user behavior.
Lopatenko, Andrei; Kim, Hyung-Jin; Dornbush, Sandor; Wei, Leonard; Kilbourn, Timothy P.; Lopyrev, Mikhail, Ranking search results based on similar queries.
Mackinlay, Jock Douglas; Stolte, Christopher Richard, Selecting the type of visual marks in data visualizations based on user-selected visual properties of the marks.
Levy, Joshua Howard; Wilbur, Thomas Wyatt; Moore, Lauri Janet; Kurti, Ron Maire; Cohen, Benjamin David; Marcotte, Gary Joseph, Systems, methods, and devices for determining and displaying market relative position of unique items.
Levy, Joshua Howard; Wilbur, Thomas Wyatt; Moore, Lauri; Kurti, Ron Maire; Cohen, Benjamin David; Marcotte, Gary Joseph, Systems, methods, and devices for determining and displaying market relative position of unique items.
Franke, David Wayne; Levy, Joshua Howard; Grasemann, Hans Ulrich; Moore, Lauri Janet; Pratt, David; Moose, William T.; Moore, Andrew K.; Prior, John William, Systems, methods, and devices for generating recommendations of unique items.
Franke, David Wayne; Wilbur, Thomas Wyatt, Systems, methods, and devices for identifying and presenting identifications of significant attributes of unique items.
Franke, David Wayne; Wilbur, Thomas Wyatt, Systems, methods, and devices for identifying and presenting identifications of significant attributes of unique items.
Franke, David Wayne; Wilbur, Thomas Wyatt, Systems, methods, and devices for identifying and presenting identifications of significant attributes of unique items.
Levy, Joshua Howard; Feldman, Lauri Janet; Dove, Andrew Philip, Systems, methods, and devices for measuring similarity of and generating recommendations for unique items.