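IPC classification information
Country / Type | United States (US) Patent, granted
International Patent Classification (IPC, 7th edition) |
Application number | US-0101951 (2008-04-11)
Registration number | US-8812493 (2014-08-19)
Inventors / Address |
- Tankovich, Vladimir
- Li, Hang
- Meyerzon, Dmitriy
- Xu, Jun
Applicant / Address |
Agent / Address |
Citation information | Times cited: 1; patents cited: 200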
Abstract
Architecture for extracting document information from documents received as search results based on a query string, and computing an edit distance between the data string and the query string. The edit distance is employed in determining relevance of the document as part of result ranking by detecting near-matches of a whole query or part of the query. The edit distance evaluates how close the query string is to a given data stream that includes document information such as TAUC (title, anchor text, URL, clicks) information, etc. The architecture includes the index-time splitting of compound terms in the URL to allow the more effective discovery of query terms. Additionally, index-time filtering of anchor text is utilized to find the top N anchors of one or more of the document results. The TAUC information can be input to a neural network (e.g., 2-layer) to improve relevance metrics for ranking the search results.
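The index-time splitting of compound URL terms described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `DICTIONARY`, `split_compound`, and `url_terms` are hypothetical, the dictionary is a toy word list, and the sketch splits a token only when every piece is a dictionary term, which is stricter than the abstract requires.

```python
import re

# Hypothetical toy dictionary; a real index would load a much larger term list.
DICTIONARY = {"edit", "distance", "search", "rank", "ranking", "engine"}


def split_compound(token, dictionary):
    """Split a compound token into dictionary terms, preferring fewer pieces.

    Returns [token] unchanged when no full segmentation exists.
    """
    token = token.lower()
    n = len(token)
    # best[i] is a list of dictionary terms covering token[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and token[j:i] in dictionary:
                candidate = best[j] + [token[j:i]]
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    return best[n] if best[n] else [token]


def url_terms(url, dictionary):
    """Turn a URL into a flat list of terms, splitting compounds where possible."""
    # Drop the scheme, then break on any non-alphanumeric character.
    without_scheme = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url.lower())
    tokens = [t for t in re.split(r"[^a-z0-9]+", without_scheme) if t]
    terms = []
    for token in tokens:
        terms.extend(split_compound(token, dictionary))
    return terms


print(url_terms("http://www.example.com/editdistance/searchranking.html", DICTIONARY))
# ['www', 'example', 'com', 'edit', 'distance', 'search', 'ranking', 'html']
```

Doing this once at index time, as the abstract suggests, means the query-time comparison can work on ready-made term lists instead of re-segmenting every URL per query.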
Representative claim
1. A computer-implemented relevance system, comprising: one or more processors; and a memory coupled to the one or more processors, the memory storing instructions which, when executed by the one or more processors, cause the one or more processors to: extract document information from a document received as search results based on a query string, the document information including a universal resource locator wherein the universal resource locator includes a compound term; split the compound term into multiple, separate terms; find at least one of the multiple, separate terms in a dictionary of terms; generate a target data string based on the extracted document information, the target data string including one of the multiple, separate terms found in the dictionary; and compute edit distance between the target data string and the query string, the edit distance employed in determining relevance of a document as part of result ranking.
2. The system of claim 1, wherein the document information includes at least one of a title information, universal resource locator information, click information, or anchor text.
3. The system of claim 1, wherein the compound terms of the document information are split at index time to compute the edit distance relative to the universal resource locator.
4. The system of claim 1, further comprising instructions for filtering anchor text of the document information at index time to compute a top-ranked set of anchor text.
5. The system of claim 1, wherein the document information further includes at least one of title characters, anchor characters, or click characters, and wherein the system further includes a neural network operable to compute the relevance of the document based on the document information and raw input features of a BM25F function, a click distance, a file type, a language and a universal resource locator depth.
6. The system of claim 1, wherein the edit distance is computed based on insertion and deletion of terms to increase proximity between the target data string and the query string.
7. The system of claim 1, wherein the edit distance is computed based on costs associated with insertion and deletion of terms to increase proximity between the target data string and the query string.
8. A computer-implemented method of determining relevance of a document, comprising: receiving a query string as part of a search process; extracting a universal resource locator from document information included in a document returned during the search process, wherein the universal resource locator includes a compound term; generating a target data string from the universal resource locator by splitting the compound term of the universal resource locator into multiple, separate terms and finding at least one of the multiple, separate terms in a dictionary of terms; computing edit distance between the target data string and the query string; and calculating a relevance score based on the edit distance.
9. The method of claim 8, further comprising employing term insertion as part of computing the edit distance and assessing an insertion cost for insertion of a term in the query string to generate the target data string, the cost represented as a weighting parameter.
10. The method of claim 8, further comprising employing term deletion as part of computing the edit distance and assessing a deletion cost for deletion of a term in the query string to generate the target data string, the cost represented as a weighting parameter.
11. The method of claim 8, further comprising computing a position cost as part of computing the edit distance, the position cost associated with one or more of term insertion and term deletion of a term position in the target data string.
12. The method of claim 8, further comprising performing a matching process between characters of the target data string and characters of the query string to compute an overall cost of computing the edit distance.
13. The method of claim 8, wherein splitting compound terms of the universal resource locator occurs at index time.
14. The method of claim 8, further comprising filtering anchor text of the target data string to find a top-ranked set of anchor text based on frequency of occurrence in the document.
15. The method of claim 14, further comprising computing an edit distance score for anchor text in the set.
16. The method of claim 8, further comprising inputting a score, derived from computing the edit distance, into a two-layer neural network after application of a transform function, the score generated based on calculating the edit distance associated with at least one of title information, anchor information, click information, or universal resource locator information, and other raw input features.
17. A computer-implemented method of computing relevance of a document, comprising: processing a query string as part of a search process to return a result set of documents; generating a target data string based on document information extracted from a document of the result set, the document information including a universal resource locator, wherein the universal resource locator includes a compound term, wherein generating the target data string includes splitting the compound term into multiple, separate terms, and finding at least one of the multiple, separate terms in a dictionary of terms; computing edit distance between the target data string and the query string based on term insertion, term deletion, and term position; and calculating a relevance score based on the edit distance, the relevance score used to rank the document in the result set.
18. The method of claim 17, further comprising computing a cost associated with each term insertion, term deletion and term position, and factoring the cost into computation of the relevance score.
19. The method of claim 17, further comprising splitting compound terms of the universal resource locator information at index time and filtering the anchor text information at index time to find a top-ranked set of anchor text based on frequency of occurrence of the anchor text in the document.
20. The method of claim 17, further comprising reading occurrences of terms of the query string to construct a string of query terms in order of appearance in a source universal resource locator string and filling space between the terms with word marks.
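The term-level edit distance with weighted insertion and deletion costs that the claims describe could look something like the sketch below. It is a standard dynamic-programming edit distance restricted to whole-term insertions and deletions, offered as an assumption about one possible reading; the function name `term_edit_distance` and the parameters `insert_cost` and `delete_cost` are hypothetical, and the position cost mentioned in the claims is not modeled here.

```python
def term_edit_distance(query_terms, target_terms, insert_cost=1.0, delete_cost=1.0):
    """Cheapest way to turn query_terms into target_terms.

    Only whole-term insertions and deletions are allowed; matching terms
    are kept at no cost. The cost weights are assumed tuning parameters.
    """
    m, n = len(query_terms), len(target_terms)
    # dist[i][j]: cost of turning query_terms[:i] into target_terms[:j].
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = i * delete_cost  # delete every remaining query term
    for j in range(1, n + 1):
        dist[0][j] = j * insert_cost  # insert every target term
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            options = [
                dist[i - 1][j] + delete_cost,  # drop a query term
                dist[i][j - 1] + insert_cost,  # add a target term
            ]
            if query_terms[i - 1] == target_terms[j - 1]:
                options.append(dist[i - 1][j - 1])  # terms match, no cost
            dist[i][j] = min(options)
    return dist[m][n]


query = ["edit", "distance"]
target = ["www", "example", "com", "edit", "distance", "html"]
print(term_edit_distance(query, target))                   # 4.0
print(term_edit_distance(query, target, insert_cost=0.5))  # 2.0
```

A lower distance indicates a nearer match between the query and the target data string, so a ranking function would presumably map it through some decreasing transform before using it as a relevance feature; that transform is not specified in the text above.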