IPC Classification Information
Country / Type |
United States (US) Patent
Granted
|
International Patent Classification (IPC, 7th edition) |
|
Application Number |
US-0048426
(2011-03-15)
|
Registration Number |
US-9881009
(2018-01-30)
|
Inventor / Address |
- Weight, Christopher F.
- Birkett, Andrew D.
- Hamaker, Janna
- Killalea, Tom
- Nelson, Alexander William Robb
|
Applicant / Address |
- Amazon Technologies, Inc.
|
Agent / Address |
|
Citation Information |
Times cited: 1
Cited patents: 31
|
Abstract
Techniques are described for identifying book title sets. The techniques may include a first-pass comparison with other books to identify other candidate title sets. A second-pass comparison may then be performed with respect to the candidate title sets. The first-pass comparison may be based on book metadata such as titles and authorship. The second-pass comparison may include a more comprehensive content comparison, such as comparing the body text of the books.
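The two-pass flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Book` and `TitleSet` classes, the exact-match metadata rule, the Jaccard word-overlap measure, and the 0.5 threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    title: str
    author: str
    body: str


@dataclass
class TitleSet:
    title: str
    author: str
    members: list = field(default_factory=list)


def normalize_field(value: str) -> str:
    # Hypothetical metadata normalization: lowercase and collapse whitespace.
    return " ".join(value.lower().split())


def first_pass(book: Book, title_sets: list) -> list:
    """Cheap pass: keep title sets whose title and author match the book."""
    key = (normalize_field(book.title), normalize_field(book.author))
    return [
        ts for ts in title_sets
        if (normalize_field(ts.title), normalize_field(ts.author)) == key
    ]


def body_similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets, a stand-in for a fuller content comparison."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def second_pass(book: Book, candidates: list, threshold: float = 0.5):
    """Expensive pass: compare body text against each candidate's first member."""
    best, best_score = None, threshold
    for ts in candidates:
        if not ts.members:
            continue
        score = body_similarity(book.body, ts.members[0].body)
        if score > best_score:
            best, best_score = ts, score
    return best


def identify_title_set(book: Book, title_sets: list):
    """Run the metadata pass first, then the content pass on the survivors."""
    return second_pass(book, first_pass(book, title_sets))
```

The point of the split is cost: the metadata pass touches only short fields for every title set, while the body-text pass runs only on the few candidates that survive it.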
Representative Claim
1. A computer-implemented method, comprising: under control of one or more processors configured with executable instructions, receiving, from a device of an author and via a content ingestion service associated with a network, an electronic book having first body text and first metadata; normalizing the electronic book by removing illustrations from the electronic book, removing extraneous characters from the electronic book, and converting characters of the electronic book to a single case; determining, in response to the normalizing of the electronic book, whether the first metadata of the electronic book matches metadata of any existing book title sets; based at least partly on a first determination that the first metadata of the electronic book matches second metadata of no more than a single existing book title set that includes at least one book, adding the electronic book to the single existing book title set such that the single existing book title set includes the at least one book and the electronic book; based at least partly on a second determination that the first metadata of the electronic book matches third metadata of multiple existing book title sets, calculating a text matching score corresponding to individual ones of the existing book title sets, the text matching score indicating a comparison of a first frequency of one or more words included in the first body text of the electronic book and a second frequency of the one or more words included in second body text of the corresponding existing book title set; and adding the electronic book to an existing book title set of the multiple existing book title sets based at least partly on the text matching score corresponding to the existing book title set being greater than a specified threshold, the existing book title set including the electronic book and one or more other books.
2. The computer-implemented method of claim 1, wherein the first metadata of the electronic book and the metadata of the existing book title sets indicates one or more of: title; authorship; publisher; publication date; copyright date; and International Standard Book Number (ISBN).
3. The computer-implemented method of claim 1, wherein calculating the text matching score comprises evaluating word alignment between the electronic book and the existing book title set.
4. The computer-implemented method of claim 1, wherein calculating the text matching score comprises evaluating page alignment between the electronic book and the existing book title set.
5. The computer-implemented method of claim 1, wherein calculating the text matching score comprises evaluating word frequencies of the electronic book and the existing book title set.
6. The computer-implemented method of claim 1, wherein calculating the text matching score comprises evaluating edit distances between the electronic book and the existing book title set.
7. A computer-implemented method, comprising: under control of one or more processors configured with executable instructions, receiving, from a device of an author and via a content ingestion service associated with a network, an electronic book having first body text and first metadata; normalizing the electronic book by at least one of removing illustrations from the electronic book, removing extraneous characters from the electronic book, or converting characters of the electronic book to a single case; comparing the first metadata of the electronic book with second metadata corresponding to other books to identify one or more candidate title sets of which the electronic book may be a member; determining that a number of the one or more candidate title sets meets or exceeds a pre-determined number of candidate title sets; and based at least partly on the determining that the number of the one or more candidate title sets meets or exceeds the pre-determined number of candidate title sets, comparing the first body text of the electronic book with second body text of the one or more candidate title sets to determine that the electronic book is a member of the one or more candidate title sets.
8. The computer-implemented method of claim 7, wherein the second body text of the one or more candidate title sets comprises a canonical text corresponding to the one or more candidate title sets.
9. The computer-implemented method of claim 7, wherein the second body text of the one or more candidate title sets comprises body text of an existing member of the one or more candidate title sets.
10. The computer-implemented method of claim 7, wherein the first metadata of the electronic book and the second metadata corresponding to the other books comprises multiple data fields.
11. The computer-implemented method of claim 7, wherein the first metadata of the electronic book and the second metadata corresponding to the other books comprises at least an author field and a title field.
12. The computer-implemented method of claim 7, wherein: the first metadata of the electronic book comprises a first author field and the second metadata corresponding to the other books comprises a second author field; and comparing the first metadata comprises determining whether there is common authorship between the electronic book and the other books based on the first author field and the second author field.
13. The computer-implemented method of claim 7, wherein: the first metadata of the electronic book and the second metadata corresponding to the other electronic books indicate respective titles of the electronic book and the other electronic books; and the method further comprising normalizing the first metadata prior to comparing the first metadata.
14. The computer-implemented method of claim 7, wherein comparing the first metadata comprises calculating metadata similarity scores based at least in part on similarity between the first metadata of the electronic book and the second metadata corresponding to the other electronic books.
15. The computer-implemented method of claim 7, wherein comparing the first body text comprises evaluating word alignment between the electronic book and the one or more candidate title sets.
16. The computer-implemented method of claim 7, wherein comparing the first body text comprises evaluating page alignment between the electronic book and the one or more candidate title sets.
17. The computer-implemented method of claim 7, wherein comparing the first body text comprises evaluating word frequencies of the electronic book and the one or more candidate title sets.
18. The computer-implemented method of claim 7, wherein comparing the first body text comprises evaluating edit distances between the first body text of the electronic book and the second body text of the one or more candidate title sets.
19. An online electronic book service, comprising: one or more processors; and one or more non-transitory computer-readable storage media containing instructions that are executable by the one or more processors to perform actions comprising: receiving, from a device of an author and via a content ingestion service associated with a network, an electronic book; normalizing the electronic book by at least one of removing illustrations from the electronic book, removing extraneous characters from the electronic book, or converting characters of the electronic book to a single case; performing a first-pass comparison of metadata of the electronic book with metadata of different book title sets to identify one or more candidate title sets of which the electronic book may be a member; and based at least partly on a determination that the first-pass comparison identifies a partial match for multiple candidate title sets, performing a second-pass comparison of first body text of the electronic book with second body text of the multiple candidate title sets to determine that the electronic book is a member of any of the multiple candidate title sets.
20. The online electronic book service of claim 19, wherein the second-pass comparison comprises comparing word frequencies of the electronic book and the multiple candidate title sets.
21. The online electronic book service of claim 19, wherein the second-pass comparison comprises comparing word alignment of the electronic book with the multiple candidate title sets.
22. The online electronic book service of claim 19, wherein the second-pass comparison comprises comparing page alignment of the electronic book with the multiple candidate title sets.
23. The online electronic book service of claim 19, wherein the second-pass comparison comprises comparing edit distances between the electronic book and the multiple candidate title sets.
24. The online electronic book service of claim 19, wherein the actions further comprise, based at least partly on a determination that the first-pass comparison does not identify any candidate title sets of the multiple candidate title sets, performing the first-pass comparison with respect to a different electronic book.
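As an illustration of the word-frequency scoring and threshold test recited in claim 1, the sketch below normalizes two body texts, counts words, and compares the counts. The claims do not specify how the frequencies are compared, so cosine similarity is an assumption made here, as are the function names, the ASCII-only character filter, and the 0.8 threshold. Illustration removal depends on the e-book format and is not modeled.

```python
import math
import re
from collections import Counter


def normalize_text(text: str) -> str:
    """Convert to a single case and replace extraneous characters with spaces."""
    # Assumption: anything other than ASCII letters, digits, and whitespace
    # counts as "extraneous"; real e-book text would need a broader rule.
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def word_frequencies(text: str) -> Counter:
    return Counter(normalize_text(text).split())


def text_matching_score(body_a: str, body_b: str) -> float:
    """Cosine similarity of word-count vectors (assumed comparison measure)."""
    fa, fb = word_frequencies(body_a), word_frequencies(body_b)
    dot = sum(count * fb[word] for word, count in fa.items())
    norm_a = math.sqrt(sum(c * c for c in fa.values()))
    norm_b = math.sqrt(sum(c * c for c in fb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def choose_title_set(new_body: str, matched_sets: dict, threshold: float = 0.8):
    """Pick a title set for a book whose metadata matched `matched_sets`.

    `matched_sets` is a hypothetical mapping from a title-set id to a
    representative body text for that set.
    """
    if not matched_sets:
        return None
    if len(matched_sets) == 1:
        # A single metadata match is accepted without a body-text comparison.
        return next(iter(matched_sets))
    scores = {
        set_id: text_matching_score(new_body, body)
        for set_id, body in matched_sets.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None
```

Word alignment, page alignment, or edit distance, which the dependent claims list as alternatives, could replace `text_matching_score` without changing the surrounding control flow.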