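IPC classification information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th edition) |
Application number | US-0676881 (2003-09-30)
Inventors / Address | Lee, Dar Shyang; Hull, Jonathan J.
Applicant / Address |
Agent / Address | Blakely, Sokoloff, Taylor &
Citation information | Times cited: 35; Cited patents: 19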
Abstract
A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.
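The conditional n-gram and comparison steps described in the abstract can be illustrated with a minimal sketch. The predicate used here (keep only the first letter of each word), the function names, and the choice of cosine similarity are assumptions made for illustration; the record does not state which predicate or similarity measure the patent uses.

```python
from collections import Counter
from math import sqrt


def starts_word(text, i):
    """Hypothetical predicate: true for a letter that begins a word."""
    return text[i].isalpha() and (i == 0 or text[i - 1] == " ")


def conditional_ngrams(text, n, predicate=starts_word):
    """Select characters that satisfy `predicate`, then join each run of n into a term."""
    selected = [text[i] for i in range(len(text)) if predicate(text, i)]
    return ["".join(selected[i:i + n]) for i in range(len(selected) - n + 1)]


def similarity(text_a, text_b, n=2, predicate=starts_word):
    """Cosine similarity between the n-gram term counts of two text strings."""
    a = Counter(conditional_ngrams(text_a, n, predicate))
    b = Counter(conditional_ngrams(text_b, n, predicate))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

With this predicate, `conditional_ngrams("the quick brown fox", 2)` yields the terms `tq`, `qb`, and `bf`, and two strings that share no terms score 0.0.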
Representative claim
What is claimed is:
1. A method comprising: representing an input document image with a sequence of template identifiers; replacing the template identifiers with alphabet characters according to language statistics to generate a text string representing the input document image; searching, among a plurality of documents in a database, for at least one of the plurality of documents that matches the input document based on the text string; and examining whether the at least one matched document satisfies a predetermined security criteria based on an attribute associated with the at least one matched document, to determine whether an operation on the input document is allowed.
2. The method of claim 1, wherein determining whether the at least one matched document satisfies a predetermined security criteria comprises determining whether the at least one matched document is a confidential document that requires an authorization before operating on the input document.
3. The method of claim 2, wherein if the at least one matched document is determined to be a confidential document, the method further comprises: prompting a user for an authorization; and determining whether the authorization received from the user satisfies a level of the authorization required by the at least one matched document.
4. The method of claim 1, wherein the plurality of documents comprises a hierarchy of confidential documents, at least a portion of the confidential documents being associated with a respective confidentiality rating that requires a specific authorization associated with the respective rating.
5. The method of claim 1, wherein the operation on the input document comprises at least one of scanning and copying the input document.
6. The method of claim 1, wherein the operation on the input document comprises printing the at least one matched document.
7. The method of claim 1, wherein determining whether the at least one matched document satisfies a predetermined security criteria comprises determining whether the at least one matched document is a copyright protected document before copying the input document.
8. The method of claim 7, wherein if the at least one matched document is a copyright protected document, the method further comprises identifying a copyright holder and a copyright license fee associated with the copyright protected document.
9. The method of claim 8, further comprising: recording an identity of a user submitting the input document, dates, and number of copy operations performed on the input document; and storing the identity of the user, dates, and the number of copy operations in the database for accounting purposes.
10. The method of claim 9, further comprising transmitting the identity of the user and the number of copies of the input document to a remote facility over a network for billing purposes.
11. The method of claim 1, wherein the input document is a symbolically compressed document and the plurality of documents including the at least one matched document are stored as symbolically compressed documents.
12. The method of claim 1, further comprising mapping the alphabet characters to the template identifiers based at least partly on frequency of occurrence of the template identifiers.
13. The method of claim 1, further comprising extracting n-gram indexing terms from the text string, wherein the comparison of the input document and the plurality of documents is performed based on the n-gram indexing terms.
14. The method of claim 13, wherein extracting n-gram indexing terms comprises: selecting alphabet characters from the text string that satisfy a predicate; and combining the selected alphabet characters to form n-grams, n being an integer.
15. A document processing system comprising: a deciphering module to generate a first text string based on a sequence of template identifiers in a first document and to generate a second text string based on a sequence of template identifiers in a second document; a comparison module to generate a measure of similarity between the first and the second documents based on the first and second text strings to determine whether the first and second documents are matched; and a security module to examine whether the second document satisfies a predetermined security criteria based on an attribute associated with the second document to determine whether an operation on the first document is allowed.
16. The document processing system of claim 15, wherein the security module further determines whether the second document is a confidential document that requires an authorization before operating on the first document.
17. The document processing system of claim 16, wherein if the second document is determined to be a confidential document, the security module further prompts a user for an authorization, and determines whether the authorization received from the user satisfies a level of the authorization required by the second document.
18. The document processing system of claim 15, wherein the second document is a member of a hierarchy of confidential documents stored in a database, at least a portion of the confidential documents being associated with a respective confidentiality rating that requires a specific authorization associated with the respective rating.
19. The document processing system of claim 15, wherein the operation on the first document comprises at least one of scanning and copying the first document.
20. The document processing system of claim 15, wherein the operation on the first document comprises printing the second document.
21. The document processing system of claim 15, wherein the security module further determines whether the second document is a copyright protected document before copying the first document.
22. The document processing system of claim 21, wherein if the second document is a copyright protected document, the security module further identifies a copyright holder and a copyright license fee associated with the copyright protected document.
23. The document processing system of claim 22, further comprising an accounting module to: record an identity of a user submitting the first document, dates, and number of copy operations performed on the first document, and store the identity of the user, dates, and the number of copy operations in a database for accounting purposes.
24. The document processing system of claim 23, further comprising a communication module to transmit the identity of the user and the number of copies of the first document to a remote facility over a network for billing purposes.
25. The document processing system of claim 15, wherein the first and second documents are symbolically compressed documents.
26. The document processing system of claim 15, further comprising a conditional n-gram module coupled to receive the first and second text strings from the deciphering module, and to extract n-gram indexing terms from the text string, wherein the comparison of the first document and the second document is performed based on the n-gram indexing terms.
27. The document processing system of claim 26, wherein the n-gram indexing terms are extracted by: selecting alphabet characters from the text string that satisfy a predicate; and combining the selected alphabet characters to form n-grams, n being an integer.
28. The document processing system of claim 27, wherein the deciphering module maps the alphabet characters to the template identifiers based at least partly on frequency of occurrence of the template identifiers.
29. An article of manufacture including one or more computer-readable storage media that embody a program of instructions, when executed by one or more processors in the processing system, causes the one or more processors to perform a method, the method comprising: generating a text string from an input document image represented by a sequence of template identifiers; replacing the template identifiers with alphabet characters according to language statistics to generate a text string representing the input document image; searching, among a plurality of documents in a database, for at least one of the plurality of documents that matches the input document based on the text string; and examining whether the at least one matched document satisfies a predetermined security criteria based on an attribute associated with the at least one matched document, to determine whether an operation on the input document is allowed.
30. The article of claim 29, wherein determining whether the at least one matched document satisfies a predetermined security criteria comprises determining whether the at least one matched document is a confidential document that requires an authorization before operating on the input document.
31. The article of claim 30, wherein if the at least one matched document is determined to be a confidential document, the method further comprises: prompting a user for an authorization; and determining whether the authorization received from the user satisfies a level of the authorization required by the at least one matched document.
32. The article of claim 29, wherein the plurality of documents comprises a hierarchy of confidential documents, at least a portion of the confidential documents being associated with a respective confidentiality rating that requires a specific authorization associated with the respective rating.
33. The article of claim 29, wherein the operation on the input document comprises at least one of scanning and copying the input document.
34. The article of claim 29, wherein the operation on the input document comprises printing the at least one matched document.
35. The article of claim 29, wherein determining whether the at least one matched document satisfies a predetermined security criteria comprises determining whether the at least one matched document is a copyright protected document before copying the input document.
36. The article of claim 35, wherein if the at least one matched document is a copyright protected document, the method further comprises identifying a copyright holder and a copyright license fee associated with the copyright protected document.
37. The article of claim 36, wherein the method further comprises: recording an identity of a user submitting the input document, dates, and number of copy operations performed on the input document; and storing the identity of the user, dates, and the number of copy operations in the database for accounting purposes.
38. The article of claim 37, wherein the method further comprises transmitting the identity of the user and the number of copies of the input document to a remote facility over a network for billing purposes.
39. The article of claim 15, wherein the input document is a symbolically compressed document and the plurality of documents including the at least one matched document are stored as symbolically compressed documents.
40. The article of claim 15, wherein the method further comprises mapping the alphabet characters to the template identifiers based at least partly on frequency of occurrence of the template identifiers.
41. The article of claim 15, wherein the method further comprises extracting n-gram indexing terms from the text string, wherein the comparison of the input document and the plurality of documents is performed based on the n-gram indexing terms.
42. The article of claim 41, wherein extracting n-gram indexing terms comprises: selecting alphabet characters from the text string that satisfy a predicate; and combining the selected alphabet characters to form n-grams, n being an integer.
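Two steps recited in the claims lend themselves to a short illustration: mapping template identifiers to characters by frequency of occurrence, and checking an authorization level against a confidentiality rating. The sketch below is a simple rank-matching substitution, not the deciphering algorithm of the patent, which this record does not describe. The character ranking, the `confidentiality_rating` attribute name, and the integer authorization levels are all assumptions.

```python
from collections import Counter

# Assumed ranking of English characters, most frequent first (space included).
ENGLISH_BY_FREQUENCY = " etaoinshrdlcumwfgypbvkjxqz"


def decipher_by_frequency(template_ids, alphabet=ENGLISH_BY_FREQUENCY):
    """Map the k-th most frequent template identifier to the k-th most frequent character."""
    counts = Counter(template_ids)
    first_seen = {}
    for pos, tid in enumerate(template_ids):
        first_seen.setdefault(tid, pos)
    # Descending count; ties broken by first appearance so the result is deterministic.
    ranked = sorted(counts, key=lambda tid: (-counts[tid], first_seen[tid]))
    mapping = {tid: alphabet[rank] if rank < len(alphabet) else "?"
               for rank, tid in enumerate(ranked)}
    return "".join(mapping[tid] for tid in template_ids), mapping


def operation_allowed(matched_attributes, user_level=0):
    """Hypothetical security check: compare the user's authorization level with
    the confidentiality rating stored on the matched document (0 = unrestricted)."""
    return user_level >= matched_attributes.get("confidentiality_rating", 0)
```

For example, the identifier sequence `[7, 3, 7, 9, 7, 3]` maps identifier 7 to a space, 3 to `e`, and 9 to `t`. A rank-only mapping like this is crude on short inputs, which is one reason to treat it as an illustration only.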