[Patent] Extracting information from symbolically compressed document images

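IPC classification information
Country / Type	United States (US) Patent, Registered
International Patent Classification (IPC 7th edition)	G06K-009/72
Application number	US-0676881 (2003-09-30)
Inventors / Address	Lee, Dar Shyang; Hull, Jonathan J.
Applicant / Address	Ricoh Co., Ltd.
Agent / Address	Blakely, Sokoloff, Taylor &
Citation information	Times cited: 35; Patents cited: 19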

Abstract

A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.
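
The conditional n-gram and comparison steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the `starts_word` predicate, and the use of cosine similarity over n-gram counts are all assumptions made for the example. The abstract only states that n-gram terms are extracted "based on a predicate condition" and that "a measure of similarity" is generated.

```python
from collections import Counter
from math import sqrt


def conditional_ngrams(text, n, predicate):
    """Form n-grams from only those characters whose position satisfies `predicate`."""
    # Select the characters that satisfy the predicate condition.
    selected = [ch for i, ch in enumerate(text) if predicate(text, i)]
    # Slide a window of length n over the selected characters.
    return ["".join(selected[i:i + n]) for i in range(len(selected) - n + 1)]


def starts_word(text, i):
    """Illustrative predicate (an assumption): a non-space character that begins a word."""
    return text[i] != " " and (i == 0 or text[i - 1] == " ")


def cosine_similarity(terms_a, terms_b):
    """Cosine similarity between two bags of n-gram terms (assumed similarity measure)."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two deciphered text strings that differ in a single word.
first = "the quick brown fox jumps over the lazy dog"
second = "the quick brown fox leaps over the lazy dog"

terms_first = conditional_ngrams(first, 3, starts_word)
terms_second = conditional_ngrams(second, 3, starts_word)
score = cosine_similarity(terms_first, terms_second)
```

Because the predicate keeps only one character per word, a single changed word disturbs only the few n-grams that span it, so the two strings still share most of their terms.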

Representative claims

What is claimed is:
1. A method comprising: representing an input document image with a sequence of template identifiers; replacing the template identifiers with alphabet characters according to language statistics to generate a text string representing the input document image; searching, among a plurality of documents in a database, for at least one of the plurality of documents that matches the input document based on the text string; and examining whether the at least one matched document satisfies a predetermined security criteria based on an attribute associated with the at least one matched document, to determine whether an operation on the input document is allowed.
2. The method of claim 1, wherein determining whether the at least one matched document satisfies a predetermined security criteria comprises determining whether the at least one matched document is a confidential document that requires an authorization before operating on the input document.
3. The method of claim 2, wherein if the at least one matched document is determined to be a confidential document, the method further comprises: prompting a user for an authorization; and determining whether the authorization received from the user satisfies a level of the authorization required by the at least one matched document.
4. The method of claim 1, wherein the plurality of documents comprises a hierarchy of confidential documents, at least a portion of the confidential documents being associated with a respective confidentiality rating that requires a specific authorization associated with the respective rating.
5. The method of claim 1, wherein the operation on the input document comprises at least one of scanning and copying the input document.
6. The method of claim 1, wherein the operation on the input document comprises printing the at least one matched document.
7. The method of claim 1, wherein determining whether the at least one matched document satisfies a predetermined security criteria comprises determining whether the at least one matched document is a copyright protected document before copying the input document.
8. The method of claim 7, wherein if the at least one matched document is a copyright protected document, the method further comprises identifying a copyright holder and a copyright license fee associated with the copyright protected document.
9. The method of claim 8, further comprising: recording an identity of a user submitting the input document, dates, and number of copy operations performed on the input document; and storing the identity of the user, dates, and the number of copy operations in the database for accounting purposes.
10. The method of claim 9, further comprising transmitting the identity of the user and the number of copies of the input document to a remote facility over a network for billing purposes.
11. The method of claim 1, wherein the input document is a symbolically compressed document and the plurality of documents including the at least one matched document are stored as symbolically compressed documents.
12. The method of claim 1, further comprising mapping the alphabet characters to the template identifiers based at least partly on frequency of occurrence of the template identifiers.
13. The method of claim 1, further comprising extracting n-gram indexing terms from the text string, wherein the comparison of the input document and the plurality of documents is performed based on the n-gram indexing terms.
14. The method of claim 13, wherein extracting n-gram indexing terms comprises: selecting alphabet characters from the text string that satisfy a predicate; and combining the selected alphabet characters to form n-grams, n being an integer.
15. A document processing system comprising: a deciphering module to generate a first text string based on a sequence of template identifiers in a first document and to generate a second text string based on a sequence of template identifiers in a second document; a comparison module to generate a measure of similarity between the first and the second documents based on the first and second text strings to determine whether the first and second documents are matched; and a security module to examine whether the second document satisfies a predetermined security criteria based on an attribute associated with the second document to determine whether an operation on the first document is allowed.
16. The document processing system of claim 15, wherein the security module further determines whether the second document is a confidential document that requires an authorization before operating on the first document.
17. The document processing system of claim 16, wherein if the second document is determined to be a confidential document, the security module further prompts a user for an authorization, and determines whether the authorization received from the user satisfies a level of the authorization required by the second document.
18. The document processing system of claim 15, wherein the second document is a member of a hierarchy of confidential documents stored in a database, at least a portion of the confidential documents being associated with a respective confidentiality rating that requires a specific authorization associated with the respective rating.
19. The document processing system of claim 15, wherein the operation on the first document comprises at least one of scanning and copying the first document.
20. The document processing system of claim 15, wherein the operation on the first document comprises printing the second document.
21. The document processing system of claim 15, wherein the security module further determines whether the second document is a copyright protected document before copying the first document.
22. The document processing system of claim 21, wherein if the second document is a copyright protected document, the security module further identifies a copyright holder and a copyright license fee associated with the copyright protected document.
23. The document processing system of claim 22, further comprising an accounting module to: record an identity of a user submitting the first document, dates, and number of copy operations performed on the first document, and store the identity of the user, dates, and the number of copy operations in a database for accounting purposes.
24. The document processing system of claim 23, further comprising a communication module to transmit the identity of the user and the number of copies of the first document to a remote facility over a network for billing purposes.
25. The document processing system of claim 15, wherein the first and second documents are symbolically compressed documents.
26. The document processing system of claim 15, further comprising a conditional n-gram module coupled to receive the first and second text strings from the deciphering module, and to extract n-gram indexing terms from the text string, wherein the comparison of the first document and the second document is performed based on the n-gram indexing terms.
27. The document processing system of claim 26, wherein the n-gram indexing terms are extracted by: selecting alphabet characters from the text string that satisfy a predicate; and combining the selected alphabet characters to form n-grams, n being an integer.
28. The document processing system of claim 27, wherein the deciphering module maps the alphabet characters to the template identifiers based at least partly on frequency of occurrence of the template identifiers.
29. An article of manufacture including one or more computer-readable storage media that embody a program of instructions, when executed by one or more processors in the processing system, causes the one or more processors to perform a method, the method comprising: generating a text string from an input document image represented by a sequence of template identifiers; replacing the template identifiers with alphabet characters according to language statistics to generate a text string representing the input document image; searching, among a plurality of documents in a database, for at least one of the plurality of documents that matches the input document based on the text string; and examining whether the at least one matched document satisfies a predetermined security criteria based on an attribute associated with the at least one matched document, to determine whether an operation on the input document is allowed.
30. The article of claim 29, wherein determining whether the at least one matched document satisfies a predetermined security criteria comprises determining whether the at least one matched document is a confidential document that requires an authorization before operating on the input document.
31. The article of claim 30, wherein if the at least one matched document is determined to be a confidential document, the method further comprises: prompting a user for an authorization; and determining whether the authorization received from the user satisfies a level of the authorization required by the at least one matched document.
32. The article of claim 29, wherein the plurality of documents comprises a hierarchy of confidential documents, at least a portion of the confidential documents being associated with a respective confidentiality rating that requires a specific authorization associated with the respective rating.
33. The article of claim 29, wherein the operation on the input document comprises at least one of scanning and copying the input document.
34. The article of claim 29, wherein the operation on the input document comprises printing the at least one matched document.
35. The article of claim 29, wherein determining whether the at least one matched document satisfies a predetermined security criteria comprises determining whether the at least one matched document is a copyright protected document before copying the input document.
36. The article of claim 35, wherein if the at least one matched document is a copyright protected document, the method further comprises identifying a copyright holder and a copyright license fee associated with the copyright protected document.
37. The article of claim 36, wherein the method further comprises: recording an identity of a user submitting the input document, dates, and number of copy operations performed on the input document; and storing the identity of the user, dates, and the number of copy operations in the database for accounting purposes.
38. The article of claim 37, wherein the method further comprises transmitting the identity of the user and the number of copies of the input document to a remote facility over a network for billing purposes.
39. The article of claim 15, wherein the input document is a symbolically compressed document and the plurality of documents including the at least one matched document are stored as symbolically compressed documents.
40. The article of claim 15, wherein the method further comprises mapping the alphabet characters to the template identifiers based at least partly on frequency of occurrence of the template identifiers.
41. The article of claim 15, wherein the method further comprises extracting n-gram indexing terms from the text string, wherein the comparison of the input document and the plurality of documents is performed based on the n-gram indexing terms.
42. The article of claim 41, wherein extracting n-gram indexing terms comprises: selecting alphabet characters from the text string that satisfy a predicate; and combining the selected alphabet characters to form n-grams, n being an integer.
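
As a rough illustration of the flow recited in claims 1, 3, and 12 (decipher, match, then check a security attribute), the sketch below maps template identifiers to characters purely by frequency rank and gates an operation on a confidentiality level. Everything named here is hypothetical: the `ENGLISH_BY_FREQUENCY` ordering is an approximate assumption, `StoredDocument` and `required_level` are invented for the example, and matching by exact equality of the deciphered strings is a deliberate simplification of the similarity-based search in the claims.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed approximate ordering of English characters by frequency, space first.
ENGLISH_BY_FREQUENCY = " etaoinshrdlcumwfgypbvkjxqz"


def decipher_by_frequency(template_ids, alphabet=ENGLISH_BY_FREQUENCY):
    """Replace each template identifier with the character of the same frequency rank."""
    counts = Counter(template_ids)
    # Rank identifiers from most to least frequent; ties are broken by identifier value.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    mapping = {t: (alphabet[i] if i < len(alphabet) else "?") for i, t in enumerate(ranked)}
    return "".join(mapping[t] for t in template_ids)


@dataclass
class StoredDocument:
    name: str
    template_ids: list
    required_level: int = 0  # hypothetical attribute: 0 = unrestricted, higher = more confidential


def find_matches(input_ids, database):
    """Return stored documents whose deciphered text equals that of the input (simplified match)."""
    text = decipher_by_frequency(input_ids)
    return [doc for doc in database if decipher_by_frequency(doc.template_ids) == text]


def operation_allowed(input_ids, database, user_level):
    """Allow the operation only if the user's level covers every matched document."""
    return all(user_level >= doc.required_level for doc in find_matches(input_ids, database))


database = [
    StoredDocument("memo", [2, 4, 2, 8, 2, 4], required_level=2),
    StoredDocument("flyer", [1, 1, 3, 1, 1, 6, 6], required_level=0),
]

# A scan of the memo whose templates happen to be numbered differently.
scanned = [5, 9, 5, 7, 5, 9]
matches = find_matches(scanned, database)
```

In this toy setup the scanned sequence and the stored memo use different identifier numbering but decipher to the same string, so the memo is found and a user below level 2 would be refused. An input that matches nothing in the database is allowed by default, which is a design choice of the sketch rather than something the claims specify.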

Patents cited by this patent (19)

Buchanan Ken (Eagan MN) Dowdle John A. (St. Paul MN), Apparatus and method for computer-assisted document generation.
Schulze Bruno M., Automatic language identification using both N-gram and word information.
Lau Raymond (Cambridge MA) Rosenfeld Ronald (Pittsburgh PA) Roukos Salim (Scarsdale NY), Building scalable N-gram language models using maximum likelihood maximum entropy N-gram models.
Blum Eric (Wynnewood PA) Pierce Wilbur (Bryn Mawr PA), Cryptographic analysis system.
Cohen Jonathan Drew, Device and method for full-text large-dictionary string matching using n-gram hashing.
Shima Yoshihiro,JPX ; Marukawa Katsumi,JPX ; Koga Masashi,JPX ; Nakashima Kazuki,JPX ; Uehara Tetsuzo,JPX, Document processing apparatus and method for inputting the requirements of a reader or writer and for processing documen.
Mead Donald C., Downloading of personalization layers for symbolically compressed objects.
Powell Robert David, Identifying language and character set of data representing text.
Cohen Jonathan Drew, Language-independent method of generating index terms.
Bloomfield Marc Alan ; Krantz Jeffrey Isaac, Method for lossless bandwidth compression of a series of glyphs.
Damashek Marc (Hampstead MD), Method of retrieving documents that concern the same topic.
Kephart Jeffrey O. (Yorktown Heights NY), Methods and apparatus for evaluating and extracting signatures of computer viruses and other undesirable software entiti.
Melen Roger D., Non-linear aggregation mapping compression of image data and method.
Bernzott Phillip ; Dilworth John ; George David ; Higgins Bryan ; Knight Jeremy, Optical character recognition method and apparatus.
Ilan Gabriel,ILX ; Aharonson Eran,ILX, Pattern recognition method and system.
Kanevsky Dimitri ; Rao Srinivasa Patibandla, System and method for providing lossless compression of n-gram language models in a real-time decoder.
Bai Shuanhu,SGX ; Wu Horng Jyh Paul,SGX ; Li Haizhou,SGX ; Loudon Gareth,SGX, System for chinese tokenization and named entity recognition.
Ryan John Kevin,NZX, System for converting medical information into representative abbreviated codes with correction capability.
Huttenlocher Daniel P. ; Rucklidge William J. ; Brown John Seely, Using fontless structured document image representations to render displayed and printed documents at preferred resolutions.

Patents citing this patent (35)

Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Adaptive pattern recognition based controller apparatus and method and human-interface therefore.
King, Martin; Grover, Dale; Kushler, Clifford; Stafford-Fraser, James; Mannby, Claes-Fredrik, Archive of text captures from rendered documents.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Automatic modification of web pages.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Automatic modification of web pages.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Automatically capturing information, such as capturing information using a document-aware device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J.; Daley-Watson, Christopher J., Automatically providing content associated with captured information, such as information captured in real-time.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplement information.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplemental information.
Napper, Jonathon Leigh, Classifying a string formed from a known number of hand-written characters.
Napper, Jonathon Leigh, Classifying a string formed from hand-written characters.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Data capture from rendered documents using handheld device.
Napper, Jonathon Leigh, Handwritten character recognition.
Napper, Jonathon Leigh, Handwritten character recognition system.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Identifying a document by performing spectral analysis on the contents of the document.
King, Martin T.; Mannby, Claes-Fredrik; Smith, Michael J., Image search using text-based elements within the contents of images.
Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Internet appliance system and method.
Stork, David G.; Shoaib, Mohammed, Method and apparatus for secure and oblivious document matching.
Golovchinsky, Gene, Method and system for assessing copyright fees based on the content being copied.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Method and system for character recognition.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Method and system for character recognition.
King, Martin T.; Mannby, Claes-Fredrik; Arends, Thomas C.; Bajorins, David P.; Fox, Daniel C., Optical scanners, such as hand-held optical scanners.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Portable scanning device.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Processing techniques for text capture from a rendered document.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Processing techniques for text capture from a rendered document.
King, Martin T.; Kushler, Clifford A.; Stafford-Fraser, James Q.; Grover, Dale L., Processing techniques for visual capture data from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Search engines and systems with handheld document data capture devices.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Secure data gathering from rendered documents.
Dub, Eitan; Dub, Adam O.; Miro, Alfredo J., System and method for automatic document management.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Mannby, Claes-Fredrik; Valenti, William, Using gestalt information to identify locations in printed information.
