In the proposed document retrieving apparatus, text feature data based on text data included in a document and image feature data based on a document image are stored in a memory. Image data of a search document is subjected to character recognition processing, text feature data is acquired from the resulting text data, and image feature data (layout data) is acquired from the image data of the search document. The memory is then searched using the text feature data and image feature data acquired for the search document, and a document corresponding to the search document is retrieved from among a plurality of documents.
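The retrieval described above can be illustrated with a minimal Python sketch. It assumes that text feature data is a set of recognized words and that layout feature data is a short vector of values in [0, 1]; neither representation is specified in the abstract. The names `DocFeatures`, `text_similarity`, `layout_similarity`, and `retrieve`, the similarity measures, and the equal default weights are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocFeatures:
    doc_id: str
    words: frozenset   # text feature data (assumed: set of recognized words)
    layout: tuple      # layout feature data (assumed: values in [0, 1])


def text_similarity(a, b):
    """Jaccard overlap of two word sets (an assumed measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def layout_similarity(a, b):
    """1 minus the mean absolute difference of two equal-length vectors."""
    if len(a) != len(b) or not a:
        return 0.0
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def retrieve(query, registered, w_text=0.5, w_layout=0.5):
    """Return the registered document with the highest weighted score."""
    def score(doc):
        return (w_text * text_similarity(query.words, doc.words)
                + w_layout * layout_similarity(query.layout, doc.layout))
    return max(registered, key=score)


# Two registered documents and one scanned search document (made-up data).
registered = [
    DocFeatures("invoice", frozenset({"invoice", "total", "due"}),
                (0.9, 0.2, 0.2, 0.8)),
    DocFeatures("memo", frozenset({"memo", "meeting", "agenda"}),
                (0.5, 0.5, 0.5, 0.5)),
]
query = DocFeatures("scan", frozenset({"invoice", "total", "paid"}),
                    (0.8, 0.2, 0.3, 0.8))

best = retrieve(query, registered)
print(best.doc_id)
```

In this toy data the scan shares two of four distinct words with the invoice and has a close layout vector, so the invoice is returned. Changing `w_text` and `w_layout` shifts how much each kind of feature contributes to the final ranking.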
Representative Claims
What is claimed is:
1. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising: a first acquisition step of acquiring text data by executing character-recognition processing for image data of a document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition step of acquiring layout feature data based on the image data of the document; a storing step of storing, in storage means, text feature data and layout feature data respectively acquired from a registered document in said first and second acquisition steps, in association with the registered document; a determining step of determining, for a search document from which text feature data and layout feature data have been acquired in said first and second acquisition steps, whether the text feature data acquired from the search document or the layout feature data acquired from the search document is used for a narrowing-down process, based on the text feature data acquired from the search document in said first acquisition step; a first narrow-down step of narrowing down a plurality of registered documents stored in the storage means based on the text feature data acquired from the search document in said first acquisition step if said determining step determined that the text feature data acquired from the search document is used; a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired from the search document in said second acquisition step if said determining step determined that the layout feature data acquired from the search document is used; and a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired from the search document in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.
2. The method according to claim 1, wherein in said retrieving step, a similarity level between the search document and each document of the plurality of registered documents is obtained based on both a similarity of the text feature data and a similarity of the layout feature data, and the retrieved document is determined based on the obtained similarity levels.
3. The method according to claim 2, wherein in said retrieving step, the similarity level is obtained by adding weights to the similarity of the text feature data and the similarity of the layout feature data.
4. The method according to claim 1, wherein in said determining step, whether the text feature data or the layout feature data acquired from the search document is used for the narrowing-down process is determined based on analysis of the text feature data of the search document.
5. The method according to claim 4, wherein in said determining step, whether the text feature data or the layout feature data is used for the narrowing-down process is determined based on an amount of the text data acquired as the result of the character-recognition processing for the search document and a precision evaluation of the result of character recognition processing for the search document.
6. The method according to claim 1, wherein the layout feature data is acquired based on average brightness data and/or color data of n×m rectangles into which the image data of the document is divided.
7. The method according to claim 1, wherein the layout feature data is acquired based on position and/or size of blocks which are obtained by block analysis of the image data of the document.
8. A storage medium storing a control program which causes a computer to execute the document retrieving method described in claim 1.
9. A computer-executable control program stored on a computer-readable medium which causes a computer to execute the document retrieving method according to claim 1.
10. A document retrieving apparatus for retrieving a document from a storage comprising: a first acquisition unit configured to acquire text data by executing character-recognition processing for image data of a document and to acquire text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition unit configured to acquire layout feature data based on the image data of the document; a storage unit configured to store the text feature data and the layout feature data respectively acquired from a registered document by said first and second acquisition units, in association with the registered document; a determining unit for determining, for a search document from which text feature data and layout feature data have been acquired by said first and second acquisition units, whether the text feature data acquired from the search document or the layout feature data acquired from the search document is used for a narrowing-down process, based on the text feature data acquired from the search document by said first acquisition unit; a first narrow-down unit for narrowing down a plurality of registered documents stored in the storage means based on the text feature data acquired from the search document by said first acquisition unit if said determining unit determined that the text feature data acquired from the search document is used; a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired from the search document by said second acquisition unit if said determining unit determined that the layout feature data acquired from the search document is used; and a retrieving unit configured to retrieve a document, based on both the text feature data and the layout feature data acquired from the search document by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.
11. The apparatus according to claim 10, wherein said retrieving unit obtains a similarity level between the search document and each document of the plurality of registered documents based on both a similarity of the text feature data and a similarity of the layout feature data, and determines the retrieved document based on the obtained similarity levels.
12. The apparatus according to claim 11, wherein said retrieving unit obtains the similarity level by adding weights to the similarity of the text feature data and the similarity of the layout feature data.
13. The apparatus according to claim 10, wherein said determining unit determines whether the text feature data or the layout feature data is used for the narrowing-down process based on analysis of the text feature data of the search document.
14. The apparatus according to claim 13, wherein said determining unit determines whether the text feature data or the layout feature data acquired from the search document is used for the narrowing-down process based on an amount of the text data acquired as the result of the character-recognition processing for the search document and a precision evaluation of the result of character recognition processing for the search document.
15. The apparatus according to claim 10, wherein the layout feature data is acquired based on average brightness data and/or color data of n×m rectangles into which the image data of the document is divided.
16. The apparatus according to claim 10, wherein the layout feature data is acquired based on position and/or size of blocks which are obtained by block analysis of the image data of the document.
17. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising: a first acquisition step of acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition step of acquiring layout feature data based on the image data of the search document; a determining step of determining, based on the text feature data acquired from the search document in said first acquisition step, whether the text feature data or the layout feature data is used for a narrowing-down process; a first narrow-down step of narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired in said first acquisition step if said determining step determined that the text feature data is used for the narrowing-down process; a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired in said second acquisition step if said determining step determined that the layout feature data is used for the narrowing-down process; and a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.
18. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising: a first acquisition step of acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition step of acquiring layout feature data based on the image data of the search document; a determining step of determining, based on the layout feature data acquired from the search document in said second acquisition step, whether the text feature data or the layout feature data is used for a narrowing-down process; a first narrow-down step of narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired in said first acquisition step if said determining step determined that the text feature data is used for the narrowing-down process; a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired in said second acquisition step if said determining step determined that the layout feature data is used for the narrowing-down process; and a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.
19. A document retrieving apparatus for retrieving a document from a storage comprising: a first acquisition unit for acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition unit for acquiring layout feature data based on the image data of the search document; a determining unit for determining, based on the text feature data acquired from the search document by said second acquisition unit, whether the text feature data or the layout feature data is used for a narrowing-down process; a first narrow-down unit for narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired by said first acquisition unit if said determining unit determined that the text feature data is used for the narrowing-down process; a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired by said second acquisition unit if said determining unit determined that the layout feature data is used for the narrowing-down process; and a retrieving unit for retrieving a document, based on both the text feature data and the layout feature data acquired by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.
20. A document retrieving apparatus for retrieving a document from a storage comprising: a first acquisition unit for acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition unit for acquiring layout feature data based on the image data of the search document; a determining unit for determining, based on the layout feature data acquired from the search document by said second acquisition unit, whether the text feature data or the layout feature data is used for a narrowing-down process; a first narrow-down unit for narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired by said first acquisition unit if said determining unit determined that the text feature data is used for the narrowing-down process; a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired by said second acquisition unit if said determining unit determined that the layout feature data is used for the narrowing-down process; and a retrieving unit for retrieving a document, based on both the text feature data and the layout feature data acquired by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.
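Two of the steps recited in the claims lend themselves to a short illustration: the layout feature built from average brightness over a grid of rectangles (claims 6 and 15), and the choice between text and layout features for narrowing down, based on the amount of recognized text and a precision evaluation of the recognition result (claims 5 and 14). The Python sketch below is only one possible reading. The function names, the interpretation of n as columns and m as rows, the grayscale list-of-rows image format, and the thresholds `min_chars` and `min_confidence` are all assumptions and do not come from the claims.

```python
def grid_brightness(pixels, n, m):
    """Average brightness of each rectangle in an n-column by m-row grid.

    `pixels` is a list of rows of grayscale values (0..255). The image is
    assumed to be at least n pixels wide and m pixels high, so that no
    rectangle is empty.
    """
    height, width = len(pixels), len(pixels[0])
    features = []
    for row in range(m):
        y0, y1 = row * height // m, (row + 1) * height // m
        for col in range(n):
            x0, x1 = col * width // n, (col + 1) * width // n
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(cell) / len(cell))
    return features


def choose_narrowing_feature(ocr_text, ocr_confidence,
                             min_chars=20, min_confidence=0.8):
    """Pick "text" when recognized text is plentiful and reliable.

    Both thresholds are hypothetical; the claims name the two criteria
    (amount of text, precision evaluation) but give no values.
    """
    enough_text = len(ocr_text.strip()) >= min_chars
    reliable = ocr_confidence >= min_confidence
    return "text" if enough_text and reliable else "layout"


# A 4x4 test image: left half black, right half white.
image = [[0, 0, 255, 255] for _ in range(4)]
layout_feature = grid_brightness(image, n=2, m=2)
print(layout_feature)

# Plenty of confidently recognized text: narrow down by text features.
print(choose_narrowing_feature("Invoice number 12345, total due", 0.93))
# Almost no text, low confidence: fall back to layout features.
print(choose_narrowing_feature("a1", 0.4))
```

Whichever feature is chosen would be used only to shrink the candidate set; per the independent claims, the final retrieval still compares both text and layout feature data against the narrowed-down documents.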