[Patent] Document retrieving method and apparatus

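IPC Classification Information
Country/Type	United States (US) Patent, granted
International Patent Classification (IPC 7th ed.)	G06F-007/00 G06F-017/30
Application No.	US-0831150 (2004-04-26)
Patent No.	US-7257567 (2007-08-14)
Priority	JP-2003-125812 (2003-04-30)
Inventor / Address	Toshima, Eiichiro
Applicant / Address	Canon Kabushiki Kaisha
Attorney or Agent / Address	Fitzpatrick, Cella, Harper & Scinto
Citation information	Times cited: 32; Patents cited: 5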

Abstract

In the proposed document retrieving apparatus, text feature data based on the text data included in a document and image feature data based on the document image are stored in a memory. Image data of a search document is subjected to character recognition processing, text feature data is acquired from the obtained text data, and image feature data (layout data) is acquired from the image data of the search document. Using the text feature data and image feature data acquired from the search document, the memory is searched, and the document corresponding to the search document is retrieved from among the plural registered documents.
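The two kinds of features described in the abstract can be illustrated with a minimal Python sketch. It assumes the character-recognition step has already produced plain text, represents the text feature as lower-cased word counts (a hypothetical choice; the patent does not say how text feature data is computed), and represents the layout feature as the average brightness of each of n×m rectangles over a grayscale page image, which is one of the options named in claim 6. The names `text_features`, `layout_features`, and `register` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def text_features(ocr_text: str) -> Counter:
    """Hypothetical text feature: lower-cased word counts of the OCR output."""
    return Counter(ocr_text.lower().split())


def layout_features(image: List[List[float]], n: int, m: int) -> List[float]:
    """Average brightness of each cell when a grayscale image is split into n x m rectangles.

    `image` is a list of pixel rows; n is the number of cell rows, m the number of cell columns.
    """
    height, width = len(image), len(image[0])
    if height < n or width < m:
        raise ValueError("image is smaller than the requested grid")
    features = []
    for i in range(n):
        y0, y1 = i * height // n, (i + 1) * height // n
        for j in range(m):
            x0, x1 = j * width // m, (j + 1) * width // m
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            features.append(sum(cell) / len(cell))
    return features


def register(store: Dict[str, Tuple[Counter, List[float]]], doc_id: str,
             ocr_text: str, image: List[List[float]], n: int, m: int) -> None:
    """Store both features in association with the registered document (hypothetical in-memory store)."""
    store[doc_id] = (text_features(ocr_text), layout_features(image, n, m))


page = [[0, 0, 255, 255],
        [0, 0, 255, 255],
        [255, 255, 0, 0],
        [255, 255, 0, 0]]
print(layout_features(page, 2, 2))  # [0.0, 255.0, 255.0, 0.0]
```

Averaging over coarse rectangles makes the layout feature tolerant of small OCR-independent differences such as scanning noise; the "color data" variant of claim 6 could be sketched the same way by averaging each color channel separately.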

Representative Claims

What is claimed is:

1. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising: a first acquisition step of acquiring text data by executing character-recognition processing for image data of a document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition step of acquiring layout feature data based on the image data of the document; a storing step of storing, in storage means, text feature data and layout feature data respectively acquired from a registered document in said first and second acquisition steps, in association with the registered document; a determining step of determining, for a search document from which text feature data and layout feature data have been acquired in said first and second acquisition steps, whether the text feature data acquired from the search document or the layout feature data acquired from the search document is used for a narrowing-down process, based on the text feature data acquired from the search document in said first acquisition step; a first narrow-down step of narrowing down a plurality of registered documents stored in the storage means based on the text feature data acquired from the search document in said first acquisition step if said determining step determined that the text feature data acquired from the search document is used; a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired from the search document in said second acquisition step if said determining step determined that the layout feature data acquired from the search document is used; and a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired from the search document in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.

2. The method according to claim 1, wherein in said retrieving step, a similarity level between the search document and each document of the plurality of registered documents is obtained based on both a similarity of the text feature data and a similarity of the layout feature data, and the retrieved document is determined based on the obtained similarity levels.

3. The method according to claim 2, wherein in said retrieving step, the similarity level is obtained by adding weights to the similarity of the text feature data and the similarity of the layout feature data.

4. The method according to claim 1, wherein in said determining step, whether the text feature data or the layout feature data acquired from the search document is used for the narrowing-down process is determined based on analysis of the text feature data of the search document.

5. The method according to claim 4, wherein in said determining step, whether the text feature data or the layout feature data is used for the narrowing-down process is determined based on an amount of the text data acquired as the result of the character-recognition processing for the search document and a precision evaluation of the result of character recognition processing for the search document.

6. The method according to claim 1, wherein the layout feature data is acquired based on average brightness data and/or color data of n×m rectangles into which the image data of the document is divided.

7. The method according to claim 1, wherein the layout feature data is acquired based on position and/or size of blocks which are obtained by block analysis of the image data of the document.

8. A storage medium storing a control program which causes a computer to execute the document retrieving method described in claim 1.

9. A computer-executable control program stored on a computer-readable medium which causes a computer to execute the document retrieving method according to claim 1.

10. A document retrieving apparatus for retrieving a document from a storage comprising: a first acquisition unit configured to acquire text data by executing character-recognition processing for image data of a document and to acquire text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition unit configured to acquire layout feature data based on the image data of the document; a storage unit configured to store the text feature data and the layout feature data respectively acquired from a registered document by said first and second acquisition units, in association with the registered document; a determining unit for determining, for a search document from which text feature data and layout feature data have been acquired by said first and second acquisition units, whether the text feature data acquired from the search document or the layout feature data acquired from the search document is used for a narrowing-down process, based on the text feature data acquired from the search document by said first acquisition unit; a first narrow-down unit for narrowing down a plurality of registered documents stored in the storage means based on the text feature data acquired from the search document by said first acquisition unit if said determining unit determined that the text feature data acquired from the search document is used; a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired from the search document by said second acquisition unit if said determining unit determined that the layout feature data acquired from the search document is used; and a retrieving unit configured to retrieve a document, based on both the text feature data and the layout feature data acquired from the search document by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.

11. The apparatus according to claim 10, wherein said retrieving unit obtains a similarity level between the search document and each document of the plurality of registered documents based on both a similarity of the text feature data and a similarity of the layout feature data, and determines the retrieved document based on the obtained similarity levels.

12. The apparatus according to claim 11, wherein said retrieving unit obtains the similarity level by adding weights to the similarity of the text feature data and the similarity of the layout feature data.

13. The apparatus according to claim 10, wherein said determining unit determines whether the text feature data or the layout feature data is used for the narrowing-down process based on analysis of the text feature data of the search document.

14. The apparatus according to claim 13, wherein said determining unit determines whether the text feature data or the layout feature data acquired from the search document is used for the narrowing-down process based on an amount of the text data acquired as the result of the character-recognition processing for the search document and a precision evaluation of the result of character recognition processing for the search document.

15. The apparatus according to claim 10, wherein the layout feature data is acquired based on average brightness data and/or color data of n×m rectangles into which the image data of the document is divided.

16. The apparatus according to claim 10, wherein the layout feature data is acquired based on position and/or size of blocks which are obtained by block analysis of the image data of the document.

17. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising: a first acquisition step of acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition step of acquiring layout feature data based on the image data of the search document; a determining step of determining, based on the text feature data acquired from the search document in said first acquisition step, whether the text feature data or the layout feature data is used for a narrowing-down process; a first narrow-down step of narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired in said first acquisition step if said determining step determined that the text feature data is used for the narrowing-down process; a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired in said second acquisition step if said determining step determined that the layout feature data is used for the narrowing-down process; and a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.

18. A document retrieving method for retrieving a document from a storage by using an information processing apparatus comprising: a first acquisition step of acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition step of acquiring layout feature data based on the image data of the search document; a determining step of determining, based on the layout feature data acquired from the search document in said second acquisition step, whether the text feature data or the layout feature data is used for a narrowing-down process; a first narrow-down step of narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired in said first acquisition step if said determining step determined that the text feature data is used for the narrowing-down process; a second narrow-down step of narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired in said second acquisition step if said determining step determined that the layout feature data is used for the narrowing-down process; and a retrieving step of retrieving a document, based on both the text feature data and the layout feature data acquired in said first and second acquisition steps, from the registered documents narrowed-down in said first narrow-down step or said second narrow-down step.

19. A document retrieving apparatus for retrieving a document from a storage comprising: a first acquisition unit for acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition unit for acquiring layout feature data based on the image data of the search document; a determining unit for determining, based on the text feature data acquired from the search document by said second acquisition unit, whether the text feature data or the layout feature data is used for a narrowing-down process; a first narrow-down unit for narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired by said first acquisition unit if said determining unit determined that the text feature data is used for the narrowing-down process; a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired by said second acquisition unit if said determining unit determined that the layout feature data is used for the narrowing-down process; and a retrieving unit for retrieving a document, based on both the text feature data and the layout feature data acquired by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.

20. A document retrieving apparatus for retrieving a document from a storage comprising: a first acquisition unit for acquiring text data by executing character-recognition processing for image data of a search document and acquiring text feature data based on the text data acquired as a result of the character-recognition processing; a second acquisition unit for acquiring layout feature data based on the image data of the search document; a determining unit for determining, based on the layout feature data acquired from the search document by said second acquisition unit, whether the text feature data or the layout feature data is used for a narrowing-down process; a first narrow-down unit for narrowing down a plurality of registered documents stored in a storage means based on the text feature data acquired by said first acquisition unit if said determining unit determined that the text feature data is used for the narrowing-down process; a second narrow-down unit for narrowing down the plurality of registered documents stored in the storage means based on the layout feature data acquired by said second acquisition unit if said determining unit determined that the layout feature data is used for the narrowing-down process; and a retrieving unit for retrieving a document, based on both the text feature data and the layout feature data acquired by said first and second acquisition units, from the registered documents narrowed-down by said first narrow-down unit or said second narrow-down unit.
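The flow of claims 1 to 5 (decide whether text or layout features drive the narrowing-down, narrow the registered documents accordingly, then pick the best survivor by a weighted combination of text and layout similarity) can be sketched in Python as below. This is an illustration under assumptions, not the patented implementation: the similarity measures (cosine similarity over word counts, mean absolute brightness difference), the OCR precision value `ocr_confidence`, the thresholds `min_words` and `min_confidence`, the candidate count `keep`, and the weight `w_text` are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record layout: (text feature = word counts, layout feature = n x m brightness grid)
Features = Tuple[Counter, List[float]]


def text_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def layout_similarity(a: List[float], b: List[float]) -> float:
    """1 minus the mean absolute brightness difference, scaled to [0, 1] for 8-bit values.

    Assumes both grids are non-empty and have the same n x m size.
    """
    diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return 1.0 - diff / 255.0


def use_text_for_narrowing(query_text: Counter, ocr_confidence: float,
                           min_words: int = 20, min_confidence: float = 0.8) -> bool:
    """Determining step (claim 5 style): use text only if OCR yielded enough words, precisely enough."""
    return sum(query_text.values()) >= min_words and ocr_confidence >= min_confidence


def retrieve(query: Features, ocr_confidence: float, store: Dict[str, Features],
             keep: int = 10, w_text: float = 0.5) -> str:
    q_text, q_layout = query
    text_first = use_text_for_narrowing(q_text, ocr_confidence)

    def narrow_score(doc_id: str) -> float:
        text, layout = store[doc_id]
        return text_similarity(q_text, text) if text_first else layout_similarity(q_layout, layout)

    # First or second narrow-down step: keep only the `keep` best-scoring registered documents.
    candidates = sorted(store, key=narrow_score, reverse=True)[:keep]

    def combined(doc_id: str) -> float:
        text, layout = store[doc_id]
        return (w_text * text_similarity(q_text, text)
                + (1 - w_text) * layout_similarity(q_layout, layout))

    # Retrieving step: weighted sum of both similarities over the narrowed-down candidates.
    return max(candidates, key=combined)


store = {
    "invoice": (Counter("invoice total amount due".split()), [0.0, 255.0]),
    "letter": (Counter("dear sir kind regards".split()), [255.0, 0.0]),
}
query = (Counter("invoice amount due".split()), [0.0, 250.0])
print(retrieve(query, ocr_confidence=0.9, store=store, keep=1))  # invoice
```

In this usage the query yields only three OCR words, so the determining step falls back to layout features for narrowing down; the final choice still combines both similarities, as claims 2 and 3 describe.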

Patents cited by this patent (5)

Ota, Junichi (JP), Apparatus and method for filing, registering, and retrieving image files.
Shiiyama, Hirotaka (JP), Image retrieval apparatus and method.
Balabanovic, Marko; Lee, Dar-Shyang, Method and an apparatus for visual summarization of documents.
Bobrow, Daniel G.; Mahoney, James V.; Rucklidge, William J., Sorting image segments into clusters based on a distance measurement.
Mahoney, James V.; Blomberg, Jeanette L.; Trigg, Randall H.; Shin, Christian K., System for searching a corpus of document images by user specified document layout components.

Patents citing this patent (32)

Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Adaptive pattern recognition based controller apparatus and method and human-interface therefore.
King, Martin; Grover, Dale; Kushler, Clifford; Stafford-Fraser, James; Mannby, Claes-Fredrik, Archive of text captures from rendered documents.
Parikh, Prashant, Automatic dynamic contextual data entry completion.
Parikh, Prashant, Automatic dynamic contextual data entry completion system.
Parikh, Prashant, Automatic dynamic contextual data entry completion system.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Automatic modification of web pages.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Automatically capturing information, such as capturing information using a document-aware device.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J.; Daley-Watson, Christopher J., Automatically providing content associated with captured information, such as information captured in real-time.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Capturing text from rendered documents using supplement information.
King, Martin Towle; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Data capture from rendered documents using handheld device.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Determining actions involving captured information and electronic content associated with rendered documents.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Identifying a document by performing spectral analysis on the contents of the document.
Morohoshi, Hiroshi, Image processing apparatus, and computer program product.
King, Martin T.; Mannby, Claes-Fredrik; Smith, Michael J., Image search using text-based elements within the contents of images.
Hoffberg, Steven M.; Hoffberg-Borghesani, Linda I., Internet appliance system and method.
Smiling, Eric J.; Rodland, Andrew, Lexicon based systems and methods for intelligent media search.
Smiling, Eric J.; Rodland, Andrew, Lexicon based systems and methods for intelligent media search.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Method and system for character recognition.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Quentin, Method and system for character recognition.
Smiling, Eric J., Mosaic display systems and methods for intelligent media search.
King, Martin T.; Mannby, Claes-Fredrik; Arends, Thomas C.; Bajorins, David P.; Fox, Daniel C., Optical scanners, such as hand-held optical scanners.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Stephens, Redwood; Mannby, Claes-Fredrik; Peterson, Jesse; Sanvitale, Mark; Smith, Michael J., Performing actions based on capturing information from rendered documents, such as documents under copyright.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Portable scanning device.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Processing techniques for text capture from a rendered document.
King, Martin T.; Kushler, Clifford A.; Stafford-Fraser, James Q.; Grover, Dale L., Processing techniques for visual capture data from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Search engines and systems with handheld document data capture devices.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Grover, Dale L.; Kushler, Clifford A.; Stafford-Fraser, James Q., Triggering actions in response to optically or acoustically capturing keywords from a rendered document.
King, Martin T.; Mannby, Claes-Fredrik; Valenti, William, Using gestalt information to identify locations in printed information.