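IPC classification information
Country / Type | United States (US) Patent, Granted
International Patent Classification (IPC 7th edition) |
Application number | US-0143279 (2002-05-10)
Priority information | JP-P2001-140778 (2001-05-10)
Inventors / Address |
- Kobayashi, Kenichiro
- Akabane, Makoto
- Nitta, Tomoaki
- Yamazaki, Nobuhide
- Kobayashi, Erika
Applicant / Address |
Agent / Address |
Citation information | Times cited: 31; Patents cited: 9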
Abstract
The text format of input data is checked, and is converted into a system-manipulated format. It is further determined if the input data is in an HTML or e-mail format using tags, heading information, and the like. The converted data is divided into blocks in a simple manner such that elements in the blocks can be checked based on repetition of predetermined character patterns. Each block section is tagged with a tag indicating a block. The data divided into blocks is parsed based on tags, character patterns, etc., and is structured. A table in text is also parsed, and is segmented into cells. Finally, tree-structured data having a hierarchical structure is generated based on the sentence-structured data. A sentence-extraction template paired with the tree-structured data is used to extract sentences.
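
The block-division step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that a block boundary is either a blank line or a line made of one repeated punctuation character, and that each block is wrapped in a `<block>` tag. The separator rule, the tag name, and the function names `divide_into_blocks` and `tag_blocks` are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical separator rule: an empty line, or a line consisting only of
# one punctuation character repeated three or more times ("-----", "=====").
SEPARATOR = re.compile(r"^\s*(?:([-=*_#])\1{2,})?\s*$")


def divide_into_blocks(text):
    """Split text into blocks (lists of lines) at separator lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if SEPARATOR.match(line):
            # A separator closes the block collected so far, if any.
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def tag_blocks(blocks):
    """Wrap each block in a <block> tag (the tag name is an assumption)."""
    return ["<block>\n" + "\n".join(b) + "\n</block>" for b in blocks]


if __name__ == "__main__":
    sample = "Subject: hello\nFrom: a\n\nBody line one\nBody line two\n=====\nFooter"
    for tagged in tag_blocks(divide_into_blocks(sample)):
        print(tagged)
```

A later structuring pass could then parse each tagged block separately, which is the role the abstract assigns to the tag- and pattern-based parsing step.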
Representative claims
What is claimed is:

1. A document processing apparatus comprising: block dividing means for dividing input document data into blocks in a predetermined manner according to a structure of the document data; document structuring means for structuring the document data, thereby generating structured data, by parsing a block into which the document data is divided by said block dividing means according to the document structure of the block, and by adding tag information to text data constituting the block, said tag information indicating an attribute of the text data; and sentence extraction means for controlling an extraction of the text data according to the structured data and a predetermined condition, wherein the predetermined condition provides an indication of a method to be utilized to perform the extraction of the text data; wherein said document structuring means includes regular-expression determining means which refers to pattern information containing a two-dimensional regular expression for a two-dimensional character string and tag information associated with the regular expression, and which adds the tag information associated with the regular expression in pattern information to a character string in the block that matches the regular expression in the pattern information, before a sentence is extracted; and wherein the two-dimensional regular expression is expressed by two regular expressions, one regular expression indicating a head of a block using a one-dimensional regular expression, and another regular expression indicating a tail of the block using a one-dimensional regular expression, and by a number of lines which is permitted between the two regular expressions.

2. A document processing apparatus according to claim 1, further comprising regular-expression registering means for registering an arbitrary character string as the pattern information containing a two-dimensional regular expression and tag information associated with the two-dimensional regular expression which is used by said regular-expression determining means.

3. A document processing apparatus according to claim 1, wherein the two-dimensional regular expression is expressed by two regular expressions, one regular expression indicating a head of a block using a one-dimensional regular expression, and another regular expression indicating a tail of the block using a one-dimensional regular expression.

4. A document processing apparatus comprising: block dividing means for dividing input document data into blocks in a predetermined manner according to a structure of the document data; document structuring means for structuring the document data, thereby generating structured data, by parsing a block into which the document data is divided by said block dividing means according to the document structure of the block, and by adding tag information to text data constituting the block, said tag information indicating an attribute of the text data; and sentence extraction means for controlling an extraction of the text data according to the structured data and a predetermined condition, wherein the predetermined condition provides an indication of a method to be utilized to perform the extraction of the text data; wherein said document structuring means includes regular-expression determining means which refers to pattern information containing a two-dimensional regular expression for a two-dimensional character string and tag information associated with the regular expression, and which adds the tag information associated with the regular expression in pattern information to a character string in the block that matches the regular expression in the pattern information, before a sentence is extracted; and, wherein said sentence extraction means expresses the document data, which is structured according to the tag information generated by said document structuring means, as tree-structured data, and includes a sentence-extraction template which is paired with the tree-structured data in which each node is associated with an extraction control flag, and if the extraction control flag prohibits extraction of the text data, said sentence extraction means does not extract the text data flagged with the extraction control flag; template registering means for allowing a user to register an extraction control using the extraction control flag in the sentence-extraction template; and template search means for searching the sentence-extraction template registered by said template registering means using a fuzzy search by which a template that does not completely match a search condition meets the search condition, wherein the sentence-extraction template searched by said template search means is adapted to the tree-structured data.
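
The two-dimensional regular expression recited in claim 1 (a head pattern, a tail pattern, and a permitted number of lines between them) might be sketched as follows. This is an illustrative reading of the claim language, not the patented implementation: the `Pattern2D` class, its field names, the `tag_matches` function, and the interpretation of "number of lines permitted between" as an upper bound on the lines strictly between head and tail are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Pattern2D:
    head: str     # one-dimensional regex for the first line of a region
    tail: str     # one-dimensional regex for the last line of a region
    max_gap: int  # assumed meaning: max lines allowed between head and tail
    tag: str      # tag information associated with the pattern


def tag_matches(lines, pattern):
    """Return (start, end, tag) for each region of lines matching pattern."""
    head = re.compile(pattern.head)
    tail = re.compile(pattern.tail)
    spans, i = [], 0
    while i < len(lines):
        if head.search(lines[i]):
            # The tail may sit at most max_gap + 1 lines below the head.
            last = min(i + pattern.max_gap + 1, len(lines) - 1)
            end = next(
                (j for j in range(i + 1, last + 1) if tail.search(lines[j])),
                None,
            )
            if end is not None:
                spans.append((i, end, pattern.tag))
                i = end + 1  # continue scanning after the matched region
                continue
        i += 1
    return spans


if __name__ == "__main__":
    text = ["intro", "+-----+", "| a |", "| b |", "+-----+", "outro"]
    border = r"^\+-+\+$"
    print(tag_matches(text, Pattern2D(border, border, max_gap=3, tag="table")))
```

In this sketch the function only reports the matched line ranges with their tag; attaching the tag to the text itself, as the claim describes, would be a separate step applied before sentence extraction.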