Virtual tags and the process of virtual tagging utilizing user feedback in transformation rules
IPC classification information
Country / Type
United States (US) Patent
Registered
International Patent Classification (IPC, 7th edition)
G06F-015/00
G06F-017/00
Application number
US-0750505
(2000-12-28)
Inventor / Address
Imielinski,Tomasz
Sgro,Vince
Smith,Don
Applicant / Address
Rutgers, The State University of New Jersey
Agent / Address
Mathews, Shepherd, McKay &
Citation information
Times cited: 25
Patents cited: 13
Abstract
The present invention relates to a method and system for transforming an electronic document by learning transformation rules during training from the original electronic document using visual user feedback, and applying the learned transformation rules to the original electronic document, to a second electronic document having a structure similar to that of the original document, or to all future instances of the original electronic document. Accordingly, the transformed document is customized to the user's preferences learned during training. Preferably, the transformed document is created in a queriable form. For example, the original electronic document can be defined in any type of mark-up language or electronic document generation language, such as Hypertext mark-up language (HTML), extended mark-up language (XML), portable data file (PDF), Microsoft® Word, and the like, and the transformed document is defined in a queriable language such as XML views and the like. For example, a virtual page can be a customization of an instance of a Web page which can be used to transform all future instances of the original Web page. Alternatively, the virtual page is formed from a customization of an original electronic document, such as a chapter in a book, which is applied to a second electronic document having a similar structure, such as all chapters in the book.
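The idea of learning a rule once and reusing it on later instances of a page can be illustrated with a minimal Python sketch. It assumes the learned rule has already been reduced to a tag name plus a class attribute; that rule format, the sample pages, and the `apply_rule` helper are hypothetical and are not taken from the patent. The result is built as a small XML view so that it stays queriable.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule format (an assumption, not the patent's representation):
# keep every element with this tag name and this class attribute.
RULE = {"tag": "td", "class": "headline"}


def apply_rule(page_source: str, rule: dict) -> ET.Element:
    """Build a small XML view holding only the elements the rule selects."""
    root = ET.fromstring(page_source)
    view = ET.Element("virtual_page")
    for element in root.iter(rule["tag"]):
        if element.get("class") == rule["class"]:
            item = ET.SubElement(view, "item")
            item.text = "".join(element.itertext()).strip()
    return view


# Two invented instances of the "same" page on different days.
monday = """<html><body><table>
<tr><td class="ad">Buy now</td></tr>
<tr><td class="headline">Rates rise</td></tr>
</table></body></html>"""

tuesday = """<html><body><table>
<tr><td class="headline">Rates fall</td></tr>
<tr><td class="ad">Sale ends soon</td></tr>
<tr><td class="headline">Markets open higher</td></tr>
</table></body></html>"""

# The rule derived from one instance is reused unchanged on the next one.
for instance in (monday, tuesday):
    view = apply_rule(instance, RULE)
    print(ET.tostring(view, encoding="unicode"))
```

Because the view is an ordinary element tree, it can be queried with calls such as `view.findall("item")`. Real Web pages are rarely well-formed XML, so a practical version would need a tolerant HTML parser in place of `ET.fromstring`.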
Representative claim
What is claimed is: 1. A method for transforming and electronic document comprising the steps of: providing a visual representation of an original electronic document to a user; receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content wherein said one or more virtual tags and said one or more transformation rules are determined by the steps of: a. selecting one or more document elements for inclusion or exclusion in said virtual page from said visual representation of said original electronic document using a graphical user interface; b. identifying said selected document elements using features of a personal data content mining (PDCM) feature set and an intent of said user to include or exclude said document element in said virtual page; c. collecting said identified document elements into a set; and d. applying a classification algorithm to said set to classify said one or more document elements into a respective said one or more virtual tags and generate said one or more transformation rules. 2. The method of claim 1 wherein after said step d. of applying a classification algorithm further comprising the steps of: e. indicating said one or more virtual tags to said user at said visual representation; and f. approving said indicated one or more virtual tags; or g. disapproving said indicated one or more virtual tags, wherein when said indicated one or more virtual tags are disapproved repeating step a. through step e. 3. 
The method of claim 1 wherein said original electronic document is an original Web page said PDCM feature set comprises element description space features. 4. The method of claim 3 wherein said element description space features comprise one or more of the following features: bold, not bold, italic, not italic, underline, not underline, superscript, subscript, normal type, number of links encountered before which document element within a current nested structure, size of a font, foreground color, background color, font face, surrounding header level, immediately preceding header level, immediately preceding comment text, table body, header, footer, caption, not a caption, cascading style sheet class, beginning of the current nested structure, amount of preceding visual space, pattern of preceding visual breaks, number of preceding visual breaks, path through a nested structure of said original Web page, table row at a document structure depth, table column at a document structure depth, and item count at a document structure depth. 5. The method of claim 1 wherein said original electronic document is an original Web page said PDCM feature set comprises path feature space features. 6. The method of claim 5 wherein said path feature space features comprise one or more of the following features: a sequence, number of line breaks in a sequence, number of table cells in one row in a sequence, number of table cells in one column in a sequence, relativized feature space attributes, and number of preceding visual breaks of an item list number at a document structure depth. 7. The method of claim 1 further comprising the steps of: determining stability of each of said features of said PDCM feature set; and selecting said features of said PDCM feature set having a highest stability in said step d. of applying said classification algorithm. 8. 
A method for transforming and electronic document comprising the steps of: providing a visual representation of an original electronic document to a user; receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content wherein said original electronic document is an original Web page said one or more virtual tags and said one or more transformation rules are determined by: determining structural relationships of said original Web page to form a tree structure; selecting one or more structural objects from said visual presentation of said original Web page; selecting one or more document elements for inclusion or exclusion in said virtual Web page from said visual representation of said original Web page using a graphical user interface; identifying said selected document elements using features of personal data content mining a (PDCM) feature set and an intent of said user to include or exclude said document element in said virtual Web page; collecting said identified document elements into a set; applying a classification algorithm to said set to classify said one or more document elements into a respective said one or more virtual tags as one or more first virtual tags; determining one or more second virtual tags from said feedback and said one or more structural objects; associating said one or more second virtual tags to said tree structure; and applying learning to associate said one or more first virtual tags to said one or more second 
virtual tags and to generate said one or more transformation rules. 9. A method for transforming and electronic document comprising the steps of: providing a visual representation of an original electronic document to a user; receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content wherein said one or more virtual tags and said one or more transformation rules are determined by the steps of: a. selecting one or more document elements for inclusion or exclusion in said virtual page from said visual representation of said original electronic document using a graphical user interface; b. identifying said selected document elements using features of a personal data content mining (PDCM) feature set and an intent of said user to include or exclude said document element in said virtual page; c. collecting said identified document elements into a set; and d. applying a classification algorithm to said set to classify said one or more document elements into a respective said one or more virtual tags and generate said one or more transformation rules wherein said one or more transformation rules are applied to a more recent version of said original Web page. 10. 
A method for transforming and a dynamically changing electronic document comprising the steps of: providing a visual representation of an original one or more instances of a dynamically changing electronic document to a user; receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said one or more instances of said electronic document; constructing one or more transformation rules using said feedback, and said one or more virtual tags extraction rules defining transformation of said electronic document; and applying said one or more extraction transformation rules to said one or more instances of said electronic document, a second electronic document having a similar structure as said one or more instances of said document or future instances versions of said original electronic document to generate a virtual page of customized content extracted from said one or more instances of said electronic document, said second electronic document having a similar structure as said original document or said future versions of said electronic document; and providing a visual representation of said virtual page wherein said one or more virtual tags are generated by the steps of: categorizing all elements of said one or more instances of said electronic document as a plurality of OLAP cubes; determining assignment of said feedback to said OLAP cubes; and browsing said cubes to create said one or more virtual tags. 11. 
A system for transforming an electronic document comprising: means for providing a visual representation of an original electronic document to a user; means for receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; means for constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and means for applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content wherein said means for constructing one or more transformation rules comprises: means for selecting one or more document elements for inclusion or exclusion in said virtual page from said visual representation of said original electronic document using a graphical user interface; means for identifying said selected document elements using features of a personal data content mining (PDCM) feature set and an intent of said user to include or exclude said document element in said virtual page; means for collecting said identified document elements into a set; and means for applying a classification algorithm to said set to classify said one or more document elements into a respective said one or more virtual tags and generate said one or more transformation rules. 12. 
The system of claim 11 further comprising: means for indicating said one or more virtual tags to said user at said visual representation; and means for approving said indicated one or more virtual tags; or means for disapproving said indicated one or more virtual tags, wherein when said indicated one or more virtual tags are disapproved determining revised one or more virtual tags and applying said determined revised one or more virtual tags to said means for constructing one or more transformation rules. 13. The system of claim 11 wherein said electronic document is a Web page PDCM feature set comprises element description space features. 14. The system of claim 13 wherein said element description space features comprise one or more of the following features: bold, not bold, italic, not italic, underline, not underline, superscript, subscript, normal type, number of links encountered before which document element within a current nested structure, size of a font, foreground color, background color, font face, surrounding header level, immediately preceding header level, immediately preceding comment text, table body, header, footer, caption, not a caption, cascading style sheet class, beginning of the current nested structure, amount of preceding visual space, pattern of preceding visual breaks, number of preceding visual breaks, path through a nested structure of said original Web page, table row at a document structure depth, table column at a document structure depth, and item count at a document structure depth. 15. The system of claim 11 wherein said original electronic document is an original Web page said PDCM feature set comprises path feature space features. 16. 
The system of claim 15 wherein said path feature space features comprise one or more of the following features: a sequence, number of line breaks in a sequence, number of table cells in one row in a sequence, number of table cells in one column in a sequence, relativized feature space attributes, and number of preceding visual breaks of an item list number at a document structure depth. 17. The system of claim 11 further comprising: means for determining stability of each of said features of said PDCM feature set; and means for selecting said features of said PDCM feature set having a highest stability in said means for applying said classification algorithm. 18. A system for transforming an electronic document comprising: means for providing a visual representation of an original electronic document to a user; means for receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; means for constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and means for applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content wherein said original electronic document is an original Web page said one or more virtual tags and said one or more transformation rules are determined by: means for determining structural relationships of said original Web page to form a tree structure; means for selecting one or more structural objects from said visual presentation of said original Web page; means for selecting one or more document elements for inclusion or exclusion in said virtual Web page from said visual representation of said original Web page using a graphical user 
interface; means for identifying said selected document elements using features of personal data content mining a (PDCM) feature set and an intent of said user to include or exclude said document element in said virtual Web page; means for collecting said identified document elements into a set; means for applying a classification algorithm to said set to classify said one or more document elements into a respective said one or more virtual tags as one or more first virtual tags; means for determining one or more second virtual tags from said feedback and said one or more structural objects; means for associating said one or more second virtual tags to said tree structure; and means for applying learning to associate said one or more first virtual tags to said one or more second virtual tags and to generate said one or more transformation rules. 19. A system for transforming an electronic document comprising: means for providing a visual representation of an original electronic document to a user; means for receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; means for constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and means for applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content wherein said means for constructing one or more transformation, a first said one or more virtual tags is a portion of said original electronic document to be cut and a second one of said one or more virtual tags is a portion of said electronic document to be pasted and said one or more transformation rules being constructed from said first virtual tag and said 
second virtual tag for determining a cut and paste operation, wherein said one or more transformation rules are applied to a more recent version of said original Web page. 20. A system for transforming an a dynamically changing electronic document comprising: means for providing a visual representation of an original one or more instances of a dynamically changing electronic document to a user; means for receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said one or more instances of said electronic document; means for constructing one or more transformation rules using said feedback, and said one or more transformation rules defining transformation of said electronic document virtual tags; and means for applying said one or more extraction transformation rules to said one or more instances of said electronic document, a second electronic document having a similar structure as said one or more instances of said electronic document or future instances versions of said original electronic document to generate a virtual page of customized content extracted from said one or more instances of said electronic document, said second electronic document having a similar structure as said original document or said future versions of said one or more instances of said electronic document wherein said one or more virtual tags are generated by: means for categorizing all elements of said one or more instances of said electronic document as a plurality of OLAP cubes; means for determining assignment of said feedback to said OLAP cubes; and means for browsing said OLAP cubes to create said one or more virtual tags. 21. 
A computer program product for transforming an electronic document comprising: means for providing a visual representation of an original electronic document to a user; means for receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; means for constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and means for applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content wherein said means for constructing one or more transformation, wherein said means for constructing one or more transformation rules comprises: means for selecting one or more document elements for inclusion or exclusion in said virtual page from said visual representation of said original electronic document using a graphical user interface; means for identifying said selected document elements using features of a personal data content mining (PDCM) feature set and an intent of said user to include or exclude said document element in said virtual page; means for collecting said identified document elements into a set; and means for applying a classification algorithm to said set to classify said one or more document elements into a respective said one or more virtual tags and generate said one or more transformation rules. 22. 
The computer program product of claim 21 further comprising: means for indicating said one or more virtual tags to said user at said visual representation; and means for approving said indicated one or more virtual tags; or means for disapproving said indicated one or more virtual tags, wherein when said indicated one or more virtual tags are disapproved determining revised one or more virtual tags and applying said means for indicated said one or more virtual tags to said user at said visual representation and means for approving said indicated virtual tags. 23. The computer program product of claim 21 wherein said original electronic document is an original Web page said PDCM feature set comprises element description space features. 24. The computer program product of claim 23 wherein said element description space features comprise one or more of the following features: bold, not bold, italic, not italic, underline, not underline, superscript, subscript, normal type, number of links encountered before which document element within a current nested structure, size of a font, foreground color, background color, font face, surrounding header level, immediately preceding header level, immediately preceding comment text, table body, header, footer, caption, not a caption, cascading style sheet class, beginning of the current nested structure, amount of preceding visual space, pattern of preceding visual breaks, number of preceding visual breaks, path through a nested structure of said original Web page, table row at a document structure depth, table column at a document structure depth, and item count at a document structure depth. 25. The computer program product of claim 21 wherein said original electronic document is an original Web page said PDCM feature set comprises path feature space features. 26. 
The computer program product of claim 25 wherein said path feature space features comprise one or more of the following features: a sequence, number of line breaks in a sequence, number of table cells in one row in a sequence, number of table cells in one column in a sequence, relativized feature space attributes, and number of preceding visual breaks of an item list number at a document structure depth. 27. The computer program product of claim 21 further comprising: means for determining stability of each of said features of said PDCM feature set; and means for selecting said features of said PDCM feature set having a highest stability in said means for applying said classification algorithm. 28. A computer program product for transforming an electronic document comprising: means for providing a visual representation of an original electronic document to a user; means for receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; means for constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and means for applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content wherein said means for constructing one or more transformation, wherein said original electronic document is an original Web page said one or more virtual tags and said one or more transformation rules are determined by: means for determining structural relationships of said original Web page to form a tree structure; means for selecting one or more structural objects from said visual presentation of said original Web page; means for selecting one or more document elements for inclusion or 
exclusion in said virtual Web page from said visual representation of said original Web page using a graphical user interface; means for identifying said selected document elements using features of personal data content mining a (PDCM) feature set and an intent of said user to include or exclude said document element in said virtual Web page; means for collecting said identified document elements into a set; means for applying a classification algorithm to said set to classify said one or more document elements into a respective said one or more virtual tags as one or more first virtual tags; means for determining one or more second virtual tags from said feedback and said one or more structural objects; means for associating said one or more second virtual tags to said tree structure; and means for applying learning to associate said one or more first virtual tags to said one or more second virtual tags and to generate said one or more transformation rules. 29. A computer program product for transforming an electronic document comprising: means for providing a visual representation of an original electronic document to a user; means for receiving feedback from interaction by said user with said visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said original electronic document; means for constructing one or more transformation rules using said feedback, said one or more transformation rules defining transformation of said electronic document; and means for applying said one or more transformation rules to said electronic document, a second electronic document or future instances of said original document to generate a virtual page of customized content; a first said one or more virtual tags is a portion of said original electronic document to be cut and a second one of said one or more virtual tags is a portion of said electronic document to be pasted and said one or more 
transformation rules being constructed from said first virtual tag and said second virtual tag for determining a cut and paste operation; wherein said one or more transformation rules are applied to a more recent version of said original Web page. 30. A computer program product for transforming a dynamically changing electronic document comprising: means for providing a visual representation of an one or more instances of a dynamically changing electronic document to a user; means for receiving feedback from interaction by the user with the visual representation, said feedback is used to generate one or more virtual tags, said virtual tags identifying features of a portion of said one or more instances of said electronic document; means for constructing one or more transformation rules using said feedback and said one or more virtual tags; and means for applying said one or more transformation rules to said one or more instances of said electronic document, a second electronic document having a similar structure as said one or more instances of said electronic document or future versions of said electronic document to generate a virtual page of customized content extracted from said one or more instances of said electronic document, said second electronic document having a similar structure as said one or more instances of said electronic document or said future versions of said electronic document; means for storing said one or more virtual tags with said one or more transformation rules as a respective one or more virtual tag objects in a virtual repository; and means for retrieving said one or more stored virtual tag objects from said virtual repository when subsequently accessing said electronic document, said stored one or more transformation rules being used to generate said virtual page wherein said one or more virtual tags are generated by: means for categorizing all elements of said one or more instances of said electronic document as a plurality of OLAP 
cubes; means for determining assignment of said feedback to said OLAP cubes; and means for browsing said OLAP cubes to create said one or more virtual tags.
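Steps a through d of claim 1 (select elements, describe them with features and the user's include/exclude intent, collect them into a set, classify) can be sketched in Python as follows. The claims do not name a particular classification algorithm, so this sketch substitutes a simple one-feature rule learner as an assumption. The feature names loosely follow the claimed element description space (bold, surrounding header level, cascading style sheet class); the sample values and the `learn_rule` and `classify` helpers are invented for illustration.

```python
from collections import Counter

# Steps a-c: each entry pairs the features of a user-selected element with
# the user's intent; the list as a whole is the collected set.
selections = [
    ({"bold": True,  "header_level": 2, "css_class": "story"}, "include"),
    ({"bold": True,  "header_level": 2, "css_class": "story"}, "include"),
    ({"bold": False, "header_level": 0, "css_class": "ad"},    "exclude"),
    ({"bold": True,  "header_level": 0, "css_class": "ad"},    "exclude"),
]


def learn_rule(examples):
    """Step d (assumed algorithm): keep the single feature whose values
    predict the user's intent for the most training examples."""
    best = None
    for feature in examples[0][0]:
        # Count how often each feature value was included or excluded.
        by_value = {}
        for features, intent in examples:
            by_value.setdefault(features[feature], Counter())[intent] += 1
        # Map each value to its majority intent.
        mapping = {
            value: counts.most_common(1)[0][0]
            for value, counts in by_value.items()
        }
        correct = sum(mapping[f[feature]] == intent for f, intent in examples)
        if best is None or correct > best[0]:
            best = (correct, feature, mapping)
    _, feature, mapping = best
    return feature, mapping


def classify(rule, features, default="exclude"):
    """Apply the learned rule to a new element; unseen values fall back to default."""
    feature, mapping = rule
    return mapping.get(features[feature], default)


rule = learn_rule(selections)
new_element = {"bold": False, "header_level": 2, "css_class": "story"}
print(rule[0], classify(rule, new_element))
```

In this toy data `bold` misclassifies one example while `header_level` and `css_class` both separate the set perfectly; the learner keeps the first feature that reaches the best score. A fuller version would weigh ties differently, for instance by the feature stability that claim 7 mentions.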
Patents cited by this patent (13)
Bernardo Richard S. ; Logan Christopher ; Karra Elena, Automated system and method for approving web site content.
Burkes Daniel F. ; Jones William H. ; Kish John W. ; Moody Paul B. ; Royal Eliza H., Method and apparatus for creating and organizing a document from a plurality of local or external documents represented as objects in a hierarchical tree.
Mighdoll Lee S. ; Leak Bruce A. ; Perlman Stephen G. ; Goldman Phillip Y., Method of transcoding documents in a network environment using a proxy server.
Ali, Zohaib Haider; Meyers, Jr., David Lloyd; Yan, Jun; Thomas, Craig Edward; Manda, Srinivasa Reddy; Manning Dawson, Sara Louise; Clement, Kevin C.; Carpineti, Samuele; Goel, Ankit, Customizing program features on a per-user basis.
Chu, Danae Candace; Gandhi, Shruti; Garbow, Zachary Adam; Liang, Clara Chia-Yen; Trifilo, Timothy M., Framework for persistent user interactions within web-pages.
Boyan, Justin; McDonald, Glenn; Benthall, Margaret; Molnar, Ray, Methods and systems to train models to extract and integrate information from data sources.