[Patent] Method and system for measuring the quality of a hierarchy

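IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC 7th edition)	G06F-017/00 G06N-005/02 G06N-005/00
Application number	US-0066096 (2002-01-31)
Inventors / Address	Forman, George H.; Fawcett, Tom E.; Suermondt, Henri J.
Applicant / Address	Hewlett Packard Development Company, L.P.
Citation information	Times cited: 9; Patents cited: 3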

Abstract

Method and system for measuring the degree of coherence of nodes in a hierarchy. A hierarchy that includes a plurality of nodes is received. A plurality of training cases are also received. Based on these inputs, a measure of coherence is determined for at least one node D in the hierarchy. The determination of the measure of coherence includes evaluation of the training cases with at least one feature under a local environment of node D and evaluation of the training cases with at least one feature under a subtree of the node D.
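
The two regions the abstract compares, the subtree of node D and its local environment, can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Node` class, its field names, and both helper functions are hypothetical, and each training case is reduced to a set of feature strings. Following the claims, the local environment is taken to be the parent, the siblings, and the siblings' subtrees; counting the cases filed directly at D as part of D's subtree is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical topic node; `cases` holds the training cases filed directly under it."""
    name: str
    cases: List[Set[str]] = field(default_factory=list)  # one set of features per case
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add_child(self, child: "Node") -> "Node":
        """Attach `child` under this node and return it."""
        child.parent = self
        self.children.append(child)
        return child


def subtree_cases(node: Node) -> List[Set[str]]:
    """Cases filed at the node (an assumption) and at all of its descendants."""
    cases = list(node.cases)
    for child in node.children:
        cases.extend(subtree_cases(child))
    return cases


def local_environment_cases(node: Node) -> List[Set[str]]:
    """Cases filed at the parent, at the siblings, and in the siblings' subtrees."""
    if node.parent is None:
        return []  # the root has no parent or siblings, hence no local environment
    cases = list(node.parent.cases)
    for sibling in node.parent.children:
        if sibling is not node:
            cases.extend(subtree_cases(sibling))
    return cases
```

With these two case lists in hand, a coherence measure for D amounts to asking how well some feature separates the first list from the second.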

Representative claims

The invention claimed is:
1. A computerized method for measuring a degree of coherence of the arrangement of nodes in a hierarchy comprising the steps of: a) receiving a predetermined hierarchy of nodes arranged in a tree format with one or more subtrees, wherein a subtree of one of the nodes includes any descendant nodes in the hierarchy that stem from the one node, the hierarchy of nodes includes at least a first node and a second node, and the nodes in the hierarchy are associated with one another as one of a sibling node, a child node, and a parent node, wherein the second node is a child node of the first node when the second node stems from the first node without any intervening node therebetween and the second node belongs in the subtree of the first node; wherein the first node is a parent node of the second node when the second node is the child node of the first node and belongs in the subtree of the first node; and wherein the first node is a sibling node with the second node when the first and second nodes stem from a same parent node without any intervening node therebetween, and the first node and the second node belong to the subtree of the same parent node; b) receiving a plurality of training cases that are filed under the nodes in the hierarchy; and c) responsive thereto for determining a coherence measure, for at least one of the nodes in the hierarchy that has a local environment, by evaluating the training cases filed in the subtree of the at least one node with respect to the training cases filed in the local environment of the at least one node; wherein the local environment of the at least one node includes any parent node of the at least one node, any nodes that are sibling nodes of the at least one node, and any nodes that belong to the subtrees under the sibling nodes of the at least one node.
2. The method of claim 1 wherein the step of determining a coherence measure includes the steps of determining, for the subtree of the at least one node, the number of the training cases filed in the subtree and the average prevalence of each feature in the training cases filed in the subtree; determining, for the local environment of the at least one node, the number of the training cases filed in the local environment and the average prevalence of each feature in the training cases filed in the local environment; determining predictive features that distinguish the subtree of the at least one node from the local environment of the at least one node; and generating a coherence value for the at least one node based on the average prevalence of at least one predictive feature.
3. The method of claim 2 further comprising the steps of determining, for each of the predictive features, a degree of uniformity of the prevalence of the each predictive feature among the children subtrees of the at least one node; and wherein the step of generating a coherence value for the at least one node is based on the degree of uniformity and the average prevalence of the at least one predictive feature.
4. The method of claim 1 wherein the hierarchy of nodes includes a topic hierarchy; wherein the nodes are topics; and wherein the training cases include one of labeled documents and feature vectors assigned to the topics.
5. The method of claim 2 wherein the predictive features include at least one of words, multi-word phrases, noun phrases, document length, file extension type, and other parameters related to documents.
6. The method of claim 5, wherein the step of determining the predictive features includes the step of computing at least one of information-gain metrics, mutual-information metrics, ChiSquared, Fisher's Exact Test, lift, odds-ratio, word frequency among documents, and word frequency among all words in all of the documents.
7. The method of claim 3 wherein the step of selecting features that are uniformly common includes the step of computing one of the metrics cosine-similarity, projection, and ChiSquared between the average feature prevalence vector and the vector of training case counts across subtopics of the at least one node, wherein the nodes in the hierarchy are topics and any child node of one of the nodes is also a subtopic of its parent node.
8. The method of claim 3 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by computing the average prevalence of the at least one predictive feature with the greatest degree of uniformity.
9. The method of claim 3 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by computing a weighted-average of the average prevalence of at least two of the predictive features that are selected as both predictive and uniform.
10. The method of claim 9 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by computing a weighted-average of the average prevalence of the top k most prevalent of the predictive features that are selected as both predictive and uniform, wherein k is a predetermined positive integer.
11. The method of claim 10 wherein the weighted-average employs as the weighting schedule one of the negative exponential function exp(-I) and the inverse rank function (1/I), where I is the ordered rank of the top k most prevalent of the predictive features that are selected as both predictive and uniform.
12. The method of claim 3 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by computing an average value of the average prevalence of the top k most prevalent of the predictive features that are selected as both predictive and uniform, wherein k is a predetermined positive integer.
13. The method of claim 2 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by employing a maximum, over all of the predictive features, of a projection between the average feature prevalence vector and the vector of training case counts across subtopics of the at least one node, wherein the nodes in the hierarchy are topics and any child node of one of the nodes is also a subtopic of its parent node.
14. The method of claim 2 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by employing a maximum average prevalence of the predictive features.
15. The method of claim 1 further comprising the step of: assigning an aggregate-coherence value to a node in the hierarchy, based on an aggregation function of said determined coherence value over the node and of descendants of the node.
16. The method of claim 15 wherein the aggregation function includes one of a sum, average, weighted-average, minimum function, and maximum function.
17. The method of claim 2 further comprising the step of: using the coherence values of one or more nodes in the hierarchy to modify the structure of the hierarchy to improve the coherence of the hierarchy.
18. The method of claim 2 further comprising the step of: using the coherence values of one or more nodes to guide the selection of training cases for an automated classifier.
19. The method of claim 2 further comprising the step of: using the coherence values of one or more nodes to select a suitable classification technology to be employed to automatically classify items in the hierarchy.
20. A computerized apparatus for measuring a degree of coherence of at least one considered node in a hierarchy of nodes that has associated therewith a subtree and a local environment in the hierarchy comprising: a) a training case counter for determining the number of training cases under the subtree and the number of training cases for the local environment, the subtree includes any nodes in the hierarchy that stem from the at least one considered node, and the local environment includes any parent node from which the at least one node is stemmed directly and is thus a child node of the parent node, any sibling nodes that are stemmed directly from the parent node of the at least one node, and any nodes that stem from the sibling nodes of the at least one node; b) a predictive feature determination unit for determining a set of predictive features that distinguish training cases of the subtree from documents of the local environment; c) an average prevalence determination unit for determining for at least one feature the average prevalence under the subtree and the average prevalence for the local environment; and d) a coherence assignment unit for generating a coherence metric number for the at least one considered node based on at least one predictive feature.
21. The apparatus of claim 20 further comprising: a subtopic uniformity determination unit for determining the uniformity of the distribution of the predictive features among children subtopics of the at least one considered node, wherein the nodes in the hierarchy are topics and any child node of one of the nodes is also a subtopic of its parent node; wherein the coherence assignment unit generates a coherence metric number based on at least one predictive feature that is determined to be uniformly distributed among the children subtopics.
22. A computerized system for measuring a degree of coherence of nodes in a topic hierarchy comprising: a) a coherence analyzer unit for receiving the topic hierarchy and a set of labeled training cases filed under each of the nodes in the topic hierarchy and responsive thereto for determining, for at least one current node under consideration from the nodes in the topic hierarchy, a coherence measure in the topic hierarchy of the at least one current node under consideration by evaluating the training cases and at least one feature under a local environment of the at least one current node and by evaluating the training cases and at least one feature under a subtree of the at least one current node under consideration; wherein the subtree of the at least one current node under consideration includes any of the nodes in the topic hierarchy that stem from the at least one current node under consideration; and wherein the local environment of the at least one current node under consideration includes any of the nodes in the topic hierarchy that stem from a parent node from which the at least one current node under consideration is stemmed directly.
23. The system of claim 22 further comprising: b) a user interface presentation unit coupled to the coherence analyzer unit for displaying the coherence measure for one or more current nodes under consideration.
24. The system of claim 22 further comprising: b) a feature extractor coupled to the coherence analyzer for receiving a set of labeled documents and at least one feature guideline and responsive thereto for generating the set of labeled feature vectors.
25. The system of claim 22 wherein the coherence analyzer unit further comprises: a-1) a training case counter for determining the number of training cases under a subtree of each of the nodes; a-2) an average prevalence determination unit for determining the average prevalence for at least one feature under each of the node subtrees; a-3) a predictive feature determination unit for determining predictive features under each of the node subtrees; and a-4) a coherence assignment unit for generating a coherence metric number based on at least one of the predictive features.
26. The system of claim 25 wherein the coherence analyzer unit further comprises: a-5) a subtopic uniformity determination unit for determining the degree of uniformity in the distribution of one or more of the predictive features among the children of the at least one current node; wherein the coherence assignment unit generates a coherence metric number based on at least one of the predictive features that is deemed uniform based on the determined degree of uniformity of the at least one uniformity.
27. A computerized method for measuring a degree of coherence for one or more nodes in a hierarchy of nodes comprising the steps of: a) receiving the hierarchy and the training cases filed into the hierarchy; b) determining a list of predictive features that distinguish documents of a subtree of a first one of the nodes in the hierarchy from documents in the first node's local environment, wherein the first node's subtree includes any nodes in the hierarchy that stem from the first node, and the first node's local environment includes any parent node from which the first node is stemmed directly, any sibling nodes that are stemmed directly from the first node's parent node, and any nodes that stem from the sibling nodes of the first node; c) assigning a coherence value to the first node based on the list of predictive features and based on one or more of their degree of predictiveness, their degree of prevalence, and their degree of uniformity, wherein the degree of uniformity reflects how evenly distributed said predictive features are among the subtrees of the children nodes in the hierarchy that are directly stemmed from the first node based on the training cases under each of the subtrees of the children nodes.
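
The steps recited in claims 2, 3, 7, 10 and 11 can be illustrated with a short Python sketch. It is one possible reading under stated assumptions, not the patentees' implementation: all function names, the thresholds `min_lift` and `min_uniformity`, and the default `k` are hypothetical; features are treated as binary (present or absent in a case); lift is used as the predictiveness test because claim 6 lists it as one option; and the uniformity check compares, by cosine similarity, the per-child count of cases containing a feature with the per-child case counts.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Set

Case = Set[str]  # one training case, reduced to the set of features it contains


def prevalence(cases: List[Case]) -> Dict[str, float]:
    """Fraction of cases containing each feature (average prevalence, binary features)."""
    counts = Counter(f for case in cases for f in case)
    return {f: c / len(cases) for f, c in counts.items()}


def predictive_features(sub: List[Case], env: List[Case],
                        min_lift: float = 1.5) -> List[str]:
    """Features over-represented in the subtree relative to subtree plus environment."""
    p_sub, p_all = prevalence(sub), prevalence(sub + env)
    return [f for f, p in p_sub.items() if p / p_all[f] >= min_lift]


def uniformity(feature: str, child_cases: List[List[Case]]) -> float:
    """Cosine similarity between per-child feature counts and per-child case counts."""
    hits = [sum(feature in c for c in cases) for cases in child_cases]
    sizes = [len(cases) for cases in child_cases]
    norm = math.sqrt(sum(h * h for h in hits)) * math.sqrt(sum(s * s for s in sizes))
    return sum(h * s for h, s in zip(hits, sizes)) / norm if norm else 0.0


def coherence(sub: List[Case], env: List[Case], child_cases: List[List[Case]],
              k: int = 3, min_uniformity: float = 0.9,
              weight: Callable[[int], float] = lambda i: 1.0 / i) -> float:
    """Weighted average prevalence of the top-k predictive, uniform features.

    `weight` maps rank I (1 = most prevalent) to a weight; the default is the
    inverse rank 1/I, and `lambda i: math.exp(-i)` gives the exp(-I) schedule.
    """
    p_sub = prevalence(sub)
    chosen = [f for f in predictive_features(sub, env)
              if uniformity(f, child_cases) >= min_uniformity]
    top = sorted((p_sub[f] for f in chosen), reverse=True)[:k]
    if not top:
        return 0.0  # no feature is both predictive and uniform
    weights = [weight(i) for i in range(1, len(top) + 1)]
    return sum(w * p for w, p in zip(weights, top)) / sum(weights)
```

In this sketch `sub` is the list of cases in the node's subtree, `env` the list in its local environment, and `child_cases` holds one case list per child subtree. A node scores high when some feature is common throughout its subtree, rare in its surroundings, and spread evenly over its children.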

Patents cited by this patent (3)

Gelvin, David C.; Girod, Lewis D.; Kaiser, William J.; Merrill, William M.; Newberg, Fredric; Pottie, Gregory J.; Sipos, Anton I.; Vardhan, Sandeep, Method and apparatus for internetworked wireless integrated network sensor (WINS) nodes.
Lazarus, Michael A.; Caid, William R.; Pugh, Richard S.; Kindig, Bradley D.; Russell, Gerald S.; Brown, Kenneth B.; Dunning, Ted E.; Carleton, Joel L., System and method for optimal adaptive matching of users to most relevant entity and information in real-time.
Jochen Doerre (DE); Peter Gerstl (DE); Sebastian Goeser (DE); Adrian Mueller (DE); Roland Seiffert (DE), Taxonomy generation for document collections.

Patents citing this patent (9)

Liu, Tie-Yan; Ma, Wei-Ying, Augmenting a training set for document categorization.
Liu, Tie Yan; Ma, Wei Ying, Augmenting a training set for document categorization.
Liesche, Stefan; Nauerz, Andreas; Schaeck, Jurgen, Dynamic context-sensitive integration of content into a web portal application.
Kirshenbaum, Evan R.; Forman, George H., Feature selection based on partial ordered set of classifiers.
Liu, Tie-Yan; Ma, Wei-Ying; Qin, Tao, Hierarchy-based propagation of contribution of documents.
Ji, Rongrong; Xie, Xing, Incremental feature indexing for scalable location recognition.
Corston, Simon H.; Chandrasekar, Raman; Chen, Harr, Machine-learned approach to determining document relevance for search over large electronic collections of documents.
Scholz, Martin B.; Rajaram, Shyam Sundar; Lukose, Rajan, Method and system for characterizing web content.
Skipper, Julie A.; Ganapathy, Priya, Methods and logic for autonomous generation of ensemble classifiers, and systems incorporating ensemble classifiers.
