Method and system for measuring the quality of a hierarchy
IPC classification information
Country / Type
United States (US) Patent
Granted
International Patent Classification (IPC 7th edition)
G06F-017/00
G06N-005/02
G06N-005/00
Application number
US-0066096
(2002-01-31)
Inventor / Address
Forman, George H.
Fawcett, Tom E.
Suermondt, Henri J.
Applicant / Address
Hewlett Packard Development Company, L.P.
Citation information
Times cited: 9
Patents cited: 3
Abstract
Method and system for measuring the degree of coherence of nodes in a hierarchy. A hierarchy that includes a plurality of nodes is received. A plurality of training cases are also received. Based on these inputs, a measure of coherence is determined for at least one node D in the hierarchy. The determination of the measure of coherence includes evaluation of the training cases with at least one feature under a local environment of node D and evaluation of the training cases with at least one feature under a subtree of the node D.
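The two regions the abstract compares, the subtree of node D and its local environment, can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Node` class and the helper names `subtree`, `local_environment`, and `cases_under` are hypothetical, the local environment follows the definition given in claim 1 (parent, siblings, and the siblings' subtrees), and the sketch assumes that a node's subtree includes the node itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality; avoids recursing through parent/children
class Node:
    name: str
    cases: List[str] = field(default_factory=list)  # training cases filed at this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def subtree(node: Node) -> List[Node]:
    """The node plus all of its descendants (including the node is an assumption)."""
    nodes = [node]
    for child in node.children:
        nodes.extend(subtree(child))
    return nodes


def local_environment(node: Node) -> List[Node]:
    """Parent, siblings, and every node in the siblings' subtrees."""
    if node.parent is None:
        return []  # the root has no local environment
    env = [node.parent]
    for sibling in node.parent.children:
        if sibling is not node:
            env.extend(subtree(sibling))
    return env


def cases_under(nodes: List[Node]) -> List[str]:
    """All training cases filed at any of the given nodes."""
    return [case for n in nodes for case in n.cases]


# Hypothetical toy hierarchy: root -> A (a1, a2), B (b1)
root = Node("root", cases=["r-doc"])
a = root.add_child(Node("A", cases=["a-doc"]))
a.add_child(Node("a1", cases=["a1-doc"]))
a.add_child(Node("a2"))
b = root.add_child(Node("B", cases=["b-doc"]))
b.add_child(Node("b1", cases=["b1-doc"]))

subtree_cases = cases_under(subtree(a))
environment_cases = cases_under(local_environment(a))
```

For node A in this toy hierarchy, the subtree cases come from A and a1, while the local environment cases come from root, B, and b1. A coherence measure would then compare feature statistics between those two sets of cases.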
Representative claims
The invention claimed is:
1. A computerized method for measuring a degree of coherence of the arrangement of nodes in a hierarchy comprising the steps of: a) receiving a predetermined hierarchy of nodes arranged in a tree format with one or more subtrees, wherein a subtree of one of the nodes includes any descendant nodes in the hierarchy that stem from the one node, the hierarchy of nodes includes at least a first node and a second node, and the nodes in the hierarchy are associated with one another as one of a sibling node, a child node, and a parent node, wherein the second node is a child node of the first node when the second node stems from the first node without any intervening node therebetween and the second node belongs in the subtree of the first node; wherein the first node is a parent node of the second node when the second node is the child node of the first node and belongs in the subtree of the first node; and wherein the first node is a sibling node with the second node when the first and second nodes stem from a same parent node without any intervening node therebetween, and the first node and the second node belong to the subtree of the same parent node; b) receiving a plurality of training cases that are filed under the nodes in the hierarchy; and c) responsive thereto for determining a coherence measure, for at least one of the nodes in the hierarchy that has a local environment, by evaluating the training cases filed in the subtree of the at least one node with respect to the training cases filed in the local environment of the at least one node; wherein the local environment of the at least one node includes any parent node of the at least one node, any nodes that are sibling nodes of the at least one node, and any nodes that belong to the subtrees under the sibling nodes of the at least one node.
2. The method of claim 1 wherein the step of determining a coherence measure includes the steps of determining, for the subtree of the at least one node, the number of the training cases filed in the subtree and the average prevalence of each feature in the training cases filed in the subtree; determining, for the local environment of the at least one node, the number of the training cases filed in the local environment and the average prevalence of each feature in the training cases filed in the local environment; determining predictive features that distinguish the subtree of the at least one node from the local environment of the at least one node; and generating a coherence value for the at least one node based on the average prevalence of at least one of the predictive features.
3. The method of claim 2 further comprising the steps of determining, for each of the predictive features, a degree of uniformity of the prevalence of the each predictive feature among the children subtrees of the at least one node; and wherein the step of generating a coherence value for the at least one node is based on the degree of uniformity and the average prevalence of the at least one predictive feature.
4. The method of claim 1 wherein the hierarchy of nodes includes a topic hierarchy; wherein the nodes are topics; and wherein the training cases include one of labeled documents and feature vectors assigned to the topics.
5. The method of claim 2 wherein the predictive features include at least one of words, multi-word phrases, noun phrases, document length, file extension type, and other parameters related to documents.
6. The method of claim 5, wherein the step of determining the predictive features includes the step of computing at least one of information-gain metrics, mutual-information metrics, Chi-Squared, Fisher's Exact Test, lift, odds-ratio, word frequency among documents, and word frequency among all words in all of the documents.
7. The method of claim 3 wherein the step of selecting features that are uniformly common includes the step of computing one of the metrics cosine-similarity, projection, and Chi-Squared between the average feature prevalence vector and the vector of training case counts across subtopics of the at least one node, wherein the nodes in the hierarchy are topics and any child node of one of the nodes is also a subtopic of its parent node.
8. The method of claim 3 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by computing the average prevalence of the at least one predictive feature with the greatest degree of uniformity.
9. The method of claim 3 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by computing a weighted-average of the average prevalence of at least two of the predictive features that are selected as both predictive and uniform.
10. The method of claim 9 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by computing a weighted-average of the average prevalence of the top k most prevalent of the predictive features that are selected as both predictive and uniform, wherein k is a predetermined positive integer.
11. The method of claim 10 wherein the weighted-average employs as the weighting schedule one of the negative exponential function exp(-I) and the inverse rank function (1/I), where I is the ordered rank of the top k most prevalent of the predictive features that are selected as both predictive and uniform.
12. The method of claim 3 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by computing an average value of the average prevalence of the top k most prevalent of the predictive features that are selected as both predictive and uniform, wherein k is a predetermined positive integer.
13. The method of claim 2 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by employing a maximum, over all of the predictive features, of a projection between the average feature prevalence vector and the vector of training case counts across subtopics of the at least one node, wherein the nodes in the hierarchy are topics and any child node of one of the nodes is also a subtopic of its parent node.
14. The method of claim 2 wherein the step of generating the coherence value includes the step of generating a hierarchical coherence number by employing a maximum average prevalence of the predictive features.
15. The method of claim 1 further comprising the step of: assigning an aggregate-coherence value to a node in the hierarchy, based on an aggregation function of said determined coherence value over the node and of descendants of the node.
16. The method of claim 15 wherein the aggregation function includes one of a sum, average, weighted-average, minimum function, and maximum function.
17. The method of claim 2 further comprising the step of: using the coherence values of one or more nodes in the hierarchy to modify the structure of the hierarchy to improve the coherence of the hierarchy.
18. The method of claim 2 further comprising the step of: using the coherence values of one or more nodes to guide the selection of training cases for an automated classifier.
19. The method of claim 2 further comprising the step of: using the coherence values of one or more nodes to select a suitable classification technology to be employed to automatically classify items in the hierarchy.
20. A computerized apparatus for measuring a degree of coherence of at least one considered node in a hierarchy of nodes that has associated therewith a subtree and a local environment in the hierarchy comprising: a) a training case counter for determining the number of training cases under the subtree and the number of training cases for the local environment, the subtree includes any nodes in the hierarchy that stem from the at least one considered node, and the local environment includes any parent node from which the at least one node is stemmed directly and is thus a child node of the parent node, any sibling nodes that are stemmed directly from the parent node of the at least one node, and any nodes that stem from the sibling nodes of the at least one node; b) a predictive feature determination unit for determining a set of predictive features that distinguish training cases of the subtree from documents of the local environment; c) an average prevalence determination unit for determining for at least one feature the average prevalence under the subtree and the average prevalence for the local environment; and d) a coherence assignment unit for generating a coherence metric number for the at least one considered node based on at least one predictive feature.
21. The apparatus of claim 20 further comprising: a subtopic uniformity determination unit for determining the uniformity of the distribution of the predictive features among children subtopics of the at least one considered node, wherein the nodes in the hierarchy are topics and any child node of one of the nodes is also a subtopic of its parent node; wherein the coherence assignment unit generates a coherence metric number based on at least one predictive feature that is determined to be uniformly distributed among the children subtopics.
22. A computerized system for measuring a degree of coherence of nodes in a topic hierarchy comprising: a) a coherence analyzer unit for receiving the topic hierarchy and a set of labeled training cases filed under each of the nodes in the topic hierarchy and responsive thereto for determining, for at least one current node under consideration from the nodes in the topic hierarchy, a coherence measure in the topic hierarchy of the at least one current node under consideration by evaluating the training cases and at least one feature under a local environment of the at least one current node and by evaluating the training cases and at least one feature under a subtree of the at least one current node under consideration; wherein the subtree of the at least one current node under consideration includes any of the nodes in the topic hierarchy that stem from the at least one current node under consideration; and wherein the local environment of the at least one current node under consideration includes any of the nodes in the topic hierarchy that stem from a parent node from which the at least one current node under consideration is stemmed directly.
23. The system of claim 22 further comprising: b) a user interface presentation unit coupled to the coherence analyzer unit for displaying the coherence measure for one or more current nodes under consideration.
24. The system of claim 22 further comprising: b) a feature extractor coupled to the coherence analyzer for receiving a set of labeled documents and at least one feature guideline and responsive thereto for generating the set of labeled feature vectors.
25. The system of claim 22 wherein the coherence analyzer unit further comprises: a-1) a training case counter for determining the number of training cases under a subtree of each of the nodes; a-2) an average prevalence determination unit for determining the average prevalence for at least one feature under each of the node subtrees; a-3) a predictive feature determination unit for determining predictive features under each of the node subtrees; and a-4) a coherence assignment unit for generating a coherence metric number based on at least one of the predictive features.
26. The system of claim 25 wherein the coherence analyzer unit further comprises: a-5) a subtopic uniformity determination unit for determining the degree of uniformity in the distribution of one or more of the predictive features among the children of the at least one current node; wherein the coherence assignment unit generates a coherence metric number based on at least one of the predictive features that is deemed uniform based on the determined degree of uniformity of the at least one uniformity.
27. A computerized method for measuring a degree of coherence for one or more nodes in a hierarchy of nodes comprising the steps of: a) receiving the hierarchy and the training cases filed into the hierarchy; b) determining a list of predictive features that distinguish documents of a subtree of a first one of the nodes in the hierarchy from documents in the first node's local environment, wherein the first node's subtree includes any nodes in the hierarchy that stem from the first node, and the first node's local environment includes any parent node from which the first node is stemmed directly, any sibling nodes that are stemmed directly from the first node's parent node, and any nodes that stem from the sibling nodes of the first node; c) assigning a coherence value to the first node based on the list of predictive features and based on one or more of their degree of predictiveness, their degree of prevalence, and their degree of uniformity, wherein the degree of uniformity reflects how evenly distributed said predictive features are among the subtrees of the children nodes in the hierarchy that are directly stemmed from the first node based on the training cases under each of the subtrees of the children nodes.
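The claimed steps of selecting predictive features, checking their uniformity across child subtrees, and combining prevalences into a coherence value can be sketched as below. This is one possible reading under stated assumptions, not the patented implementation: each training case is modeled as a set of word features, predictiveness uses lift, uniformity uses cosine similarity between per-child feature counts and per-child case counts, and the result is an inverse-rank (1/I) weighted average over the top k features. The function names, the thresholds `min_lift` and `min_uniformity`, and the default `k` are all hypothetical.

```python
import math
from typing import List, Set

Case = Set[str]  # assumption: a training case is a set of word features


def prevalence(feature: str, cases: List[Case]) -> float:
    """Fraction of cases that contain the feature."""
    if not cases:
        return 0.0
    return sum(feature in c for c in cases) / len(cases)


def lift(feature: str, sub_cases: List[Case], env_cases: List[Case]) -> float:
    """Prevalence in the subtree relative to prevalence over subtree plus environment."""
    overall = prevalence(feature, sub_cases + env_cases)
    return prevalence(feature, sub_cases) / overall if overall > 0 else 0.0


def uniformity(feature: str, children_cases: List[List[Case]]) -> float:
    """Cosine similarity between per-child feature counts and per-child case counts."""
    feat_counts = [sum(feature in c for c in cases) for cases in children_cases]
    case_counts = [len(cases) for cases in children_cases]
    dot = sum(f * n for f, n in zip(feat_counts, case_counts))
    norm = math.sqrt(sum(f * f for f in feat_counts)) * math.sqrt(
        sum(n * n for n in case_counts)
    )
    return dot / norm if norm > 0 else 0.0


def coherence(
    children_cases: List[List[Case]],
    env_cases: List[Case],
    k: int = 3,
    min_lift: float = 1.5,        # hypothetical threshold
    min_uniformity: float = 0.9,  # hypothetical threshold
) -> float:
    """Inverse-rank weighted average prevalence of the top-k predictive, uniform features."""
    # Assumption: all subtree cases are filed under the child subtrees.
    sub_cases = [c for cases in children_cases for c in cases]
    features = set().union(*sub_cases)
    selected = [
        f
        for f in features
        if lift(f, sub_cases, env_cases) >= min_lift
        and uniformity(f, children_cases) >= min_uniformity
    ]
    top = sorted((prevalence(f, sub_cases) for f in selected), reverse=True)[:k]
    if not top:
        return 0.0
    weights = [1.0 / rank for rank in range(1, len(top) + 1)]
    return sum(w * p for w, p in zip(weights, top)) / sum(weights)


# Hypothetical node whose two child subtrees share the word "game".
sports_children = [
    [{"game", "goal"}, {"game", "goal", "team"}],
    [{"game", "racket"}, {"game", "serve"}],
]
# Hypothetical node whose child subtrees share no feature.
mixed_children = [[{"goal"}, {"goal"}], [{"vote"}, {"vote"}]]
environment = [{"vote", "law"}, {"vote", "election"}, {"law", "court"}, {"election", "poll"}]

sports_score = coherence(sports_children, environment)
mixed_score = coherence(mixed_children, [{"law"}])
```

In this toy data the first node scores high because one predictive feature appears in every case of both child subtrees, while the second scores zero because no feature is spread evenly across its children. Swapping lift for another metric, or 1/I for exp(-I), only changes the corresponding helper or the `weights` line.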
Patents cited by this patent (3)
Gelvin, David C.; Girod, Lewis D.; Kaiser, William J.; Merrill, William M.; Newberg, Fredric; Pottie, Gregory J.; Sipos, Anton I.; Vardhan, Sandeep, Method and apparatus for internetworked wireless integrated network sensor (WINS) nodes.
Lazarus, Michael A.; Caid, William R.; Pugh, Richard S.; Kindig, Bradley D.; Russell, Gerald S.; Brown, Kenneth B.; Dunning, Ted E.; Carleton, Joel L., System and method for optimal adaptive matching of users to most relevant entity and information in real-time.
Corston, Simon H.; Chandrasekar, Raman; Chen, Harr, Machine-learned approach to determining document relevance for search over large electronic collections of documents.
Skipper, Julie A.; Ganapathy, Priya, Methods and logic for autonomous generation of ensemble classifiers, and systems incorporating ensemble classifiers.