IPC classification information
Country / Type |
United States(US) Patent
Granted
|
International Patent Classification (IPC 7th edition) |
|
Application number |
US-0808064
(2004-03-24)
|
Registration number |
US-7333998
(2008-02-19)
|
Inventor / Address |
- Heckerman,David E.
- Bradley,Paul S.
- Chickering,David M.
- Meek,Christopher A.
|
Applicant / Address |
|
Agent / Address |
Amin, Turocy & Calvin, LLP
|
Citation information |
Times cited: 160
Cited patents: 27 |
Abstract
A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed. The system permits a user to browse through the hierarchy, and, to readily comprehend segment inter-relationships, selectively expand and contract the displayed hierarchy, as desired, as well as to compare two selected segments or segment groups together and graphically display the results of that comparison. An alternative discriminant-based cluster scoring technique is also presented.
Representative claim
We claim:

1. A computer-implemented system for automatically categorizing unknown incoming data and a category visualization (CV) system that displays a graphic representation of each category as a hierarchical map, comprising: a node corresponding to each base category; nodes corresponding to combinations of similar categories; a leaf node corresponding to a base category, the leaf node is positioned as a cluster of nodes at a lowest level of the hierarchy wherein combinations of similar categories are positioned on top of the leaf node, forming successively higher levels of the hierarchy; a root node corresponding to a category that contains all records in a collection, the root node forms top of the hierarchy; a non-leaf node corresponding to each combined category, wherein similar base categories are combined into a combined category; wherein each non-leaf node has two arcs that connect the non-leaf node to two nodes corresponding to sub-categories of the combined category; and wherein if a node is selected, the system displays additional information about corresponding category, the additional information is at least one of number of records in the category or characteristic attributes of the category, and wherein if an arc is selected, the system displays information relating to categories connected by the arc, such as similarity value for the connected categories.

2. The system of claim 1, wherein the base category is a category identified by a categorization process (classification and clustering).

3. The system of claim 1, wherein the combined category is assigned the records of two or more base categories.

4. The system of claim 1, wherein the additional information further comprises characteristic and discriminating information such as attribute-value discrimination, attribute-value discrimination refers to how well the value of an attribute distinguishes the records of one category from the records of another category.

5. The system of claim 4, wherein the attribute-value discrimination is determined by employing the following equation: [equation not reproduced in the source] where $\mathrm{discrim}(x_i \mid G_1, G_2)$ is the measurement of how well the value of an attribute distinguishes the records of one combined category from the records of another combined category, $G_1$ is the first combined category, $G_2$ is the second combined category, $x_i$ is the records in one of the combined categories, $p(x_i \mid G_1)$ is the probability that a record containing specific attributes is in combined category $G_1$, and $p(x_i \mid G_2)$ is the probability that a record containing specific attributes is in combined category $G_2$.

6. The system of claim 1, wherein similarity value refers to a rating of the differences between attribute values of records in one category and attribute values of records in another category, a high value for similarity indicates that there is little difference between the records in the two categories.

7. The system of claim 6, wherein the similarity value for a pair of base categories is determined by employing the following equation: [equation not reproduced in the source] where $\mathrm{dist}(h_1, h_2)$ is the distance and similarity between two categories, $x_1, \ldots, x_m$ is the attribute values, $h_1, h_2$ is a count of a total number of records in categories 1 and 2, $p(x_1, \ldots, x_m \mid h_1)$ is a conditional probability that a record has attribute values $x_1, \ldots, x_m$ given that it is a record from category 1, and $p(x_1, \ldots, x_m \mid h_2)$ is a conditional probability that a record has attribute values $x_1, \ldots, x_m$ given that it is a record from category 2.

8. The system of claim 6, wherein the similarity for a pair of base categories is determined by employing the following equation: [equation not reproduced in the source] where $\mathrm{dist}(h_1, h_2)$ is the distance and similarity between two categories, $x_i$ is the attribute values, $h_1, h_2$ is a count of a total number of records in categories 1 and 2, $p(x_i \mid h_1)$ is a conditional probability that a record has attribute values $x_i$ given that it is a record from category 1, and $p(x_i \mid h_2)$ is a conditional probability that a record has attribute values $x_i$ given that it is a record from category 2.

9. The system of claim 6, wherein the similarity for two combined categories is determined by employing the following equation: [equation not reproduced in the source] where $\mathrm{dist}(G_1, G_2)$ is the distance and similarity between two combined categories, $G_1$ is the first combined category, $G_2$ is the second combined category, $h_j, h_k$ is a count of a total number of records in combined categories 1 and 2, and $p(h_j)\,p(h_k)$ is a probability that a record is in each of the combined categories.

10. The system of claim 6, wherein the similarity for two combined categories is determined by employing the following equation:

$$\mathrm{dist}(G_1, G_2) = \min\{\mathrm{dist}(h_j, h_k) \mid h_j \in G_1,\ h_k \in G_2\}$$

where $\mathrm{dist}(G_1, G_2)$ is the minimum distance between two combined categories, $G_1$ is the first combined category, $G_2$ is the second combined category, and $h_j, h_k$ is a count of a total number of records in combined categories 1 and 2.

11. The system of claim 6, wherein the similarity for two combined categories is determined by employing the following equation:

$$\mathrm{dist}(G_1, G_2) = \max\{\mathrm{dist}(h_j, h_k) \mid h_j \in G_1,\ h_k \in G_2\}$$

where $\mathrm{dist}(G_1, G_2)$ is the maximum distance between two combined categories, $G_1$ is the first combined category, $G_2$ is the second combined category, and $h_j, h_k$ is a count of a total number of records in combined categories 1 and 2.

12. The system of claim 1, wherein the graphic representation of each category is displayed as a decision tree, further comprising: nodes that correspond to each attribute of the corresponding base categories; and arcs that correspond to values of that attribute; wherein each node, except the root node, represents a setting of attribute values as indicated by arcs in a path from a first node to the root node.

13. The system of claim 12, wherein the selection of a node results in display of a probability for each category that a record in the category will have attribute settings that are represented by the path.

14. A computer-readable storage medium containing a plurality of categorized data records and a computer-implemented method of calculating and displaying a graphic representation of various characteristics and discriminating information for each category, comprising: providing nodes that represent each base category; providing nodes that represent combined categories, wherein combinations of similar categories are grouped together to form the combined categories; utilizing a leaf node to form the bottom of the graphic representation; utilizing a root node to form the top of the graphic representation; connecting nodes representing sub-categories of a combined category via arcs; combining the two base categories that are the most similar into a combined category; repeating process of combining similar categories until one combined category represents all records in a collection; and allowing a node to be selected, wherein the system displays additional information about corresponding category, the additional information is at least one of number of records in the category or characteristic attributes of the category, and allowing an arc to be selected, wherein the system displays information relating to categories connected by the arc, such as similarity value for the connected categories.

15. The system of claim 14, further comprising de-emphasizing specific nodes and focusing on specific non-de-emphasized nodes.
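
The merging procedure recited in claim 14, together with the minimum and maximum distances of claims 10 and 11, can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`group_distance`, `build_hierarchy`), the data layout (a dictionary of pairwise base-category distances keyed by `frozenset`), and the example distance values are all assumptions made for illustration. The sketch treats a smaller distance as a higher similarity, and it takes the base-category distances as given because the equations of claims 7 and 8 are not reproduced in this record.

```python
from itertools import combinations


def group_distance(g1, g2, base_dist, linkage=min):
    """Distance between two combined categories (hypothetical helper).

    linkage=min follows the minimum form of claim 10, and
    linkage=max follows the maximum form of claim 11.
    """
    return linkage(base_dist[frozenset((hj, hk))] for hj in g1 for hk in g2)


def build_hierarchy(base_categories, base_dist, linkage=min):
    """Repeatedly merge the two closest groups until one group remains.

    Returns the merges in order as (left_group, right_group, distance).
    Each merge corresponds to a non-leaf node with two arcs.
    """
    groups = [frozenset([h]) for h in base_categories]  # leaf nodes
    merges = []
    while len(groups) > 1:
        # Pick the pair of current groups with the smallest distance.
        g1, g2 = min(
            combinations(groups, 2),
            key=lambda pair: group_distance(pair[0], pair[1], base_dist, linkage),
        )
        d = group_distance(g1, g2, base_dist, linkage)
        groups = [g for g in groups if g not in (g1, g2)] + [g1 | g2]
        merges.append((g1, g2, d))
    return merges


# Made-up pairwise distances between three base categories (assumption).
example_dist = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 2.0,
}

for left, right, d in build_hierarchy(["A", "B", "C"], example_dist, linkage=min):
    print(sorted(left), sorted(right), d)
```

Passing `linkage=max` instead of `linkage=min` switches the sketch from the claim 10 form to the claim 11 form. The last merge in the returned list plays the role of the root node, since its two groups together contain every base category.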