IPC classification information
Country / Type |
United States(US) Patent
Granted
|
International Patent Classification (IPC 7th edition) |
|
Application number |
UP-0354265
(2006-02-14)
|
Registration number |
US-7584168
(2009-09-16)
|
Priority information |
FR-05 01487(2005-02-14) |
Inventor / Address |
|
Applicant / Address |
|
Agent / Address |
Westman, Champlin & Kelly, P.A.
|
Citation information |
Times cited: 2
Cited patents: 23 |
Abstract
A method and apparatus are provided for the generation of a classification tree as a function of a classification model and from a set of data to be classified, described by a set of attributes. The method includes a step for obtaining said classification model, itself comprising a step for defining a mode of use for each attribute, which comprises specifying which property or properties are possessed by said attribute among the following at least two properties, which are not exclusive of each other: an attribute is marked target if it has to be explained; an attribute is marked taboo if it has not to be used as an explanatory attribute, an attribute not marked taboo being an explanatory attribute. Furthermore, the classification model belongs to the group comprising: supervised classification models with one target attribute, in each of which a single attribute is marked target and taboo, the attributes that are not marked target being not marked taboo; supervised classification models with several target attributes, in each of which at least two attributes, but not all the attributes, are marked target and taboo, the attributes not marked target being not marked taboo; and unsupervised classification models, in each of which all the attributes are marked target and at least one attribute is not marked taboo.
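The three model families described in the abstract can be told apart purely from the "target" and "taboo" flags of each attribute. The following is a minimal sketch of that check, not the patented implementation; the `AttributeMode` class, the `model_kind` function, and the returned label strings are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AttributeMode:
    """Mode of use of one attribute (hypothetical container)."""
    name: str
    target: bool  # True if the attribute has to be explained
    taboo: bool   # True if the attribute must not be used as explanatory


def model_kind(modes: List[AttributeMode]) -> Optional[str]:
    """Return which family from the abstract the marking matches, or None."""
    targets = [m for m in modes if m.target]
    others = [m for m in modes if not m.target]
    # Supervised families: every target is taboo, no non-target is taboo.
    targets_all_taboo = all(m.taboo for m in targets)
    others_not_taboo = all(not m.taboo for m in others)

    if len(targets) == 1 and targets_all_taboo and others_not_taboo:
        return "supervised_one_target"
    if 2 <= len(targets) < len(modes) and targets_all_taboo and others_not_taboo:
        return "supervised_several_targets"
    # Unsupervised: all attributes are target, at least one is not taboo.
    if modes and len(targets) == len(modes) and any(not m.taboo for m in modes):
        return "unsupervised"
    return None


if __name__ == "__main__":
    marking = [
        AttributeMode("category", target=True, taboo=True),
        AttributeMode("author", target=False, taboo=False),
        AttributeMode("year", target=False, taboo=False),
    ]
    print(model_kind(marking))
```

A marking that fits none of the three families (for example a single target that is not marked taboo alongside non-target attributes) yields `None` in this sketch; how such a marking should be handled is not stated in the abstract.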
Representative claim
What is claimed is:
1. A method comprising: generating with a computer a classification tree as a function of a classification model and from a set of data to be classified, described by a set of attributes, said set of data being a set of documents of a documentary database; obtaining said classification model, the classification model comprising defining a mode of use for each attribute, which comprises: marking said attribute "target" if the attribute has to be explained, or "not target" if the attribute has not to be explained; and marking said attribute "taboo" if the attribute has not to be used as explanatory, or "not taboo" if the attribute has to be used as explanatory; "target" and "taboo" being two properties not exclusive of each other, and wherein, said classification model belongs to the group comprising: supervised classification models with one target attribute, in each of which a single attribute is marked target and taboo, the attributes that are not marked target being not marked taboo; supervised classification models with several target attributes, in each of which at least two attributes, but not all the attributes, are marked target and taboo, the attributes not marked target being not marked taboo; and unsupervised classification models, in each of which all the attributes are marked target and at least one attribute is not marked taboo.
2. A method according to claim 1 wherein, among the unsupervised classification models, the following can be distinguished: those in each of which all the attributes are marked target, and no attribute is marked taboo; and those in each of which all the attributes are marked target, and at least one attribute, but not all the attributes, is marked taboo.
3. A method according to claim 1, wherein defining the mode of use for each attribute also comprises: marking said attribute "masked" if the attribute has not to be taken into account to define said classification model, or "not masked" if the attribute has to be taken into account to define said classification model, "masked" being a property not exclusive of the "target" and "taboo" properties.
4. A method according to claim 1, said classification tree comprising at least one node enabling the definition of a partition on a data subset received by said node, each node being a set of partitioning rules comprising at least one rule defined by at least one condition on at least one of the attributes wherein, to generate the set of rules of each node, the method comprises the following steps, performed iteratively so long as at least one end-of-exploration criterion has not been verified: a) obtaining a new set of rules enabling a definition of a new partition on the data subset received by the node, condition or conditions that define each rule pertaining solely to one or more attributes not marked taboo; b) making a first evaluation of a quality of the new partition, by obtaining a new value of a first indicator computed with the set of data, only attributes marked target influencing said first indicator; and c) if the quality of the new partition, evaluated with the first criterion, is greater than that of the partitions evaluated during the preceding iterations, also with the first criterion, a new value of the first indicator is stored and becomes a current optimal value, and the new set of rules is stored and becomes a current optimal set of rules, so that, ultimately, the set of rules of said node is the current optimal set of rules at the time when an end-of-exploration criterion is verified.
5. A method according to claim 4, wherein the obtaining of a new value of the indicator, for a new partition, comprises the following steps: a weight equal to one is assigned to each of original attributes marked target and a weight equal to zero is assigned to each of original attributes not marked target; the set of original attributes is re-encoded as a set of attributes re-encoded Z1 . . . ZQ such that: if an original attribute is of a discrete type and possesses N different modalities, the original attribute is re-encoded according to a full disjunctive re-encoding into N re-encoded attributes having [0; 1] as a value domain, each of the N re-encoded attributes being assigned a weight identical to that of an original attribute from which the re-encoded attribute is derived; if an original attribute is of a continuous type, the original attribute is re-encoded as a re-encoded attribute, in standardizing the re-encoded attribute's value domain at [0; 1], said re-encoded attribute being assigned a weight identical to that of an original attribute from which the re-encoded attribute is derived; the new partition is re-encoded in the space of the re-encoded attributes; and an Intra-Class Inertia of the new partition, re-encoded in the space of the re-encoded attributes, is computed.
6. A method according to claim 5, wherein the Intra-Class Inertia is defined by: [equation not reproduced in this record] where P is a partition of n points, formed by K subsets C1 . . . CK, d(.) is a function of distance between two points, pi is the weight of the point number i, the sum of the weights pi being equal to one, and gk is the center of gravity of the set of points belonging to the subset Ck, and wherein the distance function d(.), for the data re-encoded in the space of the attributes re-encoded Z1 . . . ZQ, is defined by: [equation not reproduced in this record] where da(Z1a, Z2a, . . . ZQa) and db(Z1b, Z2b, . . . ZQb) are two pieces of data re-encoded in the space of the re-encoded data Z1 . . . ZQ, and: Distance Zj(xj,yj)=|xj-yj| if Zj is a re-encoded attribute coming from a continuous type of attribute, Distance Zj(xj,yj)=|xj-yj|/2 if Zj is a re-encoded attribute coming from a full disjunctive re-encoding of a discrete attribute.
7. A method according to claim 4, wherein the step of obtaining said new set of rules, enabling the definition of said new partition, is performed according to a mode of stochastic generation of the partitions.
8. A method according to claim 7, wherein the mode of stochastic generation of the partition is based on a combinatorial optimization meta-heuristic of the VNS or "variable neighborhood search" type.
9. A method according to claim 1, said classification tree comprising at least one node enabling the definition of a partition on a data subset received by said node, each node being a set of partitioning rules comprising at least one rule defined by at least one condition on at least one of the attributes wherein, to generate the set of rules of each node, the method comprises the following steps, performed iteratively so long as at least one end-of-exploration criterion has not been verified: a) obtaining a new set of rules enabling the definition of a new partition on the data subset received by the node, condition or conditions that define each rule pertaining solely to one or more attributes not marked taboo; b) making a first evaluation of a quality of the new partition, by obtaining a new value of a first indicator computed with the set of data, only attributes marked target influencing said first indicator; and c) if the quality of the new partition, evaluated with the first criterion, is greater than that of the partitions evaluated during the previous iterations, with the first criterion, then: a second evaluation is made of a quality of the new partition, by obtaining a new value of a second indicator, having a same nature as the first indicator but being computed with a subset of test data; if the quality of the new partition, evaluated with the second criterion, is greater than that of the partition evaluated during the previous iterations, with the second criterion, the new values of the first and second indicators are stored and become current optimal values and the new set of rules is stored and becomes a current optimal set of rules, and if the quality of the new partition, evaluated with the second criterion, is lower than or equal to that of the partition evaluated during the previous iterations, with the second criterion, a counter is implemented and, if the counter has reached a determined threshold, an end-of-exploration criterion is considered to have been verified; if not the new values of the first and second indicators are stored and become current optimal values and the new set of rules is stored and becomes a current optimal set of rules.
10. A computer-readable storage memory, which may be totally or partially detachable, comprising a set of instructions stored thereon and executable by said computer to implement a method for the generation of a classification tree as a function of a classification model and from a set of data to be classified, described by a set of attributes, said set of data being a set of documents of a documentary database, wherein the method comprises obtaining said classification model, the classification model comprising defining a mode of use for each attribute, which comprises: marking said attribute "target" if the attribute has to be explained, or "not target" if the attribute has not to be explained; and marking said attribute "taboo" if the attribute has not to be used as explanatory, or "not taboo" if the attribute has to be used as explanatory; "target" and "taboo" being two properties not exclusive of each other, and wherein, said classification model belongs to the group comprising: supervised classification models with one target attribute, in each of which a single attribute is marked target and taboo, the attributes that are not marked target being not marked taboo; supervised classification models with several target attributes, in each of which at least two attributes, but not all the attributes, are marked target and taboo, the attributes not marked target being not marked taboo; and unsupervised classification models, in each of which all the attributes are marked target and at least one attribute is not marked taboo.
11. A device for the generation of a classification tree as a function of a classification model and from a set of data to be classified, described by a set of attributes, said set of data being a set of documents of a documentary database, wherein the device comprises a memory and means for obtaining said classification model, the means themselves comprising means to define a mode of use for each attribute which comprise: means for marking said attribute "target" if the attribute has to be explained, or "not target" if the attribute has not to be explained: and means for marking said attribute "taboo" if the attribute has not to be used as explanatory, or "not taboo" if the attribute has to be used as explanatory, "target" and "taboo" being two properties not exclusive of each other, and wherein, said classification model belongs to the group comprising: supervised classification models with one target attribute, in each of which a single attribute is marked target and taboo, the attributes that are not marked target being not marked taboo; supervised classification models with several target attributes, in each of which at least two attributes, but not all the attributes, are marked target and taboo, the attributes not marked target being not marked taboo; and unsupervised classification models, in each of which all the attributes are marked target and at least one attribute is not marked taboo.
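Claims 5 and 6 describe re-encoding the attributes and computing an Intra-Class Inertia in which only target attributes carry weight. The sketch below illustrates that idea under stated assumptions, since the two formulas of claim 6 are not reproduced in this record: it assumes uniform point weights pi = 1/n, assumes the inertia is the weighted sum of squared distances to each subset's center of gravity, and assumes the per-attribute distances are combined as a weighted sum of squares. The function names `reencode` and `intra_class_inertia` and the data layout are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# A re-encoded column: (weight, per-attribute distance scale, values in [0, 1]).
Column = Tuple[float, float, List[float]]


def reencode(rows: Sequence[Dict[str, object]],
             attrs: Sequence[Tuple[str, str, bool]]) -> List[Column]:
    """Re-encode attributes given as (name, kind, is_target) tuples.

    kind is "discrete" (full disjunctive re-encoding, one 0/1 column per
    modality) or "continuous" (min-max scaling to [0, 1]).
    """
    columns: List[Column] = []
    for name, kind, is_target in attrs:
        weight = 1.0 if is_target else 0.0  # weight 1 for target, 0 otherwise
        raw = [row[name] for row in rows]
        if kind == "discrete":
            for modality in sorted(set(raw), key=str):
                values = [1.0 if v == modality else 0.0 for v in raw]
                columns.append((weight, 0.5, values))  # |x - y| / 2 per claim 6
        else:
            lo, hi = min(raw), max(raw)
            span = (hi - lo) or 1.0  # avoid division by zero on constant columns
            columns.append((weight, 1.0, [(v - lo) / span for v in raw]))
    return columns


def intra_class_inertia(columns: Sequence[Column],
                        labels: Sequence[Hashable]) -> float:
    """Assumed form: sum over subsets of p_i * d^2(x_i, g_k), with p_i = 1/n."""
    n = len(labels)
    total = 0.0
    for cls in set(labels):
        members = [i for i in range(n) if labels[i] == cls]
        for weight, scale, values in columns:
            center = sum(values[i] for i in members) / len(members)
            total += sum(weight * (scale * (values[i] - center)) ** 2
                         for i in members) / n
    return total


if __name__ == "__main__":
    rows = [
        {"colour": "red", "size": 1.0},
        {"colour": "red", "size": 2.0},
        {"colour": "blue", "size": 9.0},
        {"colour": "blue", "size": 10.0},
    ]
    attrs = [("colour", "discrete", True), ("size", "continuous", False)]
    cols = reencode(rows, attrs)
    print(intra_class_inertia(cols, [0, 0, 1, 1]))  # subsets pure in colour
    print(intra_class_inertia(cols, [0, 1, 0, 1]))  # subsets mixed in colour
```

In this toy data the non-target attribute `size` has weight zero, so it can define the partition (it is not taboo) without influencing the indicator, which matches the roles that claims 4 and 5 assign to taboo and target attributes.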