[Patent] Method and device for the generation of a classification tree to unify the supervised and unsupervised approaches, corresponding computer package and storage means


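IPC classification information
Country / Type	United States (US) Patent, Granted
International Patent Classification (IPC, 7th edition)	G06F-007/00
Application number	UP-0354265 (2006-02-14)
Registration number	US-7584168 (2009-09-16)
Priority information	FR-05 01487 (2005-02-14)
Inventor / Address	Meyer, Franck
Applicant / Address	France Telecom
Agent / Address	Westman, Champlin & Kelly, P.A.
Citation information	Times cited: 2; Patents cited: 23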

Abstract

A method and apparatus are provided for the generation of a classification tree as a function of a classification model and from a set of data to be classified, described by a set of attributes. The method includes a step for obtaining said classification model, itself comprising a step for defining a mode of use for each attribute, which comprises specifying which property or properties are possessed by said attribute among the following at least two properties, which are not exclusive of each other: an attribute is marked target if it has to be explained; an attribute is marked taboo if it has not to be used as an explanatory attribute, an attribute not marked taboo being an explanatory attribute. Furthermore, the classification model belongs to the group comprising: supervised classification models with one target attribute, in each of which a single attribute is marked target and taboo, the attributes that are not marked target being not marked taboo; supervised classification models with several target attributes, in each of which at least two attributes, but not all the attributes, are marked target and taboo, the attributes not marked target being not marked taboo; and unsupervised classification models, in each of which all the attributes are marked target and at least one attribute is not marked taboo.
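
The three model families in the abstract are distinguished only by how the "target" and "taboo" marks are distributed over the attributes. The following is a minimal sketch of that decision logic, not the patent's implementation; the names `Mode` and `model_kind` and the returned labels are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """Mode of use of one attribute (hypothetical container)."""
    target: bool  # the attribute has to be explained
    taboo: bool   # the attribute must not be used as explanatory


def model_kind(modes):
    """Return which family a marking belongs to, or None if it fits none.

    modes: dict mapping attribute name -> Mode.
    """
    targets = [m for m in modes.values() if m.target]
    non_targets = [m for m in modes.values() if not m.target]
    if not targets:
        return None
    if not non_targets:
        # Unsupervised: all attributes are target, at least one is not taboo.
        if any(not m.taboo for m in targets):
            return "unsupervised"
        return None
    # Supervised: every target is also taboo, every non-target is not taboo.
    if all(m.taboo for m in targets) and not any(m.taboo for m in non_targets):
        if len(targets) == 1:
            return "supervised, one target"
        return "supervised, several targets"
    return None


print(model_kind({"label": Mode(True, True), "age": Mode(False, False)}))
print(model_kind({"age": Mode(True, False), "color": Mode(True, True)}))
```

A marking in which every attribute is both target and taboo matches none of the three families, because no attribute would remain to build rules on.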

Representative claims

What is claimed is:

1. A method comprising:
generating with a computer a classification tree as a function of a classification model and from a set of data to be classified, described by a set of attributes, said set of data being a set of documents of a documentary database;
obtaining said classification model, the classification model comprising defining a mode of use for each attribute, which comprises:
marking said attribute "target" if the attribute has to be explained, or "not target" if the attribute has not to be explained; and
marking said attribute "taboo" if the attribute has not to be used as explanatory, or "not taboo" if the attribute has to be used as explanatory;
"target" and "taboo" being two properties not exclusive of each other,
and wherein, said classification model belongs to the group comprising:
supervised classification models with one target attribute, in each of which a single attribute is marked target and taboo, the attributes that are not marked target being not marked taboo;
supervised classification models with several target attributes, in each of which at least two attributes, but not all the attributes, are marked target and taboo, the attributes not marked target being not marked taboo; and
unsupervised classification models, in each of which all the attributes are marked target and at least one attribute is not marked taboo.

2. A method according to claim 1 wherein, among the unsupervised classification models, the following can be distinguished:
those in each of which all the attributes are marked target, and no attribute is marked taboo; and
those in each of which all the attributes are marked target, and at least one attribute, but not all the attributes, is marked taboo.

3. A method according to claim 1, wherein defining the mode of use for each attribute also comprises:
marking said attribute "masked" if the attribute has not to be taken into account to define said classification model, or "not masked" if the attribute has to be taken into account to define said classification model, "masked" being a property not exclusive of the "target" and "taboo" properties.

4. A method according to claim 1, said classification tree comprising at least one node enabling the definition of a partition on a data subset received by said node, each node being a set of partitioning rules comprising at least one rule defined by at least one condition on at least one of the attributes wherein, to generate the set of rules of each node, the method comprises the following steps, performed iteratively so long as at least one end-of-exploration criterion has not been verified:
a) obtaining a new set of rules enabling a definition of a new partition on the data subset received by the node, condition or conditions that define each rule pertaining solely to one or more attributes not marked taboo;
b) making a first evaluation of a quality of the new partition, by obtaining a new value of a first indicator computed with the set of data, only attributes marked target influencing said first indicator; and
c) if the quality of the new partition, evaluated with the first criterion, is greater than that of the partitions evaluated during the preceding iterations, also with the first criterion, a new value of the first indicator is stored and becomes a current optimal value, and the new set of rules is stored and becomes a current optimal set of rules,
so that, ultimately, the set of rules of said node is the current optimal set of rules at the time when an end-of-exploration criterion is verified.

5. A method according to claim 4, wherein the obtaining of a new value of the indicator, for a new partition, comprises the following steps:
a weight equal to one is assigned to each of original attributes marked target and a weight equal to zero is assigned to each of original attributes not marked target;
the set of original attributes is re-encoded as a set of attributes re-encoded Z1 . . . ZQ such that:
if an original attribute is of a discrete type and possesses N different modalities, the original attribute is re-encoded according to a full disjunctive re-encoding into N re-encoded attributes having [0; 1] as a value domain, each of the N re-encoded attributes being assigned a weight identical to that of an original attribute from which the re-encoded attribute is derived;
if an original attribute is of a continuous type, the original attribute is re-encoded as a re-encoded attribute, in standardizing the re-encoded attribute's value domain at [0; 1], said re-encoded attribute being assigned a weight identical to that of an original attribute from which the re-encoded attribute is derived;
the new partition is re-encoded in the space of the re-encoded attributes; and
an Intra-Class Inertia of the new partition, re-encoded in the space of the re-encoded attributes, is computed.

6. A method according to claim 5, wherein the Intra-Class Inertia is defined by:
[formula not reproduced in this record]
where P is a partition of n points, formed by K subsets C1 . . . CK, d(.) is a function of distance between two points, pi is the weight of the point number i, the sum of the weights pi being equal to one, and gk is the center of gravity of the set of points belonging to the subset Ck,
and wherein the distance function d(.), for the data re-encoded in the space of the attributes re-encoded Z1 . . . ZQ, is defined by:
[formula not reproduced in this record]
where da(Z1a, Z2a, . . . ZQa) and db(Z1b, Z2b, . . . ZQb) are two pieces of data re-encoded in the space of the re-encoded data Z1 . . . ZQ, and:
Distance Zj(xj,yj)=|xj-yj| if Zj is a re-encoded attribute coming from a continuous type of attribute,
Distance Zj(xj,yj)=|xj-yj|/2 if Zj is a re-encoded attribute coming from a full disjunctive re-encoding of a discrete attribute.

7. A method according to claim 4, wherein the step of obtaining said new set of rules, enabling the definition of said new partition, is performed according to a mode of stochastic generation of the partitions.

8. A method according to claim 7, wherein the mode of stochastic generation of the partition is based on a combinatorial optimization meta-heuristic of the VNS or "variable neighborhood search" type.

9. A method according to claim 1, said classification tree comprising at least one node enabling the definition of a partition on a data subset received by said node, each node being a set of partitioning rules comprising at least one rule defined by at least one condition on at least one of the attributes wherein, to generate the set of rules of each node, the method comprises the following steps, performed iteratively so long as at least one end-of-exploration criterion has not been verified:
a) obtaining a new set of rules enabling the definition of a new partition on the data subset received by the node, condition or conditions that define each rule pertaining solely to one or more attributes not marked taboo;
b) making a first evaluation of a quality of the new partition, by obtaining a new value of a first indicator computed with the set of data, only attributes marked target influencing said first indicator; and
c) if the quality of the new partition, evaluated with the first criterion, is greater than that of the partitions evaluated during the previous iterations, with the first criterion, then:
a second evaluation is made of a quality of the new partition, by obtaining a new value of a second indicator, having a same nature as the first indicator but being computed with a subset of test data;
if the quality of the new partition, evaluated with the second criterion, is greater than that of the partition evaluated during the previous iterations, with the second criterion, the new values of the first and second indicators are stored and become current optimal values and the new set of rules is stored and becomes a current optimal set of rules, and
if the quality of the new partition, evaluated with the second criterion, is lower than or equal to that of the partition evaluated during the previous iterations, with the second criterion, a counter is implemented and, if the counter has reached a determined threshold, an end-of-exploration criterion is considered to have been verified; if not the new values of the first and second indicators are stored and become current optimal values and the new set of rules is stored and becomes a current optimal set of rules.

10. A computer-readable storage memory, which may be totally or partially detachable, comprising a set of instructions stored thereon and executable by said computer to implement a method for the generation of a classification tree as a function of a classification model and from a set of data to be classified, described by a set of attributes, said set of data being a set of documents of a documentary database, wherein the method comprises obtaining said classification model, the classification model comprising defining a mode of use for each attribute, which comprises:
marking said attribute "target" if the attribute has to be explained, or "not target" if the attribute has not to be explained; and
marking said attribute "taboo" if the attribute has not to be used as explanatory, or "not taboo" if the attribute has to be used as explanatory;
"target" and "taboo" being two properties not exclusive of each other,
and wherein, said classification model belongs to the group comprising:
supervised classification models with one target attribute, in each of which a single attribute is marked target and taboo, the attributes that are not marked target being not marked taboo;
supervised classification models with several target attributes, in each of which at least two attributes, but not all the attributes, are marked target and taboo, the attributes not marked target being not marked taboo; and
unsupervised classification models, in each of which all the attributes are marked target and at least one attribute is not marked taboo.

11. A device for the generation of a classification tree as a function of a classification model and from a set of data to be classified, described by a set of attributes, said set of data being a set of documents of a documentary database, wherein the device comprises a memory and means for obtaining said classification model, the means themselves comprising means to define a mode of use for each attribute which comprise:
means for marking said attribute "target" if the attribute has to be explained, or "not target" if the attribute has not to be explained; and
means for marking said attribute "taboo" if the attribute has not to be used as explanatory, or "not taboo" if the attribute has to be used as explanatory,
"target" and "taboo" being two properties not exclusive of each other,
and wherein, said classification model belongs to the group comprising:
supervised classification models with one target attribute, in each of which a single attribute is marked target and taboo, the attributes that are not marked target being not marked taboo;
supervised classification models with several target attributes, in each of which at least two attributes, but not all the attributes, are marked target and taboo, the attributes not marked target being not marked taboo; and
unsupervised classification models, in each of which all the attributes are marked target and at least one attribute is not marked taboo.
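
The re-encoding of claim 5, the inertia indicator of claim 6 and the keep-the-best loop of claim 4 can be illustrated with a short Python sketch. This record does not reproduce the two formulas of claim 6, so the sketch rests on stated assumptions: the inertia is taken as the usual weighted sum of squared distances from each point to its class's center of gravity, the per-attribute distances are combined as a weighted Euclidean distance, all points carry the same weight 1/n, and a lower inertia is read as a higher partition quality. All function names are hypothetical, and the candidate rule sets are simply enumerated instead of being generated stochastically as in claims 7 and 8.

```python
import math


def reencode(rows, kinds, targets):
    """Re-encode rows (list of dicts) as points with coordinates in [0, 1].

    kinds: attribute name -> "continuous" or "discrete".
    targets: names of attributes marked target (weight 1, all others 0).
    Returns (points, weights, scales); scales[j] is 1.0 for a continuous
    column and 0.5 for a column produced by full disjunctive re-encoding.
    """
    columns, weights, scales = [], [], []
    for name, kind in kinds.items():
        w = 1.0 if name in targets else 0.0
        values = [r[name] for r in rows]
        if kind == "continuous":
            lo, hi = min(values), max(values)
            span = (hi - lo) or 1.0  # constant column: avoid dividing by zero
            columns.append([(v - lo) / span for v in values])
            weights.append(w)
            scales.append(1.0)
        else:
            for modality in sorted(set(values)):  # one 0/1 column per modality
                columns.append([1.0 if v == modality else 0.0 for v in values])
                weights.append(w)
                scales.append(0.5)
    points = [list(p) for p in zip(*columns)]
    return points, weights, scales


def distance(a, b, weights, scales):
    # ASSUMPTION: weighted Euclidean combination of the per-attribute
    # distances |x - y| (continuous) and |x - y| / 2 (disjunctive).
    return math.sqrt(sum(w * (s * abs(x - y)) ** 2
                         for x, y, w, s in zip(a, b, weights, scales)))


def intra_class_inertia(points, labels, weights, scales):
    # ASSUMPTION: uniform point weights p_i = 1 / n.
    p = 1.0 / len(points)
    total = 0.0
    for k in set(labels):
        members = [pt for pt, lab in zip(points, labels) if lab == k]
        g = [sum(col) / len(members) for col in zip(*members)]  # center of gravity
        total += sum(p * distance(pt, g, weights, scales) ** 2 for pt in members)
    return total


def explore(rows, kinds, targets, candidate_rules):
    """Keep the rule whose partition has the lowest inertia.

    Exploration ends here when the candidates are exhausted.
    """
    points, weights, scales = reencode(rows, kinds, targets)
    best_rule, best_value = None, math.inf
    for rule in candidate_rules:
        labels = [rule(r) for r in rows]
        value = intra_class_inertia(points, labels, weights, scales)
        if value < best_value:
            best_rule, best_value = rule, value
    return best_rule, best_value


rows = [{"age": 20, "color": "red"}, {"age": 30, "color": "red"},
        {"age": 40, "color": "blue"}, {"age": 50, "color": "blue"}]
kinds = {"age": "continuous", "color": "discrete"}
# "color" is target and taboo, so rules may only test "age".
rules = [lambda r, t=t: r["age"] <= t for t in (25, 35, 45)]
best_rule, best_value = explore(rows, kinds, {"color"}, rules)
print(best_value)  # 0.0: the split at age <= 35 separates the two colors
```

Because only target attributes carry a non-zero weight, the explanatory attribute "age" shapes the partition through the rules but does not influence the indicator, which mirrors the division of roles between "not taboo" and "target" attributes in claim 4.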

Patents cited by this patent (23)

Bo Thiesson; Christopher A. Meek; David Maxwell Chickering; David Earl Heckerman, Collaborative filtering with mixtures of bayesian networks.
Zhong, Hongsheng; Zaret, David, Core area territory planning for optimizing driver familiarity and route flexibility.
Chickering, D. Maxwell, Dynamic determination of continuous split intervals for decision-tree learning without sorting.
Hekmatpour Amir, Expert system and method employing hierarchical knowledge base, and interactive multimedia/hypermedia applications.
Hekmatpour Amir, Expert system and method employing hierarchical knowledge base, and interactive multimedia/hypermedia applications.
Hekmatpour Amir, Expert system and method employing hierarchical knowledge base, and interactive multimedia/hypermedia applications.
Bennett, John; Heckerman, David E.; Meek, Christopher A.; Thiesson, Bo, Handwriting recognition with mixtures of bayesian networks.
Sachs, Jeffrey R.; Wiener, Matthew C.; Yates, Nathan A., Mass spectrometry data analysis techniques.
Mohda Dharmendra Shantilal; Martin David Charles; Spangler William Scott; Vaithyanathan Shivakumar, Method and apparatus for cluster exploration and visualization.
Sonneland, Lars; Gehrmann, Thomas, Method and apparatus for generating a cross plot in attribute space from a plurality of attribute data sets and generating a class data set from the cross plot.
Mahé, Gaël; Gilloire, André, Method and system of correcting spectral deformations in the voice, introduced by a communication network.
Pednault, Edwin Peter Dawson; Natarajan, Ramesh, Method for constructing segmentation-based predictive models.
Ornstein Leonard (White Plains NY), Method for unsupervised neural network classification with back propagation.
Aliferis, Constantin F.; Tsamardinos, Ioannis, Method, system, and apparatus for casual discovery and variable selection for classification.
Kohavi Ron; Tesler Joel D., Method, system, and computer program product for visualizing a decision-tree classifier.
Agrawal Rakesh; Chakrabarti Soumen; Dom Byron Edward; Raghavan Prabhakar, Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values.
Rosen, Joseph S., Multistage machine learning process.
Agrawal Rakesh; Ho Ching-Tien; Zaki Mohammed J., Parallel classification for data mining in a shared-memory multiprocessor system.
Buechler, Kenneth F.; Fung, Eric Thomas; Yip, Tai Tung, Polypeptides related to natriuretic peptides and methods of their identification and use.
Babu, Shivnath; Garofalakis, Minos N.; Rastogi, Rajeev, System and method for compressing a data table using models.
Robert Evans, System and method for partitioning a real-valued attribute exhibiting windowed data characteristics.
Ornstein Leonard (White Plains NY), Unsupervised neural network classification with back propagation.
Evans, Robert; Wong, Did Bun, Using ink temperature gain to identify causes of web breaks in a printing system.

Patents citing this patent (2)

Swamy, Gitanjali, Insight and algorithmic clustering for automated synthesis.
Swamy, Gitanjali, Insight and algorithmic clustering for automated synthesis.

