[Patent] Fuzzy text categorizer

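IPC classification information
Country / Type	United States (US) Patent, granted
International Patent Classification (IPC 7th edition)	G06F-009/44 G06N-007/02 G06N-007/06
Application number	US-0928619 (2001-08-13)
Inventor / Address	Shanahan, James G.
Applicant / Address	Xerox Corporation
Citation information	Times cited: 41; Patents cited: 3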

Abstract

A text categorizer classifies a text object into one or more classes. The text categorizer includes a pre-processing module, a knowledge base, and an approximate reasoning module. The pre-processing module performs feature extraction, feature reduction, and fuzzy set generation to represent an unlabelled text object in terms of one or more fuzzy sets. The approximate reasoning module uses a measured degree of match between the one or more fuzzy sets and categories represented by fuzzy rules in the knowledge base to assign labels of those categories that satisfy a selected decision making rule.
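
As a rough illustration of how the three modules in the abstract could fit together, the following Python sketch represents a text as a fuzzy set, matches it against class fuzzy sets held in a small knowledge base, and assigns the labels that pass a decision rule. It is not taken from the patent: the word tokenizer, the scaling of frequencies by the largest count, the max-min matching formula, the threshold rule, and all names (`document_fuzzy_set`, `max_min_match`, `categorize`, `KNOWLEDGE_BASE`) are assumptions made for illustration only.

```python
import re
from collections import Counter


def document_fuzzy_set(text):
    """Pre-processing: represent a text as {feature: membership in [0, 1]}.

    Assumption: features are lowercase word tokens, and membership is the
    token frequency divided by the largest frequency in the text.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def max_min_match(class_set, doc_set):
    """Degree of match: maximum over shared features of the minimum membership."""
    shared = class_set.keys() & doc_set.keys()
    return max((min(class_set[t], doc_set[t]) for t in shared), default=0.0)


def categorize(text, knowledge_base, threshold=0.5):
    """Approximate reasoning: keep labels whose match reaches the threshold.

    The threshold rule is one assumed example of a decision making rule.
    """
    doc_set = document_fuzzy_set(text)
    scores = {label: max_min_match(class_set, doc_set)
              for label, class_set in knowledge_base.items()}
    labels = sorted(label for label, s in scores.items() if s >= threshold)
    return labels, scores


# Hypothetical hand-written knowledge base, used only to exercise the sketch.
KNOWLEDGE_BASE = {
    "sports": {"match": 1.0, "goal": 0.8, "team": 0.6},
    "finance": {"bank": 1.0, "market": 0.9, "goal": 0.2},
}

labels, scores = categorize(
    "The team scored a late goal to win the match", KNOWLEDGE_BASE
)
print(labels, scores)
```

With these toy class sets the sentence matches "sports" more strongly than "finance", because three of its words carry high membership in the sports set while only "goal" appears, with low membership, in the finance set.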

Representative claims

1. A method for classifying a text object, comprising: extracting a set of features from the text object; the set of features having a plurality of features; constructing a document class fuzzy set with a plurality of ones of the set of features extracted from the text object; each of the ones of the features extracted from the text object having a degree of membership in the document class fuzzy set and a plurality of class fuzzy sets of a knowledge base; measuring a degree of match between each of the plurality of class fuzzy sets and the document class fuzzy set; and using the measured degree of match to assign the text object a label that satisfies a selected decision making rule; wherein the document class fuzzy set is computed by: calculating a frequency of occurrence for each feature in the set of features in the text object; normalizing the frequency of occurrence of each feature in the set of features; and transforming the normalized frequency of occurrence of each feature in the set of features to define the document class fuzzy set.
2. The method according to claim 1, further comprising learning each class fuzzy set in the knowledge base.
3. The method according to claim 2, wherein each class fuzzy set is learned by: obtaining a set of class training documents; merging those training documents in the set of training documents with similar labels to create a class document; and computing a class fuzzy set using the class document.
4. The method according to claim 1, wherein the set of features is extracted from the text object by: tokenizing the document to generate a word list; parsing the word list to generate the set of grammar based features; and filtering the set of grammar based features to reduce the number of features in the set of grammar based features to define the ones of the set of features extracted from the text object used to construct the document class fuzzy set.
5. The method according to claim 1, wherein the document fuzzy set is computed by: filtering the set of features to reduce the number of features in the set of features to define the ones of the set of features extracted from the text object used to construct the document class fuzzy set.
6. The method according to claim 1, wherein the normalized frequency of occurrence of each feature in the set of features is transformed using a bijective transformation.
7. The method according to claim 1, wherein the degree of match between each of the plurality of class fuzzy sets and the document class fuzzy set is measured using one of a maximum-minimum strategy and a probabilistic reasoning strategy based upon semantic unification.
8. The method according to claim 1, further comprising: filtering each degree of match with an associated class specific filter function to define an activation value for its associated class rule; identifying the activation value of the class rule with the highest activation value; each class rule having an associated class label; and assigning the class label of the class rule with the highest identified activation value to classify the text object into one of the plurality of class fuzzy sets.
9. The method according to claim 8, further comprising learning each associated class specific filter function.
10. The method according to claim 1, wherein the decision making rule is used to identify one of a maximum value, a threshold value, and a predefined number.
11. A method for classifying a text object, comprising: extracting a set of granule features from the text object; each granule feature being represented by a plurality of fuzzy sets and associated labels; constructing a document granule feature fuzzy set using a plurality of ones of the granule features extracted from the text object; each of the ones of the granule features extracted from the text object having a degree of membership in a corresponding granule feature fuzzy set of the document granule feature fuzzy set and a plurality of class granule feature fuzzy sets of a knowledge base; computing a degree of match between each of the plurality of class granule feature fuzzy sets and the document granule feature fuzzy set to provide a degree of match for each of the ones of the granule features; aggregating each degree of match of the ones of the granule features to define an overall degree of match for each feature; and using the overall degree of match for each feature to assign the text object a class label that satisfies a selected decision making rule.
12. The method according to claim 11, further comprising filtering the granule features extracted from the text object to define the ones of the granule features used to construct the document granule feature fuzzy set.
13. The method according to claim 12, wherein the filtering of the granule features is based upon one of Zipf's law and semantic discrimination analysis.
14. The method according to claim 11, wherein the ones of the granule features that are used to construct the document granule feature fuzzy set are reduced to one of a predefined threshold number of granule features and range of granule features.
15. The method according to claim 11, further comprising learning each granule fuzzy set in the knowledge base.
16. The method according to claim 11, wherein the degree of match between each of the plurality of class granule feature fuzzy sets and the document granule feature fuzzy set is measured using one of a maximum-minimum strategy and a probabilistic reasoning strategy based upon semantic unification.
17. The method according to claim 11, further comprising: filtering each degree of match with an associated class specific filter function to define an activation value for its associated class rule; identifying the activation value of the class rule with the highest activation value; each class rule having an associated class label; and assigning the class label of the class rule with the highest identified activation value to classify the text object into one of the plurality of class granule feature fuzzy sets.
18. The method according to claim 11, further comprising learning each associated class specific filter function by initializing a granule frequency distribution for each class label; and converting the granule frequency distribution for each class label into a granule fuzzy set.
19. The method according to claim 11, wherein individual degrees of matches are aggregated using one of a product and an additive model.
20. The method according to claim 11, further comprising estimating granule feature weights when they are aggregated as a weighted function using an additive model.
21. A text categorizer for classifying a text object, comprising: a knowledge base for storing categories represented by class fuzzy sets and associated class labels; a pre-processing module for representing a plurality of extracted features from the text object as a document class fuzzy set; and an approximate reasoning module for using a measured degree of match between the class fuzzy sets in the knowledge base and the document class fuzzy set to assign to the text object the associated class labels of those categories that satisfy a selected decision making rule; wherein the pre-processing module further comprises a fuzzy set generator for: calculating a frequency of occurrence for the plurality of features extracted from the text object; normalizing the frequency of occurrence of each feature of the plurality of features extracted from the text object; and transforming the normalized frequency of occurrence of each of the plurality of features extracted from the text object to define the document class fuzzy set.
22. The text categorizer of claim 21, further comprising a learning module for learning the class fuzzy sets.
23. The text categorizer of claim 22, further comprising: a training database for creating a plurality of class documents; and a validation database for validating learned class fuzzy sets in the knowledge base.
24. The text categorizer of claim 22, wherein the learning module learns the class fuzzy sets are learned by: obtaining a set of class training documents; merging those training documents in the set of training documents with similar labels to create a class document; and computing a class fuzzy set using the class document.
25. A text categorizer for classifying a text object, comprising: a feature extractor for extracting a set of granule features from the text object; each granule feature being represented by a plurality of fuzzy sets and associated labels; a fuzzy set generator for constructing a document granule feature fuzzy set using a plurality of ones of the granule features extracted from the text object; each of the ones of the granule features extracted from the text object having a degree of membership in a corresponding granule feature fuzzy set of the document granule feature fuzzy set and a plurality of class granule feature fuzzy sets of a knowledge base; and an approximate reasoning module for: computing a degree of match between each of the plurality of class granule feature fuzzy sets and the document granule feature fuzzy set to provide a degree of match for each of the ones of the granule features; aggregating each degree of match of the ones of the granule features to define an overall degree of match for each feature; and using the overall degree of match for each feature to assign the text object a class label that satisfies a selected decision making rule.
26. The text categorizer according to claim 25, further comprising a learning module for learning each associated class specific filter function by initializing a granule frequency distribution for each class label; and converting the granule frequency distribution for each class label into a granule fuzzy set.
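
The fuzzy set computation of claim 1, the class learning of claim 3, and the decision rules of claim 10 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The claims only require "a bijective transformation" (claim 6); the formula used here, mu(x) = sum over y of min(p(x), p(y)), is one commonly used probability-to-possibility transformation and is swapped in as an assumption. Reading "a predefined number" in claim 10 as "the top k labels" is likewise an assumption, as are all function names and the toy training data.

```python
from collections import Counter


def normalized_frequencies(tokens):
    """Claim 1: count each feature, then normalize the counts to sum to 1."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def to_fuzzy_set(probabilities):
    """Transform normalized frequencies into fuzzy memberships.

    Assumption: mu(x) = sum_y min(p(x), p(y)); the most frequent feature
    receives membership 1 and rarer features receive smaller values.
    """
    values = list(probabilities.values())
    return {term: sum(min(p, q) for q in values)
            for term, p in probabilities.items()}


def learn_class_fuzzy_sets(training_docs):
    """Claim 3: merge training documents sharing a label into one class
    document, then compute a class fuzzy set from that class document."""
    merged = {}
    for label, tokens in training_docs:
        merged.setdefault(label, []).extend(tokens)
    return {label: to_fuzzy_set(normalized_frequencies(tokens))
            for label, tokens in merged.items()}


def select_labels(scores, rule="max", threshold=0.5, k=1):
    """Claim 10, as interpreted here: pick labels by maximum, threshold, or top k."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    if rule == "max":
        return ranked[:1]
    if rule == "threshold":
        return [label for label in ranked if scores[label] >= threshold]
    if rule == "top_k":
        return ranked[:k]
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical, already tokenized training documents.
training = [
    ("sports", ["goal", "goal", "team", "match"]),
    ("sports", ["goal", "goal", "team", "match"]),
    ("finance", ["bank", "bank", "market", "loan"]),
]
class_sets = learn_class_fuzzy_sets(training)
print(class_sets["sports"])

# Hypothetical degrees of match for one document.
example_scores = {"sports": 0.75, "finance": 0.25, "politics": 0.5}
print(select_labels(example_scores, rule="threshold", threshold=0.5))
```

The two "sports" documents are merged before counting, so the class fuzzy set reflects the pooled frequencies of the whole class rather than an average of per-document sets.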

Patents cited by this patent (3)

MacCuish, John D.; Nicolaou, Christodoulos A., Method and system for artificial intelligence directed lead discovery though multi-domain agglomerative clustering.
Duvoisin, III, Herbert; Beck, Hal E.; Brown, Joe R.; Bower, Mark, Perceptive system including a neural network.
A. Kathleen Hennessey; YouLing Lin; Rajasekar Reddy; C. Rinn Cleavelin; Howard V. Hastings, II; Pinar Kinikoglu; Wan S. Wong, System and method for classifying an anomaly.

Patents citing this patent (41)

Betz, Jonathan T.; Zhao, Shubin, Anchor text summarization for corroboration.
Simard, Charles-Olivier; Bowyer, Alex; Leclerc, Daniel; Molloy, Steve, Auto-classification system and method with dynamic user feedback.
Simard, Charles-Olivier; Bowyer, Alex; Leclerc, Daniel; Molloy, Steve, Auto-classification system and method with dynamic user feedback.
Campanelli, Michael Robert; Eschbach, Reiner, Automated form fill-in via form retrieval.
Hogue, Andrew W.; Betz, Jonathan T., Automatic object reference identification and linking in a browseable fact repository.
Petriuc, Mihai, Click distance determination.
Ashby, Gary H.; Schuldt, Marlo E., Collection management database of arbitrary schema.
Gates, Stephen C., Creating taxonomies and training data for document categorization.
Tankovich, Vladimir; Meyerzon, Dmitriy; Poznanski, Victor, Detection of junk in search result ranking.
Vespe, David J.; Hogue, Andrew, Determining geographic locations for place names in a fact repository.
Tankovich, Vladimir; Meyerzon, Dmitriy; Taylor, Michael James, Document length as a static relevance feature for ranking search results.
Meyerzon, Dmitriy; Shnitko, Yauhen; Burges, Chris J. C.; Taylor, Michael James, Enterprise relevancy ranking using a neural network.
Spehr, Darren; Aegard, John; Berk, Matthew, Facility for reconciliation of business records using genetic algorithms.
Laroco, Jr., Leonardo A.; Jevtic, Nikola; Yakovenko, Nikolai V.; Reynar, Jeffrey, Finding and disambiguating references to entities on web pages.
Laroco, Jr., Leonardo A.; Jevtic, Nikola; Yakovenko, Nikolai V.; Reynar, Jeffrey, Finding and disambiguating references to entities on web pages.
Ferrari, Adam J.; Gourley, David J.; Johnson, Keith A.; Knabe, Frederick C.; Mohta, Vinay B.; Tunkelang, Daniel; Walter, John S., Hierarchical data-driven search and navigation system and method for information retrieval.
Betz, Jonathan T., Identifying the unifying subject of a set of facts.
Acharya, Tinku; Chaira, Tamalika; Ray, Ajay K., Image color matching scheme.
Ferrari, Adam J.; Lau, Andrew M.; Mohta, Vinay B.; Tunkelang, Daniel; Walter, John S., Integrated application for manipulating content in a hierarchical data-driven search and navigation system.
Zhao, Shubin, Learning objects and facts from documents.
Sun, Xiang, Method and apparatus of text classification.
Sankaran, Venkata Ragavan Kondalam; Anbalagan, Ashok Raj, Method and system for automated form document fill-in via image processing.
Martin, Kingsley; Liggett, Tracy S., Method and system for creating a data profile engine, tool creation engines and product interfaces for identifying and analyzing files and sections of files.
Zelevinsky, Vladimir V.; Tunkelang, Daniel; Knabe, Frederick C.; Saji, Michael Y.; Tzanov, Velin Krassimirov, Method and system for information retrieval with clustering.
Sun, Xiang, Method and system for text classification.
Poznanski, Victor; Wang, Oivind; Holm, Fredrik; Bodd, Nicolai; Tankovich, Vladimir; Meyerzon, Dmitriy, Re-ranking search results.
Ludlow, Stephen; Pettigrew, Steve; Dowgailenko, Alex; Deligia, Agostino; Giguère, Isabelle, Reconfigurable model for auto-classification system and method.
Ferrari, Adam; Gourley, David; Johnson, Keith; Knabe, Frederick; Lau, Andrew; Mohta, Vinay; Tunkelang, Daniel; Walter, John, Scalable hierarchical data-driven navigation system and method for information retrieval.
Tankovich, Vladimir; Li, Hang; Meyerzon, Dmitriy; Xu, Jun, Search results ranking using editing distance and document information.
Gluzman Peregrine, Vladimir; Rosen, Alexander D.; Scarlet, Benjamin S.; Volpe, Andrew, System and method for filtering rules for manipulating search results in a hierarchical search and navigation system.
Ferrari, Adam J.; Knabe, Frederick C.; Mohta, Vinay Seth; Myatt, Jason Paul; Scarlet, Benjamin S.; Tunkelang, Daniel; Walter, John S.; Wang, Joyce; Tucker, Michael, System and method for information retrieval from object collections with complex interrelationships.
Ferrari, Adam J.; Gourley, David J.; Johnson, Keith A.; Knabe, Frederick C.; Mohta, Vinay B.; Tunkelang, Daniel; Walter, John S.; Lau, Andrew, System and method for manipulating content in a hierarchical data-driven search and navigation system.
Merrigan, Chadd Creighton; Peltonen, Kyle G.; Meyerzon, Dmitriy; Lee, David J., System and method for scoping searches using index keys.
Grefenstette, Gregory T.; Shanahan, James G., System for automatically generating queries.
Grefenstette, Gregory T; Shanahan, James G, System for automatically generating queries.
Hogue, Andrew William; Siemborski, Robert Joseph; Betz, Jonathan T., System for ensuring the internal consistency of a fact repository.
Hubert, Laurence, System with user directed enrichment.
Hubert, Laurence; Guerin, Nicolas, System with user directed enrichment and import/export control.
Hartman, Eric J.; Schweiger, Carl A.; Sayyarrodsari, Bijan; Johnson, W. Douglas, Training a support vector machine with process constraints.
Betz, Jonathan T.; Zhao, Shubin, Unsupervised extraction of facts.
Betz, Jonathan T.; Zhao, Shubin, Unsupervised extraction of facts.
