[Paper] Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus

방선이 (Department of Computer Statistics and Information, Chonbuk National University), 양재동 (Division of Electronics and Information Engineering, Chonbuk National University), 양형정 (Department of Computer Science, Carnegie Mellon University)

Journal of KIISE: Software and Applications, v.31 no.9, 2004, pp.1204-1217

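Abstract (translated from Korean)

Automatic document classification based on conventional statistical and machine learning techniques mostly trains the classifier on document vectors alone. As a result, it is often unclear which category a document should be assigned to, and accuracy stalls below a certain level. To address this problem, this paper proposes a method that incorporates the relationships between categories into an existing document classification algorithm. The method first classifies documents with the k-NN classification algorithm, which performs well despite its simplicity. When assignment to a specific category is unclear, it decides the category by using the generalization, aggregation, association, and instance relationships between categories provided by an object-based thesaurus, which improves the accuracy of automatic document classification. In experiments, the proposed method improved precision by up to 13.86% over the k-NN classification results while maintaining recall.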

Abstract

Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because they train classifiers using only the feature vectors of documents, ambiguity between two possible categories significantly degrades classification precision. To remedy this drawback, we propose a new method that incorporates relationship information between categories into existing classifiers. We first classify documents using the k-NN classifier, which is generally known for relatively good performance despite its simplicity. We then use relationship information from an object-based thesaurus to reduce the ambiguity. By referencing the various relationships in the thesaurus that correspond to the structured categories, the method removes the ambiguity and improves the precision of k-NN classification. Experimental results show that the method improves precision by up to 13.86% over k-NN classification while preserving its recall.
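
The two-stage idea in the abstract (k-NN first, then category relationships when the result is ambiguous) might be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the actual decision rule, so the ambiguity test (`margin`), the `RELATION_WEIGHT` values, the thesaurus layout (a dict from a category to its related categories), and the toy categories are all hypothetical. Only the four relationship types come from the abstract.

```python
import math
from collections import defaultdict

# Hypothetical weights for the four relationship types named in the abstract.
RELATION_WEIGHT = {
    "generalization": 0.5,
    "aggregation": 0.4,
    "association": 0.3,
    "instance": 0.5,
}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(doc, training, k):
    """Sum the similarities of the k nearest training documents per category."""
    sims = [(cosine(doc, vec), cat) for vec, cat in training]
    nearest = sorted(sims, key=lambda p: p[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, cat in nearest:
        scores[cat] += sim
    return dict(scores)


def classify(doc, training, thesaurus, k=3, margin=0.2):
    """Return a category, consulting the thesaurus only when k-NN is ambiguous."""
    scores = knn_scores(doc, training, k)
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda p: p[1], reverse=True)
    top_score = ranked[0][1]
    # Assumed ambiguity test: every category scoring within `margin` of the top.
    candidates = [c for c, s in ranked if s >= top_score * (1.0 - margin)]
    if len(candidates) == 1:
        return candidates[0]

    def adjusted(cat):
        # Assumed rule: a candidate also earns weighted credit from the k-NN
        # scores of the categories the thesaurus relates it to.
        bonus = sum(RELATION_WEIGHT[rel] * scores.get(other, 0.0)
                    for other, rel in thesaurus.get(cat, []))
        return scores[cat] + bonus

    return max(candidates, key=adjusted)


# Toy data: category names and term weights are made up for illustration.
training = [
    ({"process": 1.0, "kernel": 1.0}, "operating_systems"),
    ({"index": 1.0, "query": 1.0}, "databases"),
    ({"lock": 1.0, "transaction": 1.0}, "transaction_processing"),
    ({"pixel": 1.0}, "graphics"),
]
thesaurus = {"databases": [("transaction_processing", "aggregation")]}
doc = {"index": 1.0, "process": 1.2, "lock": 1.0}

print(classify(doc, training, {}))         # operating_systems (plain k-NN)
print(classify(doc, training, thesaurus))  # databases (thesaurus applied)
```

In this toy case plain k-NN narrowly prefers `operating_systems`, but the neighbor labeled `transaction_processing`, which the hypothetical thesaurus lists as a part of `databases`, tips the decision once the relationship is taken into account.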


