[Paper] Comparative Analysis of Bagging, Boosting and SVM Classification Algorithms in Data Mining

이영섭; 오현정; 김미경

doi:10.5351/kjas.2005.18.2.343

An Empirical Comparison of Bagging, Boosting and Support Vector Machine Classifiers in Data Mining

The Korean Journal of Applied Statistics (응용통계연구), vol. 18, no. 2, 2005, pp. 343-354

이영섭 (Department of Statistics, Dongguk University), 오현정 (DNI Consulting), 김미경 (Department of Statistics, Dongguk University)

Abstract (Korean)

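We applied algorithms widely used in data mining for efficient data classification to real data sets and compared their classification performance. The classifiers compared were CART, a decision-tree method; classifiers that combine CART with the bagging or boosting algorithm; and the SVM classifier. CART has the advantage that its results are easy to interpret, but it is unstable: the classifier it produces varies considerably with the data. We therefore built classifiers that combine CART with bagging or boosting, which compensate for this weakness, and evaluated their performance. We also compared and evaluated them against SVM, whose classification performance has recently gained wide recognition. Using the classification results of each method, we then grew a decision tree to examine how classification performance depends on the characteristics of the data. The results showed that SVM performed best when the data had no missing values and a small number of observations, boosting performed best when the number of observations was large, and bagging performed best when the data contained missing values.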
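
The abstract motivates bagging as a remedy for CART's instability: fit one tree per bootstrap resample and let the trees vote. The sketch below illustrates only that mechanism, under simplifying assumptions that are not from the paper: each "tree" is a one-split stump chosen by misclassification count (full CART grows deeper trees and usually splits on the Gini index), and the names `fit_stump`, `bagging_fit`, and `bagging_predict`, as well as the defaults `n_estimators=25` and `seed=0`, are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split tree (stump): a simplified stand-in for CART that picks
    the midpoint threshold with the fewest misclassified training points."""
    majority = Counter(y).most_common(1)[0][0]
    # Start from the no-split model that predicts the majority label everywhere.
    best = (sum(label != majority for label in y), None, None, majority, majority)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent distinct values
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(label != left_label for label in left)
                      + sum(label != right_label for label in right))
            if errors < best[0]:
                best = (errors, j, t, left_label, right_label)
    _, j, t, left_label, right_label = best
    if j is None:  # no split improved on the majority label
        return lambda row: majority
    return lambda row: left_label if row[j] <= t else right_label


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Fit one stump per bootstrap resample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate the ensemble by a plain majority vote."""
    return Counter(model(row) for model in models).most_common(1)[0][0]


# Usage on a toy one-feature data set.
X = [[0], [1], [2], [3], [10], [11], [12], [13]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(X, y)
print([bagging_predict(models, row) for row in X])
```

Because each stump sees a different resample, individual stumps can disagree (a resample that happens to contain only one class yields a constant stump), and the majority vote smooths out that variability. This is the variance-reduction argument the abstract makes for pairing bagging with CART.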

Abstract (English)

The goal of this paper is to compare classification performance and to determine which classifier performs better depending on the characteristics of the data. The compared methods are CART, CART combined with one of two ensemble algorithms (bagging or boosting), and SVM. In an empirical study of twenty-eight data sets, we found that SVM had a smaller error rate than the other methods on most of the data sets. When bagging, boosting and SVM are compared on the basis of data characteristics, SVM is suitable for data with a small number of observations and no missing values. On the other hand, boosting is suitable for data with a large number of observations, and bagging is suitable for data with missing values.
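
The abstract does not say which boosting variant was combined with CART. As a hedged illustration of how boosting differs from bagging, the sketch below assumes discrete AdaBoost, one common boosting algorithm, with labels in {-1, +1} and one-split stumps as the weak learner in place of CART trees. Instead of independent resamples, each round reweights the observations so the next stump focuses on the points the previous ones got wrong, and the final prediction is an alpha-weighted vote. All function names (`fit_weighted_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`) and the default `n_rounds=10` are hypothetical.

```python
import math


def fit_weighted_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) for the stump with
    the smallest weighted error. Labels are -1/+1; polarity p means the stump
    predicts p when row[feature] <= threshold and -p otherwise."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for t in thresholds:
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(j, t, polarity, row) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_predict(j, t, polarity, row):
    return polarity if row[j] <= t else -polarity


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost: reweight the data after each round so the next stump
    concentrates on the observations the ensemble currently gets wrong."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform observation weights
    ensemble = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        err, j, t, polarity = fit_weighted_stump(X, y, w)
        if err >= 0.5:  # no better than chance: stop early
            break
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, polarity))
        # Up-weight misclassified points (y * h = -1), down-weight the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(j, t, polarity, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * stump_predict(j, t, p, row) for alpha, j, t, p in ensemble)
    return 1 if score >= 0 else -1


# Usage: the label is +1 when at least two of three binary features are 1.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if sum(row) >= 2 else -1 for row in X]
ensemble = adaboost_fit(X, y, n_rounds=3)
print([adaboost_predict(ensemble, row) for row in X] == y)
```

On this toy problem a single stump misclassifies two of the eight points, while the three stumps chosen over three reweighting rounds, combined by their alpha-weighted vote, classify all eight correctly.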
