[Paper] A Study on the Implementation of SQL Primitives for Decision Tree Classification

안형근; 고재진

doi:10.3745/ktsde.2013.2.12.855

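Abstract (translated from Korean)

Decision tree classification is one of the important problems in data mining, and data mining has become an important task in large-scale database technology. Efforts to couple databases with data mining systems have therefore led to the development of database primitives that support data mining functions such as decision tree classification. These primitives are implemented as special database operations that support SQL implementations of classification algorithms, and they are used as component modules of the database system to implement specific algorithms. There are two perspectives on developing database primitives that provide data mining functionality. One is to analyze data mining functions and identify the common database primitives that serve them; the other is to provide an extension mechanism for implementing these primitives as part of the database system's interface. Deciding which data mining primitives should be stored in the DBMS is one of the difficult problems. To address this problem, this paper builds an optimized decision tree classifier and describes the database primitives. It identifies the useful operations of decision tree classification algorithms, describes the implementation of these primitives on a commercial DBMS, and presents experimental results for performance comparison.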

Abstract

Decision tree classification is one of the important problems in data mining, and data mining has become an important task in large database technology. Efforts to couple data mining systems with database systems have therefore led to the development of database primitives that support data mining functions such as decision tree classification. These primitives consist of special database operations that support SQL implementations of decision tree classification algorithms, and they have become component modules of database systems for implementing specific algorithms. There are two aspects to developing database primitives that support data mining functions. The first is identifying, by analysis, the common database primitives that support data mining functions. The other is providing an extension mechanism for implementing these primitives as part of the database system's interface. Deciding which primitives should be stored in the DBMS is one of the difficult problems in data mining. To address this problem, we describe database primitives that construct and apply optimized decision tree classifiers. We then identify the operations useful for various classification algorithms, implement these primitives on a commercial DBMS, and present experimental results comparing their performance.
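
As a rough illustration of what "SQL support for decision tree classification" can mean in practice, the sketch below computes per-attribute class counts with a single GROUP BY query and uses them to pick the split with the highest information gain. It is not the paper's primitive set: the `weather` table, its columns, and the helper names are hypothetical, and SQLite stands in for the commercial DBMS only because it ships with Python.

```python
import math
import sqlite3


def class_counts_by_attribute(conn, table, attribute, label):
    """Return {attribute_value: {class_label: count}} from one GROUP BY query."""
    # Identifiers cannot be bound as parameters; this sketch assumes trusted names.
    sql = (f'SELECT "{attribute}", "{label}", COUNT(*) '
           f'FROM "{table}" GROUP BY "{attribute}", "{label}"')
    counts = {}
    for value, cls, n in conn.execute(sql):
        counts.setdefault(value, {})[cls] = n
    return counts


def entropy(class_counts):
    """Shannon entropy (in bits) of a {class_label: count} histogram."""
    total = sum(class_counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in class_counts.values() if n)


def information_gain(counts):
    """Entropy of the parent node minus the weighted entropy of its partitions."""
    parent = {}
    for hist in counts.values():
        for cls, n in hist.items():
            parent[cls] = parent.get(cls, 0) + n
    total = sum(parent.values())
    weighted = sum(sum(h.values()) / total * entropy(h) for h in counts.values())
    return entropy(parent) - weighted


# Hypothetical toy data, used only to exercise the functions above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, windy TEXT, play TEXT)")
conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", [
    ("sunny", "no", "no"), ("sunny", "yes", "no"),
    ("overcast", "no", "yes"), ("overcast", "yes", "yes"),
    ("rain", "no", "yes"), ("rain", "yes", "no"),
])

gains = {a: information_gain(class_counts_by_attribute(conn, "weather", a, "play"))
         for a in ("outlook", "windy")}
best = max(gains, key=gains.get)
print(best, gains)
```

Only the small table of counts leaves the database; the rows themselves are never fetched, which is the main appeal of pushing this step into SQL.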

Questions and Answers

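Keyword	Question	Answer extracted from the paper
	What characterizes SQL-based data mining techniques?	Most data to be mined is stored in a database management system (DBMS), which already implements data access, filtering, and indexing. SQL-based data mining techniques mainly exploit DBMS technology such as large-volume data processing, parallelization, filtering, and aggregation, and they characteristically mine query results as well as the data itself [1, 11]. However, SQL operations such as joins, grouping, and aggregation alone perform too poorly to carry out data mining functions, so indexing techniques are sometimes used to optimize the SQL operations, and, for an efficient implementation, data mining functions sometimes exploit the DBMS's knowledge of operations, access patterns, and access paths.
	Which DBMS implementation features have prior studies described in relation to data mining functions?	Prior studies mainly used decision tree classification to identify data mining functions and relied on the implementation features of the DBMS, and they describe several techniques by which a DBMS can support data mining functions [2, 3]. The first adds new language constructs for association rules to SQL. The second implements data mining functions internally, using a special API such as OLE DB for Data Mining or user-defined types and methods. The third has the DBMS provide special operators or primitives that are useful for data mining. All of these approaches are useful for data mining functions, but to address the problems described above, this paper takes the third approach, special operators or primitives [11].
	Where is most of the data to be mined stored?	Decision tree classification is one of the important problems in data mining, and data mining occupies an important place in large-scale database technology. Most data to be mined is stored in a database management system (DBMS), which already implements data access, filtering, and indexing. SQL-based data mining techniques mainly exploit DBMS technology such as large-volume data processing, parallelization, filtering, and aggregation, and they characteristically mine query results as well as the data itself [1, 11].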
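
The third approach above, where the DBMS itself offers a special operator, can be sketched with a user-defined aggregate. The example below registers a hypothetical `gini` aggregate through Python's `sqlite3.Connection.create_aggregate`, so that node impurity is computed inside an ordinary GROUP BY query. The aggregate name, the `loans` table, and its data are assumptions made for illustration; the paper's own primitives and its commercial DBMS are not reproduced here.

```python
import sqlite3
from collections import Counter


class Gini:
    """User-defined aggregate: Gini impurity of the class labels in a group."""

    def __init__(self):
        self.counts = Counter()

    def step(self, label):
        # Called once per row in the group.
        self.counts[label] += 1

    def finalize(self):
        # Called once per group; returns 1 - sum(p_i^2).
        total = sum(self.counts.values())
        if total == 0:
            return None
        return 1.0 - sum((n / total) ** 2 for n in self.counts.values())


conn = sqlite3.connect(":memory:")
conn.create_aggregate("gini", 1, Gini)  # hypothetical primitive name

# Hypothetical toy data.
conn.execute("CREATE TABLE loans (income TEXT, defaulted TEXT)")
conn.executemany("INSERT INTO loans VALUES (?, ?)", [
    ("high", "no"), ("high", "no"), ("high", "no"),
    ("low", "yes"), ("low", "yes"), ("low", "no"), ("low", "no"),
])

rows = conn.execute(
    "SELECT income, COUNT(*), gini(defaulted) "
    "FROM loans GROUP BY income ORDER BY income"
).fetchall()
for income, n, impurity in rows:
    print(income, n, impurity)
```

Because the aggregate composes with GROUP BY, WHERE, and joins like any built-in function, a tree builder could evaluate candidate splits with plain queries instead of exporting the data.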

References (12)

[1] S. Chaudhuri, "Data Mining and Database Systems: Where is the Intersection?," Data Engineering Bulletin, Vol.21, No.1, pp.4-8, 1998.
[2] R. Meo, G. Psaila, and S. Ceri, "A New SQL-like Operator for Mining Association Rules," VLDB'96, pp.122-133, Mumbai, India, Sept. 3-6, 1996.
[3] A. Netz, S. Chaudhuri, J. Bernhardt, and U. M. Fayyad, "Integration of Data Mining with Database Technology," Proceedings of 26th International Conference on Very Large Data Bases, September 10-14, 2000.
[4] Vipin Kumar et al., Introduction to Data Mining, Addison-Wesley, May 12, 2005.
[5] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Chapman and Hall, 1984.
[6] M. Xu, J. Wang, and T. Chen, "Improved decision tree algorithm: ID3+," Intelligent Computing in Signal Processing and Pattern Recognition, Vol.345, pp.141-149, 2006.
[7] M. Mehta, J. Rissanen, and R. Agrawal, "MDL-based Decision Tree Pruning," Proc. of Intl. Conf. on Knowledge Discovery in Databases and Data Mining, Montreal, Canada, 1995.
[8] S. Chaudhuri, U. M. Fayyad, and J. Bernhardt, "Scalable Classification over SQL Databases," ICDE-99, pp.470-479, Sydney, Australia, 1999.
[9] J. Gehrke, R. Ramakrishnan, and V. Ganti, "RainForest - A Framework for Fast Decision Tree Construction of Large Datasets," VLDB'98, pp.416-427, New York City, New York, USA, 1998.
[10] S. B. Kotsiantis, D. Kanellopoulos, and P. E. Pintelas, "Data Preprocessing for Supervised Learning," International Journal of Computer Science, Vol.1, No.2, 2006.
[11] M. BenHajHmida and A. Congiusta, "Parallel, distributed, and grid-based data mining: algorithms, systems, and applications," Handbook of Research on Computational Grid, IGI Global, pp.90-119, May, 2009.
[12] L. Zhou, Z. Zhang, and M. Xu, "Massive data mining based on item sequence set grid space," in Proceedings of the 2nd International Asia Conference on Informatics in Control, Automation and Robotics, pp.208-211, March, 2010.
