[Paper] Efficient Multi-Step k-NN Search Methods Using Multidimensional Indexes in Large Databases

Sanghun Lee (이상훈); Bum-Soo Kim (김범수); Mi-Jung Choi (최미정); Yang-Sae Moon (문양세)

doi:10.5626/jok.2015.42.2.242

Efficient Multi-Step k-NN Search Methods Using Multidimensional Indexes in Large Databases

Journal of KIISE (정보과학회논문지), Vol. 42, No. 2, pp. 242-254, 2015.

Sanghun Lee, Bum-Soo Kim, Mi-Jung Choi, and Yang-Sae Moon (all with the Department of Computer Science, Kangwon National University)

Abstract (translated from Korean)

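This paper addresses the problem of improving the performance of multi-step k-NN search based on multidimensional indexes. In existing multi-step k-NN search, transforming high-dimensional objects into low-dimensional ones loses information, so the k-NN query determines a very large tolerance (search range), and the subsequent range query retrieves many candidates. These many candidates, in turn, incur very large I/O and CPU overheads in the post-processing step. Based on this observation, we propose methods that reduce the tolerance of the range query, thereby reducing the number of candidates and improving performance. First, we propose a tolerance reduction (approximate) solution, which forcibly shrinks the tolerance determined by the k-NN query by the ratio between inter-object distances in the high- and low-dimensional spaces and uses the shrunken tolerance in the range query. Next, we propose a coefficient control (exact) solution, which performs the range query with a tighter tolerance obtained by using c·k instead of the k-NN query coefficient k. Experiments on real object data show that both proposed solutions substantially reduce the number of candidates and the search time compared with existing multi-step k-NN search.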
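Why the tolerance becomes large can be seen in a minimal sketch of the standard multi-step k-NN procedure described above: a k-NN query on the low-dimensional features, a tolerance set to the largest true distance among those k objects, a range query with that tolerance, and post-processing on the original objects. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the lower-bounding transform, so scaled PAA (piecewise aggregate approximation) is assumed; a linear scan stands in for the multidimensional index (e.g., an R*-tree); distances are Euclidean; the data are synthetic random walks; and the names `euclid`, `paa`, and `multi_step_knn` are hypothetical.

```python
import heapq
import math
import random

def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa(seq, m):
    """Scaled PAA (assumed transform): sqrt(w) * segment means, so the m-dim
    distance lower-bounds the n-dim Euclidean distance. Assumes len(seq) % m == 0."""
    w = len(seq) // m
    return [math.sqrt(w) * sum(seq[i * w:(i + 1) * w]) / w for i in range(m)]

def multi_step_knn(query, data, feats, k, m):
    q_low = paa(query, m)
    # Step 1: k-NN on the low-dim features (a linear scan stands in for the index).
    low = [(euclid(q_low, f), i) for i, f in enumerate(feats)]
    first = [i for _, i in heapq.nsmallest(k, low)]
    # Step 2: tolerance = largest true high-dim distance among those k objects.
    eps = max(euclid(query, data[i]) for i in first)
    # Step 3: range query on the low-dim features with tolerance eps.
    cands = [i for d, i in low if d <= eps]
    # Step 4: post-processing with exact distances on the original objects.
    best = heapq.nsmallest(k, ((euclid(query, data[i]), i) for i in cands))
    return [i for _, i in best], eps, len(cands)

# Hypothetical synthetic data: random-walk sequences of length n.
rng = random.Random(1)

def random_walk(length):
    x, out = 0.0, []
    for _ in range(length):
        x += rng.gauss(0, 1)
        out.append(x)
    return out

n, m, N, k = 64, 8, 2000, 5
data = [random_walk(n) for _ in range(N)]
feats = [paa(s, m) for s in data]
query = random_walk(n)
ids, eps, n_cands = multi_step_knn(query, data, feats, k, m)
print(f"tolerance = {eps:.2f}, candidates = {n_cands} of {N}")
```

The result is exact because the low-dimensional distance never exceeds the true distance, so every true neighbor (true distance at most `eps`) survives the range query. The same property is the source of the overhead: low-dimensional distances are much smaller than `eps`, so many non-neighbors survive as well.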

Abstract (English)

In this paper, we address the problem of improving the performance of multi-step k-NN search using multidimensional indexes. Because lower-dimensional transformations lose information, existing multi-step k-NN solutions produce a large tolerance (i.e., a large search range) and thus retrieve a large number of candidates through the range query. These many candidates cause overwhelming I/O and CPU overheads in the post-processing step. To overcome this problem, we propose two efficient solutions that improve search performance by reducing the tolerance of the range query and, accordingly, the number of candidates. First, we propose a tolerance reduction-based (approximate) solution that forcibly decreases the tolerance, which is determined by a k-NN query on the index, by the average ratio of high- and low-dimensional distances. Second, we propose a coefficient control-based (exact) solution that uses c·k instead of k in the k-NN query to obtain a tighter tolerance and then performs the range query with this tighter tolerance. Experimental results show that the proposed solutions significantly reduce the number of candidates and, accordingly, improve search performance in comparison with the existing multi-step k-NN solution.
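The two proposed solutions can be contrasted in a minimal sketch under hypothetical assumptions: scaled PAA as the lower-bounding transform, a linear scan in place of the index, Euclidean distance, and synthetic random-walk data; all function names are illustrative. The abstract does not say how the average distance ratio is computed, so `estimate_rho` (hypothetical) takes the mean of low-/high-dimensional distance ratios over random object pairs. The tolerance-reduction sketch also keeps the k objects found in the first step so that it always returns k answers, which is likewise an assumption. For coefficient control, the tighter tolerance is taken here as the k-th smallest true distance among the c·k index hits: k objects within that distance exist, so the true k-th nearest distance cannot exceed it, and the result stays exact.

```python
import heapq
import math
import random

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa(seq, m):
    # Scaled PAA (assumed transform); lower-bounds Euclidean distance.
    w = len(seq) // m
    return [math.sqrt(w) * sum(seq[i * w:(i + 1) * w]) / w for i in range(m)]

def low_scan(query, feats, m):
    # Stand-in for index access: low-dim distance to every object.
    q_low = paa(query, m)
    return [(euclid(q_low, f), i) for i, f in enumerate(feats)]

def refine(query, data, cand_ids, k):
    # Post-processing: exact high-dim distances, keep the best k.
    best = heapq.nsmallest(k, ((euclid(query, data[i]), i) for i in cand_ids))
    return [i for _, i in best]

def estimate_rho(data, feats, samples=500, seed=0):
    # Hypothetical estimator: mean of d_low / D_high over random pairs.
    r = random.Random(seed)
    ratios = []
    for _ in range(samples):
        a, b = r.sample(range(len(data)), 2)
        high = euclid(data[a], data[b])
        if high > 0:
            ratios.append(euclid(feats[a], feats[b]) / high)
    return sum(ratios) / len(ratios)

def tolerance_reduction_knn(query, data, feats, k, m, rho):
    """Approximate: shrink the baseline tolerance by rho (may miss neighbors)."""
    low = low_scan(query, feats, m)
    first = [i for _, i in heapq.nsmallest(k, low)]
    eps = max(euclid(query, data[i]) for i in first) * rho
    # Keeping the step-1 objects guarantees k answers (assumption of this sketch).
    cands = set(first) | {i for d, i in low if d <= eps}
    return refine(query, data, cands, k), len(cands)

def coefficient_control_knn(query, data, feats, k, m, c):
    """Exact: tolerance = k-th smallest true distance among the c*k index hits."""
    low = low_scan(query, feats, m)
    hits = [i for _, i in heapq.nsmallest(c * k, low)]
    eps = sorted(euclid(query, data[i]) for i in hits)[k - 1]
    cands = [i for d, i in low if d <= eps]
    return refine(query, data, cands, k), len(cands)

rng = random.Random(42)

def random_walk(length):
    x, out = 0.0, []
    for _ in range(length):
        x += rng.gauss(0, 1)
        out.append(x)
    return out

n, m, N, k = 64, 8, 2000, 5
data = [random_walk(n) for _ in range(N)]
feats = [paa(s, m) for s in data]
query = random_walk(n)
rho = estimate_rho(data, feats)
base_ids, base_n = coefficient_control_knn(query, data, feats, k, m, c=1)  # c = 1 is the baseline
tr_ids, tr_n = tolerance_reduction_knn(query, data, feats, k, m, rho)
cc_ids, cc_n = coefficient_control_knn(query, data, feats, k, m, c=4)
print("candidates: baseline", base_n, "| tolerance reduction", tr_n, "| coefficient control", cc_n)
```

With c = 1 the coefficient-control function reduces to the baseline. For any c ≥ 1 its tolerance is no larger than the baseline's, because the first k index hits are among the c·k hits, so it never retrieves more candidates. The tolerance-reduction variant also retrieves no more candidates than the baseline, but it trades exactness for that reduction.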


