[Paper] Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System

이오준; 유은순

doi:10.13088/jiis.2015.21.1.119

지능정보연구 = Journal of Intelligence and Information Systems, v.21 no.1, 2015, pp.119-142

이오준 (Department of Computer Engineering, Chung-Ang University), 유은순 (Media Contents Research Institute, Dankook University)

Abstract (Korean version)

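Recommender systems, which provide information tailored to users' tastes and preferences, have grown in importance. Various techniques have been proposed for this purpose, and collaborative filtering, which is comparatively free of domain constraints, is widely used. Model-based collaborative filtering, one type of collaborative filtering, combines machine learning or data mining models with collaborative filtering. It alleviates fundamental limitations of collaborative filtering such as the sparsity and scalability problems, but it has limitations of its own: model building is expensive, and a performance/scalability tradeoff arises. The performance/scalability tradeoff gives rise to the reduced-coverage problem, a kind of sparsity problem. In addition, the high cost of model building causes performance instability as changes in the domain environment accumulate. To address these problems, this study combines a Markov transition probability model and the concept of fuzzy clustering with clustering-based collaborative filtering, and proposes a predictive clustering-based collaborative filtering technique that addresses the reduced-coverage and performance-instability problems. First, the technique tracks changes in user preferences and thereby closes the gap between the static model and dynamic users, which mitigates performance instability. Second, it mitigates reduced coverage by expanding coverage based on transition probabilities and cluster membership probabilities. The proposed technique was validated through robustness tests against the performance-instability problem and against the scalability/performance tradeoff, respectively. Compared with existing techniques, its performance gain is marginal. Nor does it show a meaningful improvement in terms of standard deviation, an indicator of the degree of data variation. However, in terms of range, which indicates the width of performance fluctuation, it improves on existing techniques. The first experiment showed a 51.31% improvement in the performance fluctuation before and after model generation, and the second a 36.05% improvement in the performance fluctuation as the number of clusters changes. This means that, although the proposed technique does not improve performance, it improves on existing techniques in terms of performance stability.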
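
The abstract evaluates stability with two different spread measures, standard deviation and range, and reports improvement as a percentage. The following is a minimal sketch of how such figures could be computed. The score lists are made-up illustrative values, not data from the paper, and the helper names `value_range` and `relative_improvement` are assumptions introduced here.

```python
from statistics import pstdev


def value_range(scores):
    """Range: gap between the largest and smallest observed score."""
    return max(scores) - min(scores)


def relative_improvement(baseline, proposed):
    """Percentage reduction of a fluctuation measure relative to a baseline."""
    return (baseline - proposed) / baseline * 100.0


# Hypothetical error scores measured under several conditions
# (illustrative values only, not results from the paper).
baseline_scores = [0.80, 0.90, 0.70, 1.00]
proposed_scores = [0.82, 0.88, 0.80, 0.90]

# Improvement in the width of fluctuation (range).
range_gain = relative_improvement(
    value_range(baseline_scores), value_range(proposed_scores)
)

# Improvement in the population standard deviation.
std_gain = relative_improvement(
    pstdev(baseline_scores), pstdev(proposed_scores)
)

print(f"range improvement: {range_gain:.2f}%")
print(f"std-dev improvement: {std_gain:.2f}%")
```

Range depends only on the two extreme observations, while standard deviation averages over all of them, so the two measures can move by different amounts on the same data.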

Abstract

With the explosive growth in the volume of information, Internet users have considerable difficulty finding the information they need online. Against this backdrop, recommender systems, which provide information tailored to user preferences and tastes, are becoming increasingly important as a way to address information overload. A number of techniques have been proposed to this end, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a wide variety of domains. CF, on the other hand, is widely used because it is relatively free from domain constraints. CF techniques are broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by using a Bayesian model, a clustering model or a dependency network model. It not only alleviates the sparsity and scalability issues but also improves predictive performance. However, it involves expensive model building and results in a tradeoff between performance and scalability. This tradeoff gives rise to reduced coverage, which is a type of sparsity issue. In addition, expensive model building may lead to performance instability, since the high cost prevents changes in the domain environment from being incorporated into the model immediately. Changes that go unreflected accumulate and eventually undermine system performance.
This study combines a Markov model of transition probabilities and the concept of fuzzy clustering with clustering-based CF (CBCF) to propose predictive clustering-based CF (PCCF), which addresses the issues of reduced coverage and unstable performance. The method mitigates performance instability by tracking changes in user preferences and bridging the gap between the static model and dynamic users. It also mitigates reduced coverage by expanding coverage based on transition probabilities and cluster membership probabilities. The proposed method consists of four processes. First, in preference clustering, user preferences are normalized. Second, in preference transition detection, changes in user preferences are detected from review score entries. Third, in propensity clustering, user propensities are normalized using the patterns of change (propensities) in user preferences. Lastly, in preference prediction, a preference prediction model is built to predict user preferences for items.
The proposed method was validated by testing its robustness against performance instability and against the scalability-performance tradeoff. The first test compared the performance of recommender systems based on IBCF, CBCF, ICFEC and PCCF in an environment where data sparsity had been minimized. The second test varied the number of clusters in CBCF, ICFEC and PCCF and compared the resulting changes in system performance. The results showed that the proposed method yields only an insignificant performance improvement over the existing techniques. It also failed to achieve a significant improvement in standard deviation, which indicates the degree of data fluctuation. Nevertheless, it markedly improved on the existing techniques in terms of range, which indicates the extent of performance fluctuation. In the first test, the performance fluctuation before and after model generation improved by 51.31%. In the second test, the performance fluctuation caused by changes in the number of clusters improved by 36.05%. This signifies that the proposed method, despite its slight performance improvement, clearly offers better performance stability than the existing techniques.
Further research on this study will be directed toward enhancing the r
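
The abstract names two ingredients, Markov transition probabilities between preference clusters and soft (fuzzy) cluster membership, and says they are used to expand coverage. The sketch below illustrates that general idea only; it is not the authors' PCCF algorithm. The function names, the first-order chain, the uniform fallback for unseen rows, and the toy data are all assumptions made for illustration.

```python
def transition_probabilities(label_sequences, n_clusters):
    """Estimate first-order Markov transition probabilities between clusters
    from per-user sequences of cluster labels (assumed input format)."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for seq in label_sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # Assumption: a cluster with no observed transitions gets a uniform row.
        probs.append([c / total if total else 1.0 / n_clusters for c in row])
    return probs


def next_membership(membership, probs):
    """Propagate a soft cluster membership one step through the chain."""
    n = len(probs)
    return [sum(membership[i] * probs[i][j] for i in range(n)) for j in range(n)]


def predict_rating(membership, cluster_means):
    """Membership-weighted mean of per-cluster ratings for one item.
    Clusters with no rating for the item are marked None and skipped."""
    pairs = [
        (m, r) for m, r in zip(membership, cluster_means)
        if r is not None and m > 0
    ]
    weight = sum(m for m, _ in pairs)
    if weight == 0:
        return None  # the item is outside the user's coverage
    return sum(m * r for m, r in pairs) / weight


# Toy data (hypothetical): three users moving between two preference clusters.
sequences = [[0, 0, 1], [0, 1, 1], [1, 1, 0]]
probs = transition_probabilities(sequences, n_clusters=2)

hard = [1.0, 0.0]        # user currently assigned only to cluster 0
item_means = [None, 4.0]  # cluster 0 has no rating for this item

uncovered = predict_rating(hard, item_means)
predicted = predict_rating(next_membership(hard, probs), item_means)
print(uncovered, predicted)
```

With a hard assignment to one cluster, the item cannot be scored because that cluster has no rating for it. Spreading the user's membership over the clusters they are likely to move to lets another cluster's rating contribute, which is one simple way to read "expanding the coverage".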

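Q&A

Keyword	Question	Answer extracted from the paper
	What are the problems of CBCF?	Despite its reduced-coverage and performance-instability problems, CBCF is a useful method for alleviating the scalability and sparsity problems of CF. For this reason, various methods have been proposed to solve these two problems.
	What is model-based collaborative filtering?	Various techniques have been proposed for this purpose, and collaborative filtering, which is comparatively free of domain constraints, is widely used. Model-based collaborative filtering, one type of collaborative filtering, combines machine learning or data mining models with collaborative filtering. It alleviates fundamental limitations of collaborative filtering such as the sparsity and scalability problems, but model building is expensive and a performance/scalability tradeoff arises.
	What are the advantages of model-based CF?	Among these, model-based CF compensates for the drawbacks of CF by using models such as a Bayesian model, a clustering model, or a dependency network. It alleviates the sparsity and scalability problems, among others, and can raise predictive performance. However, model building is expensive, and a trade-off arises between performance and scalability.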

