[Paper] Methodology for Search Intent-based Document Recommendation

Lee, Donghoon; Kim, Namgyu

doi:10.9708/jksci.2021.26.06.115


韓國컴퓨터情報學會論文誌 = Journal of the Korea Society of Computer and Information, v.26 no.6, 2021, pp. 115-127

Lee, Donghoon (Graduate School of Business IT, Kookmin University) , Kim, Namgyu (Graduate School of Business IT, Kookmin University)

Abstract (translated from the Korean original)

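Finding the information a user wants in a single attempt amid vast amounts of data is by no means easy. Consequently, various methods have been proposed that recommend documents according to user preferences, based on users' document browsing histories. However, existing browsing-history-based document recommendation methodologies use only information about who viewed a document and fail to make sufficient use of the intent with which the user came to view it. This study therefore proposes a search intent-based document recommendation method that uses information on why (Why) a document was read rather than on who (Who) read it. To verify the superiority of the proposed methodology, we conducted an experiment analyzing 239,438 actual user search-history records from 'C', a Korean e-commerce platform company; the results confirmed that the proposed methodology outperforms both an existing content-based recommendation model and a simple browsing-history-based recommendation model.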

Abstract

It is not easy for a user to find the documents they actually want in a single attempt from a vast number of search results. For this reason, various methods have been proposed that recommend documents according to the user's preferences, based on the user's document browsing history. However, browsing-history-based document recommendation has a limitation: it uses only information about who viewed a document and does not fully exploit the intent with which the user searched for it. Therefore, we propose a document recommendation method based on the user's search intent, which uses information on "Why" a user reads a document instead of information on "Who" reads it. To confirm the feasibility of the proposed methodology, we conducted an experiment analyzing 239,438 actual user search-history records from one of the most popular e-commerce platform companies in Korea. As a result, our methodology showed superior performance compared with the existing content-based and simple browsing-history-based recommendation models.


Tables/Figures (18)

Fig. 1. Access History with Search Keywords
Fig. 2. Comparison of User's Perspective and Keyword's Perspective
Fig. 3. Text Analytics - Techniques and Applications[23]
Fig. 4. Overall Research Process
Table 1. History of Search Keywords and Documents Access for Each User
Table 2. Matrix of Documents / Users
Table 3. Matrix of Documents / Search Keywords
Table 4. IDF of Each Keyword
Table 5. Matrix of Documents / Intent Weight
Fig. 5. Experimental Model
Table 6. Intent-based Documents Similarity
Table 7. User-based Documents Similarity
Fig. 6. Overall Average Confidence from Ranking 1 to 5
Table 8. Performance Comparison of Three Recommendation Models
Fig. 7. Average Confidence (Cosine/Absolute Frequency)
Fig. 8. Average Confidence (Cosine/TF-IDF)
Fig. 9. Average Confidence (Jaccard/Absolute Frequency)
Fig. 10. Average Confidence (Jaccard/TF-IDF)

AI-Generated Summary of the Main Text

Proposed Method

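Building on the results of various prior studies of text analytics, this study proposes a method for recommending similar documents that takes into account the user intent embedded in search-keyword information.
Here, structuring from the user's perspective can be understood as focusing on who (Who) viewed each document, whereas structuring from the search-keyword perspective focuses on why (Why), that is, with what intent, each document was viewed. Whereas conventional user-history-based document recommendation relies on each user's document-access history in [Table 2], this study proposes recommending similar documents based on the search keywords associated with each document in [Table 3]. The subsequent analysis using the viewed-document / inflow-keyword matrix of [Table 3] (an inflow keyword being the search keyword through which a user reached the document) is described in detail in the next section.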
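To make the Who/Why contrast concrete, the sketch below builds both views from the same hypothetical access log: a document-by-user matrix in the style of [Table 2] and a document-by-keyword matrix in the style of [Table 3]. Cosine similarity is assumed as the document-similarity measure (the figure list mentions cosine and Jaccard variants); the log, document IDs, and the functions `build_matrix`, `cosine`, and `most_similar` are illustrative only, not the authors' implementation.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy log of (user, search keyword, viewed document) records.
LOG = [
    ("u1", "laptop", "d1"),
    ("u1", "laptop bag", "d2"),
    ("u2", "laptop", "d1"),
    ("u2", "laptop", "d3"),
    ("u3", "laptop bag", "d2"),
    ("u3", "mouse", "d4"),
]


def build_matrix(log, view):
    """Return {document: Counter(feature -> count)}.

    view="user":    Table 2 style, features are the users who viewed the document.
    view="keyword": Table 3 style, features are the keywords that led to the view.
    """
    position = {"user": 0, "keyword": 1}[view]
    matrix = defaultdict(Counter)
    for record in log:
        matrix[record[2]][record[position]] += 1
    return dict(matrix)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(matrix, doc, k=3):
    """Rank the other documents by cosine similarity to `doc` (ties keep input order)."""
    target = matrix.get(doc, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in matrix.items() if other != doc]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


by_user = build_matrix(LOG, "user")
by_keyword = build_matrix(LOG, "keyword")
print(most_similar(by_user, "d1"))     # d3 first, then d2 (shared viewer u1)
print(most_similar(by_keyword, "d1"))  # only d3 (shared keyword "laptop") scores above 0
```

In the user view, d2 looks related to d1 only because u1 happened to view both; in the keyword view, d1 and d3 are linked because both were reached through the same keyword, which is the kind of intent signal the method relies on.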

Target Data

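This experiment used 239,438 records from the actual user search history of 'C', the largest e-commerce platform company in Korea, covering January 2020 through November 2020.
Specifically, in step (1) the per-user search-keyword and document-access history was split into a training set of 167,791 records from January to August 2020 and a test set of 71,647 records from September to November 2020. In steps (2) to (4), intent-based, content-based, and user-based document recommendation models were built so that their performance could be compared.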
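The step (1) split is by time rather than at random, so the test period lies strictly after the training period; the reported counts are consistent (167,791 + 71,647 = 239,438). A minimal sketch, assuming each record is a hypothetical `(date, user, keyword, document)` tuple; the record layout and the function name `temporal_split` are illustrative, not from the paper.

```python
from datetime import date

# Period boundaries taken from the text; the end bound is exclusive.
TRAIN_START = date(2020, 1, 1)
TEST_START = date(2020, 9, 1)   # first day of the test period
END = date(2020, 12, 1)         # data run through November 2020


def temporal_split(records):
    """Split (date, user, keyword, document) records into train/test by date.

    Records outside January to November 2020 are dropped.
    """
    train, test = [], []
    for record in records:
        day = record[0]
        if TRAIN_START <= day < TEST_START:
            train.append(record)
        elif TEST_START <= day < END:
            test.append(record)
    return train, test


# Hypothetical records for illustration.
records = [
    (date(2020, 3, 5), "u1", "laptop", "d1"),
    (date(2020, 8, 31), "u2", "laptop", "d3"),
    (date(2020, 9, 1), "u1", "laptop bag", "d2"),
    (date(2020, 11, 30), "u3", "mouse", "d4"),
    (date(2020, 12, 2), "u3", "mouse", "d4"),
]
train, test = temporal_split(records)
print(len(train), len(test))
```

Models are then fit only on `train`, and their recommendations are scored against co-occurrences observed in `test`.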

Data Processing

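Given the same document, each of these three recommendation models recommends different documents as related documents according to its own algorithm. To gauge the accuracy of the models, this experiment adopted confidence, a measure commonly used in association analysis, as the evaluation criterion; the confidence between documents was computed in step (5). Confidence is the conditional probability that event B also occurs given that event A has occurred, i.e., conf(A → B) = P(B | A).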
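A minimal sketch of the step (5) computation: conf(A → B) is the number of transactions containing both A and B divided by the number containing A. The excerpt does not state what counts as a transaction; here it is assumed to be the set of documents one user viewed in the test period, and `baskets_by_user` and `confidence` are hypothetical helper names.

```python
from collections import defaultdict


def baskets_by_user(pairs):
    """Group (user, document) view pairs into one document set per user.

    Assumption: a "transaction" is the set of documents one user viewed
    in the test period; the excerpt does not state the transaction unit.
    """
    baskets = defaultdict(set)
    for user, doc in pairs:
        baskets[user].add(doc)
    return list(baskets.values())


def confidence(baskets, a, b):
    """conf(a -> b) = P(b | a): share of baskets containing a that also contain b."""
    with_a = [basket for basket in baskets if a in basket]
    if not with_a:
        return 0.0
    return sum(1 for basket in with_a if b in basket) / len(with_a)


# Hypothetical test-period views.
views = [("u1", "d1"), ("u1", "d2"), ("u2", "d1"), ("u2", "d3"), ("u3", "d1"), ("u3", "d2")]
baskets = baskets_by_user(views)
print(confidence(baskets, "d1", "d2"))  # 2 of the 3 baskets with d1 also contain d2
print(confidence(baskets, "d2", "d1"))  # every basket with d2 also contains d1
```

Note that confidence is asymmetric: conf(d1 → d2) and conf(d2 → d1) differ, so it matters which document is treated as the source of a recommendation.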

Theory/Model

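In general, the keywords through which a document most frequently receives inflow can be interpreted as meaningfully representing its content. However, many prior studies have shown that high-frequency terms do not always represent a document's meaning well, so this study structures the data using TF-IDF-weighted frequencies of search keywords rather than their raw inflow frequencies.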
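The excerpt does not give the exact TF-IDF variant, so the sketch below assumes one common form: weight = raw inflow count × log(N / df), where N is the number of documents and df is the number of documents a keyword led users to (the per-keyword IDF corresponds to what [Table 4] reports). The function name and the toy counts are hypothetical.

```python
from math import log


def tf_idf(matrix):
    """Reweight a {document: {keyword: inflow count}} matrix with TF-IDF.

    Assumed scheme: weight = count * log(N / df), where N is the number of
    documents and df is the number of documents the keyword led users to.
    Returns the weighted matrix and the per-keyword IDF table.
    """
    n_docs = len(matrix)
    df = {}
    for counts in matrix.values():
        for keyword in counts:
            df[keyword] = df.get(keyword, 0) + 1
    idf = {keyword: log(n_docs / d) for keyword, d in df.items()}
    weighted = {
        doc: {keyword: count * idf[keyword] for keyword, count in counts.items()}
        for doc, counts in matrix.items()
    }
    return weighted, idf


# Hypothetical document / inflow-keyword counts.
raw = {
    "d1": {"laptop": 5, "sale": 3},
    "d2": {"laptop bag": 4, "sale": 3},
    "d3": {"mouse": 2, "sale": 1},
}
weighted, idf = tf_idf(raw)
print(idf["sale"])  # 0.0: "sale" led to every document, so it carries no intent signal
print(weighted["d1"])
```

A generic keyword such as "sale" has a high raw inflow count but leads to every document, so its weight drops to zero, while a distinguishing keyword such as "laptop" keeps a positive weight.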

Performance/Effects

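Traditionally, various methods have been proposed that recommend documents according to user preferences based on users' document browsing histories, but these approaches use only information about who viewed a document and do not sufficiently exploit the intent with which the user came to view it. This study therefore proposed a new document recommendation method based on users' search intent, using the search keywords they entered when looking for documents. In addition, to assess the practical applicability of the proposed methodology, we conducted an experiment analyzing 239,438 actual user search-history records from 'C', a Korean e-commerce platform company; the results confirmed that the proposed methodology outperforms both the existing content-based recommendation model and the simple browsing-history-based recommendation model.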
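Because search keywords capture a user's information-seeking intent more directly than the viewed documents do, we expect many follow-up studies that analyze the relationship between the keywords users enter and the documents they view. The experiment also confirmed that the intent-based recommendation model proposed in this study outperforms the existing content-based and user-history-based recommendation models in terms of average confidence, which can be regarded as the practical contribution of this study.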
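The comparison in [Fig. 6] and [Table 8] is stated in terms of average confidence at ranks 1 to 5. The excerpt does not spell out the averaging, so the sketch below assumes one plausible reading: for each rank r, average conf(source → r-th recommended document) over all source documents, with confidence measured on test-period baskets. The rankings, baskets, and the function name `average_confidence_by_rank` are hypothetical.

```python
def average_confidence_by_rank(recommendations, baskets, k=5):
    """Mean confidence(source -> recommended doc) at each rank 1..k.

    recommendations: {source document: [rank-1 doc, rank-2 doc, ...]}
                     produced by any model from the training period.
    baskets: list of document sets observed in the test period.
    """

    def confidence(a, b):
        with_a = [basket for basket in baskets if a in basket]
        if not with_a:
            return 0.0
        return sum(1 for basket in with_a if b in basket) / len(with_a)

    averages = []
    for rank in range(k):
        scores = [
            confidence(source, ranked[rank])
            for source, ranked in recommendations.items()
            if len(ranked) > rank
        ]
        averages.append(sum(scores) / len(scores) if scores else 0.0)
    return averages


# Hypothetical test baskets and rankings from two models.
baskets = [{"d1", "d2"}, {"d1", "d3"}, {"d1", "d2"}, {"d2", "d4"}]
model_a = {"d1": ["d2", "d3"], "d2": ["d1", "d4"]}
model_b = {"d1": ["d3", "d2"], "d2": ["d4", "d1"]}
print(average_confidence_by_rank(model_a, baskets, k=2))
print(average_confidence_by_rank(model_b, baskets, k=2))
```

Under this reading, a better model places documents that actually co-occur in the later test period at the top ranks, so its rank-1 average confidence is higher (here model_a beats model_b at rank 1).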

Future Research

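First, the search-keyword information used for document recommendation in this study is difficult to obtain for ordinary users who lack administrative or operational privileges on the system. Refining and validating the proposed methodology therefore requires the participation of the party actually operating the system, which may limit the extensibility of this study. In addition, this study compared the performance of various document recommendation models using an inter-document association measure and, for a rigorous evaluation, used different time periods for the training and validation data.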
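However, this evaluation method has the limitation that it does not directly evaluate the performance of the document recommendation models. Future research should therefore apply the proposed and comparison models to a live system and analyze each model's recommendation performance and the degree to which it improves user satisfaction.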
The contributions of this study are as follows. First, this study makes an academic contribution in that, beyond conventional content-based or user-history-based identification of similar documents, it newly proposes a way to identify similar documents using the inflow keywords that led to document views.
Many studies have been conducted on recommending documents related to those a user has accessed, to help users readily obtain the information they want from vast amounts of data.

References (39)

[1] D. Reinsel, J. Gantz, and J. Rydning, Data Age 2025: The Evolution of Data to Life-Critica, https://www.import.io/wp-content/uploads/2017/04/Seagate-WP-DataAge2025-March-2017.pdf
[2] A. Lee, K. Choi, and G. Kim, "LDA Topic Modeling and Recommendation of Similar Patent Document Using Word2vec," Information Systems Review, Vol. 22, No. 1, pp. 17-31, Feb. 2020.
[3] J. Kim, J. Byun, D. Sun, T. Kim, and Y. Kim, "A Model for Measuring the R&D Project Similarity using Patent Information," Journal of the Korea Institute of Information and Communication Engineering, Vol. 18, No. 5, pp. 1013-1021, May 2014. DOI: 10.6109/JKIICE.2014.18.5.1013
[4] Y. Bai and S. Park, "LEXAI: Legal Document Similarity Analysis Service using Explainable AI," Journal of Computing Science and Engineering, Vol. 47, No. 11, pp. 1061-1070, Nov. 2020. DOI: 10.5626/JOK.2020.47.11.1061
[5] H. Lee and J. Kim, "Issue Keyword Extraction Method Using Document Similarity Method - Focused on Internet Articles -," Asia-pacific Journal of Multimedia services convergent with Art, Humanities, and Sociology, Vol. 7, No. 8, pp. 383-391, Aug. 2017. DOI: 10.35873/ajmahs.2017.7.8.035
[6] J. Kim, J. Suh, D. Ahn, and Y. Cho, "A Personalized Recommendation Methodology based on Collaborative Filtering," Journal of Intelligence and Information Systems, Vol. 8, No. 2, pp. 139-157, Dec. 2002.
[7] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl, "GroupLens: applying collaborative filtering to Usenet news," Communications of the ACM, Vol. 40, No. 3, pp. 77-87, Mar. 1997. DOI: 10.1145/245108.245126
[8] S. Lee, Y. Cho, J. Lee, and D. Yu, "Comparative study of recommender systems using movie rating data," Journal of the Korean Data And Information Science Society, Vol. 31, No. 6, pp. 975-991, Nov. 2020. DOI: 10.7465/jkdi.2020.31.6.975
[9] Y. Yoo, J. Kim, B. Sohn, and J. Jung, "Evaluation of Collaborative Filtering Methods for Developing Online Music Contents Recommendation System," The Transactions of The Korean Institute of Electrical Engineers, Vol. 66, No. 7, pp. 1083-1091, Jul. 2017. DOI: 10.5370/KIEE.2017.66.7.1083
[10] T. Shin, K. Chang, and Y. Park, "Customer Recommendation Using Customer Preference Estimation Model and Collaborative Filtering," Korea Intelligent Information Systems Society, Vol. 12, No. 4, pp. 1-14, Dec. 2006.
[11] R. Agrawal, T. Imielinski, and A. Swami, "Mining association rules between sets of items in large databases," Proceedings of the 1993 ACM SIGMOD international conference on Management of data, pp. 207-216, New York, NY, USA, Jan. 1993. DOI: 10.1145/170035.170072
[12] D. Lee, "A Regression-Model-based Method for Combining Interestingness Measures of Association Rule Mining," Journal of Intelligence and Information Systems, Vol. 23, No. 6, pp. 127-141, Mar. 2017. DOI: 10.13088/JIIS.2017.23.1.127
[13] C. D. Manning, P. Raghavan, and H. Schütze, "Introduction to Information Retrieval," Cambridge University Press, pp. 1-506, 2008.
[14] R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval: The Concepts and Technology behind Search (2nd ed.)," Addison-Wesley Publishing Company, pp. 1-913, 2011.
[15] J. Son, S. Kim, H. Kim, and S. Cho, "Review and Analysis of Recommender Systems," Journal of Korean Institute of Industrial Engineers, Vol. 41, No. 2, pp. 185-208, Apr. 2015.
[16] C. H. Cai, A. W. C. Fu, C. H. Cheng, and W. W. Kwong, "Mining association rules with weighted items," Proceedings. IDEAS'98. International Database Engineering and Applications Symposium, pp. 68-77, Cardiff, UK, Jul. 1998. DOI: 10.1109/IDEAS.1998.694360
[17] D. Lee, "A Study on the Improvement of Recommendation Accuracy by Using Category Association Rule Mining," Journal of Intelligence and Information Systems, Vol. 26, No. 2, pp. 27-42, Jun. 2020. DOI: 10.13088/JIIS.2020.26.2.027
[18] Y. Wu and A. Chen, "Index structures of user profiles for efficient web page filtering services," Proceedings 20th IEEE International Conference on Distributed Computing Systems, pp. 644-651, Taipei, Taiwan, Apr. 2000. DOI: 10.1109/ICDCS.2000.840981
[19] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, "Using collaborative filtering to weave an information tapestry," Communications of the ACM, Vol. 35, No. 12, pp. 61-70, Dec. 1992. DOI: 10.1145/138859.138867
[20] Y. Cho and J. Kim, "Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce," Expert Systems with Applications, Vol. 26, No. 2, pp. 233-246, Feb. 2004. DOI: 10.1016/s0957-4174(03)00138-6
[21] Y. Cho, J. Kim, and S. Kim, "A personalized recommender system based on web usage mining and decision tree induction," Expert Systems with Applications, Vol. 23, No. 3, pp. 329-342, Oct. 2002. DOI: 10.1016/s0957-4174(02)00052-0
[22] H. Choi and E. Hwang, "Emotion-based Music Recommendation System based on Twitter Document Analysis," KIISE Transactions on Computing Practices, Vol. 18, No. 11, pp. 762-767, Nov. 2012.
[23] N. Kim, D. Lee, H. Choi, and W. X. S. Wong, "Investigations on Techniques and Applications of Text Analytics," The Journal of Korean Institute of Communications and Information Sciences, Vol. 42, No. 2, pp. 471-492, Feb. 2017. DOI: 10.7840/kics.2017.42.2.471
[24] G. Salton, A. Wong, and C. S. Yang, "A vector space model for automatic indexing," Communications of the ACM, Vol. 18, No. 11, pp. 613-620, Nov. 1975. DOI: 10.1145/361219.361220
[25] G. Salton, "The SMART Retrieval System-Experiments in Automatic Document Processing," Prentice Hall, pp. 1-556, 1971.
[26] K. Pearson, "On lines and planes of closest fit to systems of points in space," The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science Series 6, Vol. 2, No. 11, pp. 559-572, Nov. 1901. DOI: 10.1080/14786440109462720
[27] H. Hotelling, "Analysis of a complex of statistical variables into principal components," Journal of Educational Psychology, Vol. 24, No. 6, pp. 417-441, Sep. 1933. DOI: 10.1037/h0071325
[28] G. W. Stewart, "On the early history of the singular value decomposition," SIAM Review, Vol. 35, No. 4, pp. 551-566, Dec. 1993. DOI: 10.1137/1035134
[29] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, Vol. 401, No. 6755, pp. 788-791, Oct. 1999. DOI: 10.1038/44565
[30] O. Park and H. Park, "A Study on the International Research Trends in Electronic Records Management: InterPARES 3 and ITrust Achievements," Journal of Korean Society of Archives and Records Management, Vol. 16, No. 1, pp. 89-120, Feb. 2016. DOI: 10.14404/JKSARM.2016.16.1.089
[31] W. Seo, H. Park, and J. Yoon, "An exploratory study on the korean national R&D trends using co-word analysis," Journal of Information Technology Applications & Management, Vol. 19, No. 4, pp. 1-18, Dec. 2012. DOI: 10.21219/JITAM.2012.19.4.001
[32] H. Choi and H. Varian, "Predicting the present with Google Trends," Economic Record, Vol. 88, No. 1, pp. 2-9, Jun. 2012. DOI: 10.1111/j.1475-4932.2012.00809.x
[33] V. Vapnik, "Estimation of Dependences Based on Empirical Data," Springer Verlag, pp. 1-523, 1982.
[34] J. Seo, T. Shon, J. Seo, and J. Moon, "A study on the filtering of spam e-mail using n-Gram indexing and support vector machine," Journal of the Korea Institute of Information Security & Cryptology, Vol. 14, No. 2, pp. 23-33, Apr. 2004.
[35] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent dirichlet allocation," The Journal of Machine Learning Research, Vol. 3, pp. 993-1022, Jan. 2003.
[36] J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281-297, Berkeley, USA, Jun. 1967.
[37] G. Salton and M. J. McGill, "Introduction to modern information retrieval," McGraw-Hill, pp. 1-448, 1983.
[38] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: an open architecture for collaborative filtering of netnews," Proceedings of the 1994 ACM conference on Computer supported cooperative work, pp. 175-186, New York, NY, USA, Oct. 1994. DOI: 10.1145/192844.192905
[39] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, "Evaluating Collaborative Filtering Recommender Systems," ACM Transactions on Information Systems, Vol. 22, No. 1, pp. 5-53, Jan. 2004. DOI: 10.1145/963770.963772
