$\require{mediawiki-texvc}$

연합인증

연합인증 가입 기관의 연구자들은 소속기관의 인증정보(ID와 암호)를 이용해 다른 대학, 연구기관, 서비스 공급자의 다양한 온라인 자원과 연구 데이터를 이용할 수 있습니다.

이는 여행자가 자국에서 발행 받은 여권으로 세계 각국을 자유롭게 여행할 수 있는 것과 같습니다.

연합인증으로 이용이 가능한 서비스는 NTIS, DataON, Edison, Kafe, Webinar 등이 있습니다.

한번의 인증절차만으로 연합인증 가입 서비스에 추가 로그인 없이 이용이 가능합니다.

다만, 연합인증을 위해서는 최초 1회만 인증 절차가 필요합니다. (회원이 아닐 경우 회원 가입이 필요합니다.)

연합인증 절차는 다음과 같습니다.

최초이용시에는
ScienceON에 로그인 → 연합인증 서비스 접속 → 로그인 (본인 확인 또는 회원가입) → 서비스 이용

그 이후에는
ScienceON 로그인 → 연합인증 서비스 접속 → 서비스 이용

연합인증을 활용하시면 KISTI가 제공하는 다양한 서비스를 편리하게 이용하실 수 있습니다.

기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구
An Analytical Study on Automatic Classification of Domestic Journal articles Based on Machine Learning 원문보기

정보관리학회지 = Journal of the Korean society for information management, v.35 no.2 = no.108, 2018년, pp.37 - 62  

김판준 (신라대학교 문헌정보학과)

초록
AI-Helper 아이콘AI-Helper

문헌정보학 분야의 국내 학술지 논문으로 구성된 문헌집합을 대상으로 기계학습에 기초한 자동분류의 성능에 영향을 미치는 요소들을 검토하였다. 특히, "정보관리학회지"에 수록된 논문에 주제 범주를 자동 할당하는 분류 성능 측면에서 용어 가중치부여 기법, 학습집합 크기, 분류 알고리즘, 범주 할당 방법 등 주요 요소들의 특성을 다각적인 실험을 통해 살펴보았다. 결과적으로 분류 환경 및 문헌집합의 특성에 따라 각 요소를 적절하게 적용하는 것이 효과적이며, 보다 단순한 모델의 사용으로 상당히 좋은 수준의 성능을 도출할 수 있었다. 또한, 국내 학술지 논문의 분류는 특정 논문에 하나 이상의 범주를 할당하는 복수-범주 분류(multi-label classification)가 실제 환경에 부합한다고 할 수 있다. 따라서 이러한 환경을 고려하여 단순하고 빠른 분류 알고리즘과 소규모의 학습집합을 사용하는 최적의 분류 모델을 제안하였다.

Abstract AI-Helper 아이콘AI-Helper

This study examined the factors affecting the performance of automatic classification based on machine learning for domestic journal articles in the field of LIS. In particular, In view of the classification performance that assigning automatically the class labels to the articles in "Journal of the...

주제어

참고문헌 (54)

  1. Kang, Seung-Shik (2002). Korean Morphology and Information Retrieval. Hongrung Publishing Company. 

  2. Kim, Seong-Hee, & Eom, Jae-Eun (2012). A study on the documents's automatic classification using machine learning. Journal of Information Management, 39(4), 47-66. http://dx.doi.org/10.1633/JIM.2008.39.4.047 

  3. Kim, Yong-Hwan, & Chung, Young-Mee (2012). An experimental study on feature selection using wikipedia for text categorization. Journal of the Korean Society for information Management, 29(2), 155-171. http://dx.doi.Org/10.3743/KOSIM.2012.29.2.155 

  4. Kim, Jong-Min, & Yoo, Chang D. (2014). Linear classifier optimization for feature acquisition cost-sensitive classification. In Proceedings of the IEEK Conference, 37(1), 2021-2024. 

  5. Kim, Pan Jun (2006a). A study on automatic assignment of descriptors using machine learning. Journal of the Korean Society for Information Management, 23(1), 279-299. http://dx.doi.org/10.3743/KOSIM.2006.23.1.279 

  6. Kim, Pan Jun (2006b). A study on the automatic descriptor assignment for scientific journal articles uing rocchio algorithm. Journal of the Korean Society for Information Management, 23(3), 69-89. http://dx.doi.org/10.3743/KOSIM.2006.23.3.069 

  7. Kim, Pan Jun (2008). A study on the performance improvement of rocchio classifier with term weighting methods. Journal of the Korean Society for Information Management, 25(1), 211-233. http://dx.doi.org/10.3743/KOSIM.2008.25.1.211 

  8. Kim, Pan Jun (2016). An analytical study on performance factors of automatic classification based on machine learning. Journal of the Korean Society for Information Management, 33(2), 33-59. http://dx.doi.org/10.3743/KOSIM.2016.33.2.033 

  9. Kim, Pan Jun, & Lee, Jae Yun (2007). Utilizing unlabeled documents in automatic classification with inter-document similarities. Journal of the Korean Society for Information Management, 24(1), 251-271. http://dx.doi.org/10.3743/KOSIM.2007.24.1.251 

  10. Kim, Pan Jun, & Lee, Jae Yun (2012). A study on the reclassification of author keywords for automatic assignment of descriptors. Journal of the Korean Society for Information Management, 29(2), 225-246. http://dx.doi.org/10.3743/KOSIM.2012.29.2.225 

  11. Kim, Pan Jun, & Lee, Jae Yun (2014). An experimental study on the performance improvement of automatic classification for the articles of korean journals based on controlled keywords in international database. Journal of the Korean Society for Library and Information Science, 48(3), 491-510. http://dx.doi.org/10.4275/KSLIS.2014.48.3.491 

  12. Song, Sung-Jeon, & Chung, Young-Mee (2012). A study on improving the performance of document classification using the context of terms. Journal of the Korean Society for Information Management, 29(2), 205-224. http://dx.doi.Org/10.3743/KOSIM.2012.29.2.205 

  13. Shim, Kyung (2006). Optimization of number of training documents in text categorization. Journal of the Korean Society for Information Management, 23(4), 277-294. http://dx.doi.org/10.3743/KOSIM.2006.23.4.277 

  14. Shim, Kyung, & Chung, Young-Mee (2006). The effect of the quality of pre-assigned subject categories on the text categorization performance. Journal of the Korean Society for Information Management, 23(2), 265-285. http://dx.doi.org/10.3743/KOSIM.2006.23.2.265 

  15. Lee, Yong-Gu (2009). Classification performance analysis of cross-language text categorization using machine translation. Journal of the Korean Society for Library and Information Science, 43(1), 313-332. http://dx.doi.org/10.4275/kslis.2009.43.1.313 

  16. Lee, Yong-Gu (2013). A study on feature selection for kNN classifier using document frequency and collection frequency. Journal of Korean Library and Information Science Society, 44(1), 27-47. http://dx.doi.org/10.16981/kliss.44.1.201303.27 

  17. Lee, Jae Yun (2005a). Improving the performance of a fast text classifier with document-side feature selection. Journal of Information Management, 36(4), 51-69. http://dx.doi.org/10.1633/jim.2005.36.4.051 

  18. Lee, Jae Yun (2005b). An empirical study on improving the performance of text categorization considering the relationships between feature selection criteria and weighting methods. Journal of the Korean Society for Library and Information Science, 39(2), 123-146. http://dx.doi.org/10.4275/kslis.2005.39.2.123 

  19. Chung, Eun-Kyung (2009). A semantic-based feature expansion approach for improving the effectiveness of text categorization by using wordNet. Journal of the Korean Society for Information Management, 26(3), 261-278. http://dx.doi.Org/10.3743/KOSIM.2009.26.3.261 

  20. National Research Foundation of Korea (2016). Research Field Classification Scheme. Retrieved from http://www.nrf.re.kr 

  21. Korea Citation Index (2018). Retrieved from https://www.kci.go.kr 

  22. AI-Salemi, B., Aziz, M., Juzaiddin, A., & Noah, S. (2015). Boosting algorithms with topic modeling for multi-label text categorization: A comparative empirical study. Journal of Information Science, 41(5), 732-746. http://dx.doi.Org/10.1177/0165551515590079 

  23. Chen, E., Lin, Y., Xiong, H., Luo, Q., & Ma, H. (2011). Exploiting probabilistic topic models to improve text categorization under class imbalance. Information Processing and Management, 47(2), 202-214. 

  24. Chen, Yao-Tsung, & Chen, Meng Chang (2011). Using chi-square statistics to measure similarities for text categorization. Expert Systems with Application, 38(4), 3085-3090. 

  25. Dalal, M. K., & Zaveri, M. A. (2012). Automatic text classification of sports blog data, proceedings of the ieee international conference on computing, communications and applications (ComComAp 2012), Hong Kong, 11-13 January 2012, 219-222. 

  26. Dalal, M. K., & Zaveri, M. A. (2013). Automatic classification of unstructured blog text. Journal of Intelligent Learning Systems and Applications, 5(2), 108-114. http://dx.doi.Org/10.4236/jilsa.2013.52012. 

  27. Eriksson, Tobias (2013). Automatic web page categorization using text classification methods. Master's Degree Project in Computer Science CSC School of Computer Science and Communication. 

  28. Foulds, J., & Frank, E. (2010). A review of multi-instance learning assumptions. Knowl. Eng. Rev., 25(1), 1-25. 

  29. Hmeidi, I., Al-Ayyoub, M., Abdulla, N. A., Almodawar, A. A., Abooraig, R., & Mahyoub, N. A. (2015). Automatic arabic text categorization: A comprehensive comparative study. Journal of Information Science, 41(1), 114-124. https://doi.org/10.1177/0165551514558172 

  30. Jiang, S., Pang, G., Wu, M., & Kuang, L. (2012). An improved k-nearest-neighbor algorithm for text categorization. Expert Systems with Applications, 39(1), 1503-1509. https://doi.org/10.1016/j.eswa.2011.08.040 

  31. Jindal, Rajni, Malhotra, Ruchika, & Jain, Abha. (2015). Techniques for text classification: Literature review and current trends. Webology, 12(2), 2-28. 

  32. Joorabchi, A., & Mahdi, A. E. (2011). An unsupervised approach to automatic classification of scientific literature utilizing bibliographic metadata. Journal of Information Science, 37(5), 499-514. https://doi.org/10.1177/0165551511417785 

  33. Khan, A., Baharudin, B., & Lee, L. H. (2010). A review of machine learning algorithms for text-documents classification. Journal of Advances in Information Technology, 1(1), 4-20. https://doi.org/10.4304/jait.1.1.4-20 

  34. Kumar, M. A., & Gopal, M. (2010). A comparison study on multiple binary-class SVM methods for unilabel text categorization. Pattern Recognition Letters, 31(11), 1437-1444. https://doi.org/10.1016/j.patrec.2010.02.015 

  35. Li, C. H., & Park, S. C. (2009). An efficient document classification model using an improved back propagation neural network and singular value decomposition. Expert Systems with Applications, 36(2), 3208-3215. https://doi.org/10.1016/j.eswa.2008.01.014 

  36. Liu, Y., Loh, H. T., Yousef-Toumi, K., & Tor, S. B. (2007). Handling of imbalanced data in text classification: category-based term weights. In Kao, A., & Poteet, S. R. eds. Natural Language Processing and Text Mining. Springer, 171-192. https://doi.org/10.1007/978-1-84628-754-1_10 

  37. Miao, Yun-Qian, & Kamel, Mohamed (2011). Pairwise optimized rocchio algorithm for text categorization. Pattern Recognition, 32(2), 375-382. https://doi.org/10.1016/j.patrec.2010.09.018 

  38. Pawar, P. Y., & Gawande, S. H. (2012). Comparative study on different types of approaches to text categorization. International Journal of Machine Learning and Computing, 2(4), 423-426. https://doi.org/10.7763/ijmlc.2012.v2.158 

  39. Pedregosa, F. et al. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12, 2825-2830. 

  40. Read, J. (2010). Scalable Multi-label Classification (Thesis, Doctor of Philosophy (PhD)). University of Waikato, Hamilton, New Zealand. Retrieved from https://hdl.handle.net/10289/4645 

  41. Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85, 333-359. 

  42. Schapire, R. E., & Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39, 135-168. 

  43. Sebastiani, Fabrizio (2002). Machine learning in automated text categorization. ACM computing Surveys, 34(1), 1-47. https://doi.org/10.1145/505282.505283 

  44. Shehab, M. A., Badarneh, O., Al-Ayyoub, M., & Jararweh, Y. (2016). A supervised approach for multi-label classification of Arabic news articles, 7th International Conference on Computer Science and Information Technology (CSIT), Amman, 2016, 1-6. http://dx.doi.Org/10.1109/CSIT.2016.7549465 

  45. Tarrago, D. S., Cornelis, C., Bello, R., & Herrera, F. (2014). A multi-instance learning wrapper based on the Rocchio classifier for web index recommendation. Knowledge-Based Systems, 59, 173-181. https://doi.org/10.1016/j.knosys.2014.01.008 

  46. Torii, M., Yin, L., Nguyen, T., Mazumdar, C. T., Liu, H., Hartley, D. M., & Nelson, N. P. (2011). An exploratory study of a text classification framework for Internet-based surveillance of emerging epidemics. International Journal of Medical Informatics, 80(1), 56-66. https://doi.org/10.1016/j.ijmedinf.2010.10.015 

  47. Tsoumakas G, Katakis I., & Vlahavas I. (2010). Mining multi-label data. In: Data mining and knowledge discovery handbook. Berlin: Springer, 667-685. 

  48. Uguz, Harun. (2011). A two-stage feature selection methods for text categorization by using information gain, principal component analysis and genetic algorithm. Knowledge-Based Systems, 24(7), 1024-1032. https://doi.org/10.1016/j.knosys.2011.04.014 

  49. Vasuki, Vidya, & Cohen, Trevor (2010). Reflective random indexing for semi-automatic indexing of the biomedical literature. Journal of Biomedical Informatics, 43(5), 694-700. https://doi.org/10.1016/j.jbi.2010.04.001 

  50. Villena-Roman, J., Collada-Perez, S., Lana-Serrano, S., & Gonzalez-Cristobal, J. C. (2011). Hybrid approach combining machine learning and a rule-based expert system for text categorization. In Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference, 323-328. 

  51. Vogrincic, Sergeja, & Bosnic, Zoran (2011). Ontology-based multi-label classification of economic articles. ComSIS, 8(1), 101-119. https://doi.org/10.2298/csis100420034v 

  52. Wang, Tai-Yue, & Chiang, Huei-Min (2007). Fuzzy support vector machine for multi-class text categorization. Information Processing and Management, 43(4), 914-929. https://doi.org/10.1016/j.ipm.2006.09.011 

  53. Wu, Chih-Hung (2009). Behavior-based spam detection using a hybrid method of rule-based techniques and neural networks. Expert Systems with Applications, 36(1), 4321-4330. https://doi.org/10.1016/j.eswa.2008.03.002 

  54. Yu, B., Xu, Zong-ben, & Li, Cheng-hua (2008). Latent semantic analysis for text categorization using neural network. Knowledge-Based Systems, 21(8), 900-904. https://doi.org/10.1016/j.knosys.2008.03.045 

저자의 다른 논문 :

LOADING...

관련 콘텐츠

오픈액세스(OA) 유형

BRONZE

출판사/학술단체 등이 한시적으로 특별한 프로모션 또는 일정기간 경과 후 접근을 허용하여, 출판사/학술단체 등의 사이트에서 이용 가능한 논문

저작권 관리 안내
섹션별 컨텐츠 바로가기

AI-Helper ※ AI-Helper는 오픈소스 모델을 사용합니다.

AI-Helper 아이콘
AI-Helper
안녕하세요, AI-Helper입니다. 좌측 "선택된 텍스트"에서 텍스트를 선택하여 요약, 번역, 용어설명을 실행하세요.
※ AI-Helper는 부적절한 답변을 할 수 있습니다.

선택된 텍스트

맨위로