[Paper] Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences

김선우; 고건우; 최원준; 정희석; 윤화묵; 최성필

doi:10.3743/kosim.2018.35.4.141

기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구
Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences

Journal of the Korean Society for Information Management (정보관리학회지), v.35 no.4 (no.110), 2018, pp.141-164

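김선우 (Department of Library and Information Science, Kyonggi University), 고건우 (Department of Library and Information Science, Kyonggi University), 최원준 (Content Curation Center, Korea Institute of Science and Technology Information), 정희석 (Content Curation Center, Korea Institute of Science and Technology Information), 윤화묵 (Content Curation Center, Korea Institute of Science and Technology Information), 최성필 (Department of Library and Information Science, Kyonggi University)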

Abstract

The volume of academic literature has grown rapidly and interdisciplinary research has become common, so researchers find it difficult to analyze trends in prior work. Addressing this first requires classification information at the level of individual papers, but no academic database in Korea provides it. This study therefore proposes an automatic classification system that can classify domestic academic literature into multiple classes. Korean-language academic documents in the technical sciences were collected and, using the K-Means clustering technique, mapped to the divisions (second-level classes) of DDC class 600 to build a training set that supports multiple classification. After excluding records with missing metadata, the resulting training set contains 63,915 Korean documents in the technical sciences. A deep-learning-based automatic classification engine for academic literature was then implemented and trained on this set. For objective validation, the engine was evaluated on a manually constructed test set, where it achieved 78.32% accuracy and 72.45% F1 for multiple classification.
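The clustering-then-mapping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how clusters are mapped to DDC divisions or how multiple labels are derived, so the majority vote over a few manually checked seed documents, the toy 2-D vectors, and every function and variable name below are assumptions made only for illustration. The sketch assigns one division per cluster.

```python
from collections import Counter

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def kmeans(vectors, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm with fixed initial centroids (deterministic)."""
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return assignments

def map_clusters_to_classes(assignments, seed_labels):
    """Assumed mapping rule: label each cluster with the most common
    class among its manually checked seed documents."""
    votes = {}
    for doc_index, label in seed_labels.items():
        votes.setdefault(assignments[doc_index], Counter())[label] += 1
    return {cluster: counter.most_common(1)[0][0]
            for cluster, counter in votes.items()}

# Toy 2-D "document vectors" (hypothetical; real input would be text features).
doc_vectors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
# A few manually checked documents: index -> DDC division (hypothetical labels).
seed_labels = {0: "610", 3: "620"}

assignments = kmeans(doc_vectors,
                     initial_centroids=[doc_vectors[0], doc_vectors[3]])
cluster_to_class = map_clusters_to_classes(assignments, seed_labels)
# Clusters without any seed document stay unlabeled (None).
training_labels = [cluster_to_class.get(a) for a in assignments]
print(training_labels)
```

In this sketch the human effort is limited to checking a handful of seed documents per cluster, which is one plausible reading of "semi-automatic"; the remaining documents inherit the label of their cluster.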


