Deep neural network based methods have made remarkable progress on natural language processing (NLP) tasks. Recently, convolutional neural network (CNN) based approaches have achieved strong performance not only in image classification but also in document classification. However, CNN-based document classification has so far been applied only to short sentences of about 16 words on average, and it has difficulty handling documents whose syntactic and semantic relationships span the whole text. In this paper, we propose a new approach based on a recurrent neural network (RNN) with word2vec embeddings to overcome these limitations and to achieve higher document classification accuracy. By using long short-term memory (LSTM), which mitigates the long-term dependency problem, the model classifies documents effectively even for long input sequences. To validate the proposed method on diverse data, we evaluated it not only on English datasets but also on a Korean movie review dataset. The proposed method achieved document classification accuracies of 87% on English newspaper articles containing long texts, 90% on English movie reviews composed of short sentences, and 88% on Korean movie reviews.
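The pipeline described above (word2vec embeddings fed through an LSTM, with a classifier on the final hidden state) can be sketched as follows. This is a minimal illustrative forward pass in plain numpy, not the paper's implementation: the random embedding matrix is a stand-in for pretrained word2vec vectors, the weights are untrained, and all dimensions and token ids are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: in the paper the embeddings come from word2vec;
# here a small random matrix serves for illustration only.
vocab_size, embed_dim, hidden_dim = 50, 8, 16
embedding = rng.normal(scale=0.1, size=(vocab_size, embed_dim))

# LSTM parameters for the four gates (input, forget, candidate, output),
# stacked along the first axis for compactness.
W = rng.normal(scale=0.1, size=(4 * hidden_dim, embed_dim + hidden_dim))
b = np.zeros(4 * hidden_dim)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(token_ids):
    """Run a single-layer LSTM over a token sequence; return the final hidden state."""
    h = np.zeros(hidden_dim)
    c = np.zeros(hidden_dim)
    for t in token_ids:
        x = embedding[t]
        z = W @ np.concatenate([x, h]) + b
        i = sigmoid(z[:hidden_dim])                    # input gate
        f = sigmoid(z[hidden_dim:2 * hidden_dim])      # forget gate
        g = np.tanh(z[2 * hidden_dim:3 * hidden_dim])  # candidate cell state
        o = sigmoid(z[3 * hidden_dim:])                # output gate
        c = f * c + i * g   # cell state carries long-range information forward
        h = o * np.tanh(c)
    return h

# Binary document classifier on top of the last hidden state.
w_out = rng.normal(scale=0.1, size=hidden_dim)

def classify(token_ids):
    return sigmoid(w_out @ lstm_forward(token_ids))

p = classify([3, 17, 42, 8, 25])  # a toy "document" as a list of token ids
```

Because the cell state `c` is updated additively (gated by `f` and `i`) rather than repeatedly squashed, gradients can flow across many time steps, which is why LSTM handles long sequences better than a vanilla RNN.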