[Paper] Sequence-to-sequence based Korean Morphological Analysis and Part-of-Speech Tagging

이건일; 이의현; 이종혁

doi:10.5626/jok.2017.44.1.57

Sequence-to-sequence based Morphological Analysis and Part-Of-Speech Tagging for Korean Language with Convolutional Features

Journal of KIISE (정보과학회논문지), v.44 no.1, 2017, pp.57-62

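이건일 (Department of Computer Science and Engineering, Pohang University of Science and Technology), 이의현 (Department of Computer Science and Engineering, Pohang University of Science and Technology), 이종혁 (Department of Computer Science and Engineering, Pohang University of Science and Technology)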

Abstract (translated from Korean)

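Traditional Korean morphological analysis and POS tagging methods proceed in two steps: they first generate morpheme candidates, and then search the many possible combinations for the POS tagging result with the best probability. They additionally require a morpheme connectivity dictionary, a pre-analyzed dictionary, and a lemma restoration dictionary. This study departs from the two-step approach and uses a sequence-to-sequence model, a kind of deep learning model, to treat Korean morphological analysis and POS tagging end-to-end, without relying on additional language resources. Because morphological analysis and POS tagging form a special kind of sequence transduction in which no reordering occurs, the study also uses convolutional features, which are mainly used in speech recognition. In experiments on the Sejong corpus, the model without convolutional features achieved a morpheme-level F1-score of 97.15%, an eojeol-level accuracy of 95.33%, and a sentence-level accuracy of 60.62%; with convolutional features it achieved a morpheme-level F1-score of 96.91%, an eojeol-level accuracy of 95.40%, and a sentence-level accuracy of 60.62%.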

Abstract

Traditional Korean morphological analysis and POS tagging methods usually consist of two steps: (1) generate hypotheses of all possible combinations of morphemes for the given input, and (2) perform POS tagging to search for the optimal result. These methods require additional resource dictionaries, and errors in the first step can propagate to the second step. In this paper, we tried to solve this problem in an end-to-end fashion using a sequence-to-sequence model with convolutional features. Experimental results on the Sejong corpus show that our approach achieved 97.15% F1-score on morpheme level, and 95.33% and 60.62% precision on word and sentence level, respectively, without convolutional features; with convolutional features, it achieved 96.91% F1-score on morpheme level, and 95.40% and 60.62% precision on word and sentence level, respectively.
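
The three reported figures are measured at different granularities: morphemes, words (eojeols), and whole sentences. The following is a minimal sketch of how such scores could be computed, not the authors' evaluation script. It assumes each eojeol's analysis is written as `morpheme/TAG` items joined by `+`, that no morpheme itself contains `+`, and that gold and predicted sentences contain the same number of eojeols. The function names and the toy data are illustrative.

```python
from collections import Counter


def parse_eojeol(analysis):
    """Split 'morph/TAG+morph/TAG' into a list of (morph, tag) pairs."""
    pairs = []
    for item in analysis.split("+"):
        # rpartition splits on the last '/', so a morpheme containing '/'
        # still keeps its tag intact.
        morph, _, tag = item.rpartition("/")
        pairs.append((morph, tag))
    return pairs


def evaluate(gold_sents, pred_sents):
    """Return morpheme-level F1, eojeol-level and sentence-level accuracy."""
    tp = n_gold = n_pred = 0
    eojeol_ok = eojeol_total = sent_ok = 0
    for gold, pred in zip(gold_sents, pred_sents):
        # A sentence counts as correct only if every eojeol matches exactly.
        sent_ok += gold == pred
        for g, p in zip(gold, pred):
            eojeol_total += 1
            eojeol_ok += g == p
            g_cnt = Counter(parse_eojeol(g))
            p_cnt = Counter(parse_eojeol(p))
            # Multiset intersection: (morph, tag) pairs found in both analyses.
            tp += sum((g_cnt & p_cnt).values())
            n_gold += sum(g_cnt.values())
            n_pred += sum(p_cnt.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "morpheme_f1": f1,
        "eojeol_accuracy": eojeol_ok / eojeol_total if eojeol_total else 0.0,
        "sentence_accuracy": sent_ok / len(gold_sents) if gold_sents else 0.0,
    }


# Toy data: one wrong tag (EF vs. EC) in the last eojeol of the first sentence.
gold = [
    ["나/NP+는/JX", "학교/NNG+에/JKB", "가/VV+ㄴ다/EF"],
    ["밥/NNG+을/JKO", "먹/VV+었/EP+다/EF"],
]
pred = [
    ["나/NP+는/JX", "학교/NNG+에/JKB", "가/VV+ㄴ다/EC"],
    ["밥/NNG+을/JKO", "먹/VV+었/EP+다/EF"],
]
scores = evaluate(gold, pred)
```

In the toy data a single wrong tag costs one morpheme out of eleven but an entire sentence out of two, which illustrates why a sentence-level score can sit far below morpheme-level and eojeol-level scores on the same output.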


