[Paper] Classification and analysis of error types for deep learning-based Korean spelling correction

구선민; 박찬준; 소아람; 임희석

doi:10.15207/jkcs.2021.12.12.065

Abstract

Recently, Korean spelling correction has been actively studied using machine translation techniques and automatic noise generation methods. These methods generate noise and use it as both training data and evaluation data. As a result, the test set is unlikely to contain noise other than the kinds used in training, which makes accurate performance measurement difficult. In addition, there is no practical standard for classifying error types, so each study uses different error types, which hinders qualitative analysis. To address these problems, this paper proposes a new error type classification scheme for deep learning-based Korean spelling correction research and uses it to analyze the errors of existing commercial Korean spelling correctors (System A, System B, and System C). The analysis shows that the three systems did not correct the error types presented in this paper well, with the exception of spacing errors, and that they hardly recognized word-order or tense errors at all.

Tables/Figures (5)

Table 1. Error type classification scheme.
Table 2. Qualitative analysis example. The correct correction rate is the number of correct corrections relative to the number of detected errors in the erroneous sentence.
Table 3. Korean spelling correction by error type. Error recognition and performance are indicated by slice. 0 indicates no success, 1 indicates success, and 2 indicates intermittent success.
Table 4. Number of corrections by model. Total error is the number of errors contained in the sentences of the entire dataset. Total recognition is the total number of errors recognized by the system. Total correct correction is the total number of errors the system corrects correctly.
Table 5. Performance of the corrector models. Recognition is the rate at which the system detects the errors contained in a sentence. The correct correction rate is the share of detected errors that are corrected correctly.
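
The captions of Tables 4 and 5 describe two rates derived from three counts. A minimal Python sketch of that arithmetic follows. It assumes that recognition is the recognized errors divided by the total errors, and that the correct correction rate is the correct corrections divided by the recognized errors, which is how the captions read. The class name, function names, and the numbers in the example are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CorrectionCounts:
    """Counts in the style of Table 4 (field names are illustrative)."""
    total_errors: int       # errors contained in the whole dataset
    total_recognized: int   # errors the system detected
    total_correct: int      # detected errors that were corrected correctly


def recognition_rate(counts: CorrectionCounts) -> float:
    """Share of all errors that the system detected."""
    if counts.total_errors == 0:
        return 0.0
    return counts.total_recognized / counts.total_errors


def correct_correction_rate(counts: CorrectionCounts) -> float:
    """Share of detected errors that were corrected correctly."""
    if counts.total_recognized == 0:
        return 0.0
    return counts.total_correct / counts.total_recognized


# Hypothetical counts, for illustration only (not results from the paper).
example = CorrectionCounts(total_errors=200, total_recognized=80, total_correct=60)
print(f"recognition: {recognition_rate(example):.2f}")
print(f"correct correction rate: {correct_correction_rate(example):.2f}")
```

Because the second rate is conditioned on detection, a system can score a high correct correction rate while still missing most errors, so the two figures are best read together.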
