[Paper] Context-sensitive Spelling Error Correction using Eojeol N-gram

김민호; 권혁철; 최성기

doi:10.5626/jok.2014.41.12.1081

어절 N-gram을 이용한 문맥의존 철자오류 교정
Context-sensitive Spelling Error Correction using Eojeol N-gram

Journal of KIISE (정보과학회논문지), v.41 no.12, 2014, pp.1081-1089

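김민호 (Pusan National University, Department of Electrical and Computer Engineering), 권혁철 (Pusan National University, School of Computer Science and Engineering), 최성기 (Pusan National University, Department of Electrical and Computer Engineering)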

Abstract

Methods for correcting context-sensitive spelling errors fall into two broad groups, rule-based methods and methods based on statistical information, and research has concentrated on the statistical ones. Statistical methods treat context-sensitive spelling error correction as a word-sense disambiguation problem: a correction word pair, consisting of a correction target word and a replacement candidate word, is classified according to its context. To improve the performance of the probability model based on correction word pairs that our group presented in earlier work, this paper proposes combining an eojeol n-gram model with that existing model. Two combined models are proposed: one interpolates the sentence probabilities computed by each model, and the other applies the two models one after the other. Both combined models show higher precision and recall than either the existing model or a model that uses only eojeol n-grams.
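The two combination strategies named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the linear form of the interpolation, the weight `lam`, the `margin` rule used to decide when the second model takes over, the function names, and the toy probability tables (which use an English confusion pair for readability rather than Korean eojeols). It is not the authors' implementation.

```python
from typing import Callable, Sequence

# A scorer maps a tokenized sentence to P(sentence) under one model.
Scorer = Callable[[Sequence[str]], float]


def substitute(sentence: Sequence[str], index: int, word: str) -> list:
    """Return a copy of the sentence with the word at `index` replaced."""
    variant = list(sentence)
    variant[index] = word
    return variant


def interpolate(p_pair: float, p_ngram: float, lam: float) -> float:
    """Linear interpolation (assumed form); lam weights the pair model."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_pair + (1.0 - lam) * p_ngram


def correct_interpolated(sentence, index, candidates,
                         pair_model: Scorer, ngram_model: Scorer, lam=0.5):
    """Pick the candidate whose substituted sentence has the highest
    interpolated probability."""
    def score(word):
        variant = substitute(sentence, index, word)
        return interpolate(pair_model(variant), ngram_model(variant), lam)
    return max(candidates, key=score)


def correct_sequential(sentence, index, candidates,
                       first_model: Scorer, second_model: Scorer, margin=2.0):
    """Apply first_model; if its best candidate does not beat the runner-up
    by `margin` (a hypothetical decision rule), let second_model decide.
    Expects at least two candidates, as in a correction word pair."""
    def p_first(word):
        return first_model(substitute(sentence, index, word))
    ranked = sorted(candidates, key=p_first, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if p_first(best) >= margin * p_first(runner_up):
        return best
    return max(candidates,
               key=lambda w: second_model(substitute(sentence, index, w)))


# Toy probability tables, invented for this example only.
PAIR_TABLE = {"peace": 0.6, "piece": 0.4}
NGRAM_TABLE = {"peace": 0.1, "piece": 0.9}


def toy_pair_model(sentence):
    return PAIR_TABLE[sentence[1]]


def toy_ngram_model(sentence):
    return NGRAM_TABLE[sentence[1]]


if __name__ == "__main__":
    s = ["a", "peace", "of", "cake"]
    pair = ("peace", "piece")
    print(correct_interpolated(s, 1, pair, toy_pair_model, toy_ngram_model))
    print(correct_sequential(s, 1, pair, toy_pair_model, toy_ngram_model))
```

In this toy setup the pair model alone slightly prefers the erroneous word, while both combined strategies select "piece". This only shows how a second model can overturn a weak decision by the first; it says nothing about the results reported in the paper.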


