Recent Automatic Post Editing Research (최신 기계번역 사후 교정 연구)

Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim (문현석, 박찬준, 어수경, 서재형, 임희석)
All authors: Department of Computer Science and Engineering, Korea University

Journal of Digital Convergence (디지털융복합연구), v.19 no.7, 2021, pp. 199-208
doi:10.14400/jdc.2021.19.7.199

Abstract

Automatic Post Editing (APE) is the task of automatically correcting errors in machine-translated sentences. Its goal is to build error-correcting models that improve translation quality regardless of which translation system produced the output. These models are trained on a source sentence, its machine translation, and a post-edit, which is the machine translation as manually corrected by a human translator. Recent APE research in particular adopts multilingual pretrained language models before training on APE data. This study reviews the multilingual pretrained language models used in the latest APE research and the specific way each study applies them. Based on this research trend, we further propose future research directions that utilize a translation model or the mBART model.
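The training data described above can be pictured as (source, MT, post-edit) triplets. The Python sketch below is illustrative only: the `ApeTriplet` class, the `[SEP]` separator and the joined source/MT input are assumptions loosely modeled on BERT-style sentence-pair inputs, not the exact format of any system surveyed in the paper. It also computes a word-level edit distance from the MT to the post-edit, a simplified stand-in for TER (block shifts are not modeled), to show how much correction a model has to learn.

```python
from dataclasses import dataclass


@dataclass
class ApeTriplet:
    """One APE training example (hypothetical container, not from the paper)."""
    src: str  # source-language sentence
    mt: str   # machine translation of src
    pe: str   # human post-edit of mt; the model's training target


def build_input(t: ApeTriplet, sep: str = "[SEP]") -> str:
    """Pack source and MT into one encoder input (assumed BERT-style pair format)."""
    return f"{t.src} {sep} {t.mt}"


def word_edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance over whitespace-separated tokens."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distances from an empty hypothesis prefix
    for i, hw in enumerate(h, start=1):
        cur = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete hw
                cur[j - 1] + 1,            # insert rw
                prev[j - 1] + (hw != rw),  # substitute (cost 0 on a match)
            )
        prev = cur
    return prev[-1]


def simple_ter(hyp: str, ref: str) -> float:
    """Edits per reference word; unlike real TER, block shifts are not modeled."""
    return word_edit_distance(hyp, ref) / max(len(ref.split()), 1)


if __name__ == "__main__":
    # Made-up English-German example; the MT has one wrong article.
    t = ApeTriplet(
        src="The cat sat on the mat.",
        mt="Die Katze saß auf die Matte.",
        pe="Die Katze saß auf der Matte.",
    )
    print(build_input(t))  # The cat sat on the mat. [SEP] Die Katze saß auf die Matte.
    print(round(simple_ter(t.mt, t.pe), 3))  # 0.167: one substitution over six words
```

In this toy metric, an APE output identical to the post-edit scores 0.0, while leaving the MT untouched (the usual baseline in APE evaluation) scores the MT's own distance to the post-edit.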


Tables and Figures (3)

Fig. 1. WMT19 Unbabel's APE model [9]
Fig. 2. WMT20 HW-TSC APE model [8]
Table 1. WMT20 APE results [3]

References (33)

[1] Park, C., & Lim, H. (2020). Automatic Post Editing Research. Journal of the Korea Convergence Society, 11(5), 1-8.
[2] Park, C., Yang, Y., Park, K., & Lim, H. (2020). Decoding strategies for improving low-resource machine translation. Electronics, 9(10), 1562.
[3] Chatterjee, R., Freitag, M., Negri, M., & Turchi, M. (2020, November). Findings of the WMT 2020 shared task on automatic post-editing. In Proceedings of the Fifth Conference on Machine Translation (pp. 646-659).
[4] Koehn, P., Chaudhary, V., El-Kishky, A., Goyal, N., Chen, P. J., & Guzman, F. (2020, November). Findings of the WMT 2020 shared task on parallel corpus filtering and alignment. In Proceedings of the Fifth Conference on Machine Translation (pp. 726-742).
[5] Specia, L., et al. (2020, November). Findings of the WMT 2020 shared task on quality estimation. In Proceedings of the Fifth Conference on Machine Translation (pp. 743-764).
[6] Pal, S., Herbig, N., Krüger, A., & van Genabith, J. (2018, October). A transformer-based multi-source automatic post-editing system. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers (pp. 827-835).
[7] Ive, J., Specia, L., Szoc, S., Vanallemeersch, T., Van den Bogaert, J., Farah, E., ... & Khalilov, M. (2020, May). A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality? In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 3692-3697).
[8] Yang, H., Wang, M., Wei, D., Shang, H., Guo, J., Li, Z., ... & Chen, Y. (2020, November). HW-TSC's Participation at WMT 2020 Automatic Post Editing Shared Task. In Proceedings of the Fifth Conference on Machine Translation (pp. 797-802).
[9] Lopes, A. V., Farajian, M. A., Correia, G. M., Trenous, J., & Martins, A. F. (2019). Unbabel's Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing. arXiv preprint arXiv:1905.13068.
[10] Lee, J., Lee, W., Shin, J., Jung, B., Kim, Y. G., & Lee, J. H. (2020, November). POSTECH-ETRI's Submission to the WMT2020 APE Shared Task: Automatic Post-Editing with Cross-lingual Language Model. In Proceedings of the Fifth Conference on Machine Translation (pp. 777-782).
[11] Lee, D. (2020, November). Cross-Lingual Transformers for Neural Automatic Post-Editing. In Proceedings of the Fifth Conference on Machine Translation (pp. 772-776).
[12] Allen, J., & Hogan, C. (2000, April). Toward the development of a post editing module for raw machine translation output: A controlled language perspective. In Third International Controlled Language Applications Workshop (CLAW-00) (pp. 62-71).
[13] Simard, M., Goutte, C., & Isabelle, P. (2007, April). Statistical phrase-based post-editing. In Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference (pp. 508-515).
[14] Shin, J., & Lee, J. H. (2018, October). Multi-encoder transformer network for automatic post-editing. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers (pp. 840-845).
[15] Junczys-Dowmunt, M., & Grundkiewicz, R. (2018). MS-UEdin submission to the WMT2018 APE shared task: Dual-source transformer for automatic post-editing. arXiv preprint arXiv:1809.00188.
[16] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
[17] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
[18] Pires, T., Schlinger, E., & Garrette, D. (2019). How multilingual is multilingual BERT? arXiv preprint arXiv:1906.01502.
[19] Lample, G., & Conneau, A. (2019). Cross-lingual language model pretraining. arXiv preprint arXiv:1901.07291.
[20] Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzman, F., ... & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.
[21] Wenzek, G., Lachaux, M. A., Conneau, A., Chaudhary, V., Guzman, F., Joulin, A., & Grave, E. (2019). CCNet: Extracting high quality monolingual datasets from web crawl data. arXiv preprint arXiv:1911.00359.
[22] Conneau, A., Lample, G., Rinott, R., Williams, A., Bowman, S. R., Schwenk, H., & Stoyanov, V. (2018). XNLI: Evaluating cross-lingual sentence representations. arXiv preprint arXiv:1809.05053.
[23] Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., ... & Zettlemoyer, L. (2020). Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8, 726-742.
[24] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2019). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.
[25] Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345-1359.
[26] Correia, G. M., & Martins, A. F. (2019). A simple and effective approach to automatic post-editing with transfer learning. arXiv preprint arXiv:1906.06253.
[27] Lee, J., Lee, W., Kim, Y. G., & Lee, J. H. (2020). Transfer Learning of Automatic Post-Editing with Cross-lingual Language Model. KIISE 2020, 392-394.
[28] Houlsby, N., Giurgiu, A., Jastrzebski, S., Morrone, B., De Laroussilhe, Q., Gesmundo, A., ... & Gelly, S. (2019, May). Parameter-efficient transfer learning for NLP. In International Conference on Machine Learning (pp. 2790-2799). PMLR.
[29] Kim, H., Lee, J. H., & Na, S. H. (2017, September). Predictor-estimator using multilevel task learning with stack propagation for neural quality estimation. In Proceedings of the Second Conference on Machine Translation (pp. 562-568).
[30] Park, C., Yang, Y., Lee, C., & Lim, H. (2020). Comparison of the Evaluation Metrics for Neural Grammatical Error Correction With Overcorrection. IEEE Access, 8, 106264-106272.
[31] Park, C., Kim, K., Yang, Y., Kang, M., & Lim, H. (2020). Neural spelling correction: translating incorrect sentences to correct sentences for multimedia. Multimedia Tools and Applications, 1-18.
[32] Wang, J., Wang, K., Fan, K., Zhang, Y., Lu, J., Ge, X., ... & Zhao, Y. (2020, November). Alibaba's Submission for the WMT 2020 APE Shared Task: Improving Automatic Post-Editing with Pre-trained Conditional Cross-Lingual BERT. In Proceedings of the Fifth Conference on Machine Translation (pp. 789-796).
[33] Lee, W., Shin, J., Jung, B., Lee, J., & Lee, J. H. (2020, November). Noising Scheme for Data Augmentation in Automatic Post-Editing. In Proceedings of the Fifth Conference on Machine Translation (pp. 783-788).
