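[Paper] Addressing Low-Resource Problems in Statistical Machine Translation of Manual Signals in Sign Language (말뭉치 자원 희소성에 따른 통계적 수지 신호 번역 문제의 해결)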

박한철; 김정호; 박종철

doi:10.5626/jok.2017.44.2.163

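Abstract (translated from the Korean)

Although research on spoken-to-sign language translation using statistical machine translation has become increasingly active, the resource scarcity of sign language corpora remains unsolved. As a first step toward sign language translation, this study proposes three preprocessing methods that address the problems caused by corpus resource scarcity in spoken language-to-manual signal translation using statistical machine translation. The proposed methods are 1) expanding the corpus by paraphrasing spoken-language sentences, 2) lemmatizing spoken-language words to increase the frequency of individual vocabulary items and improve the translatability of spoken-language expressions, and 3) removing spoken-language function words that are not transcribed into manual expressions, so that the sentence constituents of the spoken and manual expressions match. In experiments using English-American Sign Language parallel corpora with different characteristics, we confirmed that the methods improve translation performance both when used alone and when combined, regardless of the type of corpus.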

Abstract (English)

Despite the rise of studies on spoken-to-sign language translation, the low-resource problem of sign language corpora has rarely been addressed. As a first step toward translating from spoken to sign language, we addressed the problems that arise from resource scarcity when translating spoken language into manual signals using statistical machine translation techniques. More specifically, we proposed three preprocessing methods: 1) paraphrase generation, which increases the size of the corpora; 2) lemmatization, which increases the frequency of each word in the corpora and the translatability of new input words in the spoken language; and 3) elimination of function words that are not glossed into manual signals, which aligns the corresponding constituents of the bilingual sentence pairs. In our experiments, we used different types of English-American Sign Language parallel corpora. The experimental results showed that each method on its own, as well as the combination of the methods, improved the quality of manual signal translation, regardless of the type of corpus.
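
The three preprocessing steps can be illustrated with a small sketch. This is not the authors' implementation: the paraphrase table, lemma table, function-word list, example sentence, and gloss below are invented toy data, and the order in which the steps are combined is an assumption, since the abstract does not specify it. A real system would presumably rely on a paraphrase database and a proper lemmatizer instead of hand-written dictionaries.

```python
from typing import Dict, List, Set, Tuple

# Toy resources, all hypothetical and for illustration only.
PARAPHRASES: Dict[str, List[str]] = {"purchase": ["buy"], "large": ["big"]}
LEMMAS: Dict[str, str] = {"bought": "buy", "books": "book", "is": "be", "reading": "read"}
FUNCTION_WORDS: Set[str] = {"a", "an", "the", "be", "to", "of"}

# One training pair: (spoken-language tokens, manual-signal gloss tokens).
Pair = Tuple[List[str], List[str]]


def expand_with_paraphrases(corpus: List[Pair]) -> List[Pair]:
    """Method 1: add a copy of each pair for every single-word paraphrase."""
    expanded = list(corpus)
    for source, gloss in corpus:
        for i, token in enumerate(source):
            for alt in PARAPHRASES.get(token, []):
                expanded.append((source[:i] + [alt] + source[i + 1:], gloss))
    return expanded


def lemmatize(tokens: List[str]) -> List[str]:
    """Method 2: map each inflected form to its lemma (unchanged if unknown)."""
    return [LEMMAS.get(t, t) for t in tokens]


def drop_function_words(tokens: List[str]) -> List[str]:
    """Method 3: remove source words assumed to have no gloss counterpart."""
    return [t for t in tokens if t not in FUNCTION_WORDS]


def preprocess(corpus: List[Pair]) -> List[Pair]:
    """Apply the three steps to the source side (assumed order: 1, 2, 3)."""
    return [
        (drop_function_words(lemmatize(src)), gloss)
        for src, gloss in expand_with_paraphrases(corpus)
    ]


# Invented example pair; the gloss is a toy annotation, not real corpus data.
corpus = [("she will purchase the books".split(), ["SHE", "FUTURE", "BUY", "BOOK"])]
result = preprocess(corpus)
for src, gloss in result:
    print(" ".join(src), "|||", " ".join(gloss))
```

In this toy run the single pair becomes two training pairs (the original and one paraphrased variant), and the source side of each loses its inflection and article, which brings its length closer to that of the gloss. The preprocessed pairs would then be handed to a statistical machine translation toolkit for training.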

