[Thesis] Pronunciation and Prosody Generation for Korean TTS (Text-to-Speech)

김병창

Prosody/phoneme generation for Korean TTS (Text-to-speech)

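김병창 (Pohang Univ. of Science and Technology, Division of Electrical and Computer Engineering, Computer Science and Engineering, Natural Language Processing major; domestic doctoral degree)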

Abstract (Korean)

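This thesis proposes solutions to two problems that must be solved to build a high-performance Korean TTS (Text-to-Speech) system using K-ToBI (Korean Tone and Break Indexes), namely grapheme-to-phoneme conversion and prosody generation, and reports their performance. The two problems are closely related and must therefore be treated together when discussing an integrated system. K-ToBI is a multi-tier representation for describing linguistic knowledge about speech. For grapheme-to-phoneme conversion, dictionary-based and rule-based methods each have their own advantages and disadvantages. This thesis proposes a hybrid method for Korean that uses a phonetic pattern dictionary together with CCV (consonant consonant vowel) grapheme-to-phoneme conversion rules. The phonetic pattern dictionary is a variant of the dictionary-based method and consists of morpheme patterns and their phonetic patterns. These patterns represent candidate pronunciations at the left and right boundaries of a morpheme. The CCV grapheme-to-phoneme conversion rules represent the rule-based method and handle conversion within a morpheme. Given the morphological analysis of an input sentence, each morpheme is converted into several candidate phonetic patterns using the phonetic pattern dictionary. The graphemes in these candidate patterns are grouped into CCV units and converted into phonemes by the CCV rules. Finally, the morphophonemic connectivity table is used to check whether the pronunciations of adjacent morphemes can be connected. For prosody generation, this thesis proposes a pitch and pause generation method based on K-ToBI. Using ToBI as an intermediate representation of prosody is known to offer higher flexibility and domain portability than direct prosody generation. However, reaching practical performance requires costly corpus preparation. This thesis introduces an automatic K-ToBI labeling method and proposes a prosody generation method that applies lexico-syntactic features to a decision tree. Experiments confirmed that the prosody obtained from the automatically labeled corpus is comparable to that of the best current systems that generate prosody directly.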

Abstract (English)

Our efforts to develop a high-performance Korean TTS (Text-to-Speech) system with K-ToBI (Korean Tone and Break Index) focus on two important subproblems of Korean TTS: grapheme-to-phoneme conversion and prosody (especially phrase break and pitch) generation. These subproblems are closely interrelated and should therefore be treated together for better integration. ToBI is a multi-tier representation system based on linguistic knowledge to transcribe events in an utterance. For grapheme-to-phoneme conversion, dictionary-based and rule-based methods each have their own advantages and limitations. For example, the dictionary-based method requires a large phonetic dictionary and complex morphophonemic rules, whereas the LTS (letter-to-sound) rule-based method by itself cannot model the complete morphophonemic constraints. This thesis describes a new grapheme-to-phoneme conversion method for Korean, a hybrid that combines a phonetic pattern dictionary with CCV (consonant consonant vowel) LTS rules. The phonetic pattern dictionary, which represents the dictionary-based method, contains entries consisting of a morpheme pattern and its phonetic pattern. The patterns represent candidate phonological changes at the left and right boundaries of morphemes. The CCV LTS rules represent the rule-based method and are in charge of grapheme-to-phoneme conversion within morphemes. The conversion method consists mainly of two steps, grapheme-to-phoneme conversion and a morphophonemic connectivity check, preceded by two preprocessing steps, phrase break prediction and morpheme normalization. Morpheme normalization replaces non-Korean symbols with their corresponding standard Korean graphemes.
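As a rough illustration of the morpheme normalization step mentioned above, the following minimal sketch replaces a few non-Korean symbols with Korean graphemes. The table `SYMBOL_READINGS` and the function `normalize` are hypothetical names introduced here, and the character-by-character lookup is an assumption; the abstract does not describe the actual normalization table or how context is handled.

```python
# Toy symbol-to-grapheme table (an assumption for illustration only).
# Digits use their Sino-Korean readings; "%" uses the common loanword reading.
SYMBOL_READINGS = {
    "0": "영", "1": "일", "2": "이", "3": "삼", "4": "사",
    "5": "오", "6": "육", "7": "칠", "8": "팔", "9": "구",
    "%": "퍼센트",
}


def normalize(text: str) -> str:
    """Replace each known non-Korean symbol with a Korean spelling.

    Characters without an entry in the table are passed through unchanged.
    """
    return "".join(SYMBOL_READINGS.get(ch, ch) for ch in text)


print(normalize("3%"))
```

This lookup reads digits one at a time and ignores place value, so a multi-digit number such as "12" would not receive its usual reading; a practical normalizer would need number and context handling beyond this sketch.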
In the morpheme phoneticizing module, each morpheme in the phrase is converted into phonetic patterns by looking it up in the phonetic pattern dictionary. Graphemes within a morpheme are grouped into CCV units and converted into phonemes by the CCV LTS rules. The morphophonemic connectivity table supports grammaticality checking of two adjacent phonetic morphemes. For prosody generation, we present a pitch and phrase-break generation architecture based on the K-ToBI (Korean Tone and Break Index) representation. A TTS system that adopts ToBI as an intermediate representation is known to exhibit higher flexibility, modularity, and domain/task portability than TTS systems that generate prosody directly. However, corpus preparation is very expensive at practical performance levels, because ToBI-labeled corpora have been constructed manually by many prosody experts and statistical prosody modeling normally requires a large amount of data. In contrast to previous ToBI-based systems, this thesis proposes a new method that transcribes K-ToBI labels in Korean speech completely automatically. We develop automatic corpus-based K-ToBI labeling tools and prediction methods based on several lexico-syntactic features for decision-tree induction. We demonstrated the performance of F0 generation from automatically predicted K-ToBI labels and confirmed that it is reasonably comparable with state-of-the-art direct prosody generation methods and previous ToBI-based methods.
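The CCV grouping step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes a CCV unit is the coda of the previous syllable, the onset of the current syllable, and the current vowel, and it uses a single toy rule (nasal assimilation of ㄱ before ㅁ). The names `ccv_units`, `to_phonemes`, and `TOY_RULES` are hypothetical; only the Unicode arithmetic for decomposing Hangul syllables is standard.

```python
HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable (가)
ONSETS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
CODAS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals


def decompose(syllable):
    """Split one precomposed Hangul syllable into (onset, vowel, coda)."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    onset, rest = divmod(index, 21 * 28)
    vowel, coda = divmod(rest, 28)
    return ONSETS[onset], VOWELS[vowel], CODAS[coda]


def compose(onset, vowel, coda):
    """Inverse of decompose: build a precomposed syllable from its jamo."""
    index = (ONSETS.index(onset) * 21 + VOWELS.index(vowel)) * 28 + CODAS.index(coda)
    return chr(HANGUL_BASE + index)


def ccv_units(word):
    """Group graphemes into (previous coda, onset, vowel) units.

    Returns the list of units and the coda left over after the last syllable.
    The unit definition is an assumption made for this sketch.
    """
    units, prev_coda = [], ""
    for syllable in word:
        onset, vowel, coda = decompose(syllable)
        units.append((prev_coda, onset, vowel))
        prev_coda = coda
    return units, prev_coda


# Toy rule table keyed on the consonant pair only (the vowel is ignored here).
# The single entry models nasal assimilation: ㄱ before ㅁ is pronounced ㅇ.
TOY_RULES = {("ㄱ", "ㅁ"): ("ㅇ", "ㅁ")}


def to_phonemes(word):
    """Apply the toy rules to each CCV unit and reassemble syllables."""
    units, final_coda = ccv_units(word)
    converted = [TOY_RULES.get((c1, c2), (c1, c2)) + (v,) for c1, c2, v in units]
    out = []
    for i, (_, onset, vowel) in enumerate(converted):
        # The coda of syllable i is the first consonant of unit i + 1.
        coda = converted[i + 1][0] if i + 1 < len(converted) else final_coda
        out.append(compose(onset, vowel, coda))
    return "".join(out)


print(ccv_units("국물"))
print(to_phonemes("국물"))
```

Grouping the coda with the following onset puts the two consonants that interact across a syllable boundary in the same unit, which is why a single table lookup can express the assimilation here. The dictionary lookup for morpheme boundaries and the connectivity check described in the abstract are not modeled.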

Thesis information

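Author	김병창
Degree-granting institution	Pohang Univ. of Science and Technology
Degree type	Domestic doctorate
Department	Division of Electrical and Computer Engineering, Computer Science and Engineering, Natural Language Processing major
Year of publication	2002
Total pages	72 leaves
Keywords	PROSODY PHONEME GENERATION KOREAN Korean TTS TEXTTOSPEECH pronunciation prosody natural language Korean language processing natural language processing
Language	eng
Full-text URL	http://www.riss.kr/link?id=T11944811&outLink=K
Source	Korea Education and Research Information Service (한국교육학술정보원)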

