[Report] A consolidated knowledge-based speech recognition system

최정윤

A consolidated knowledge-based speech recognition system (지식기반의 음성인식을 위한 통합 시스템 연구)

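Report information
Lead research institution	Yonsei University (연세대학교)
Principal investigator	최정윤
Report type	Final report
Country of publication	Republic of Korea
Language	Korean
Publication date	2012-03
Project start year	2011
Supervising ministry	Ministry of Education, Science and Technology (교육과학기술부)
Program management agency	National Research Foundation of Korea (한국연구재단)
Registration number	TRKO201300019098
Project ID	1345154206
Program name	Young Researcher Support Program (신진연구지원사업)
Database entry date	2013-08-26
Keywords	speech recognition, knowledge-based speech recognition, landmark detection, acoustic-phonetics, distinctive features, human perception theory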

Abstract (Korean, translated)

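Purpose and contents of the research
This research aims to build a knowledge-based speech recognizer to overcome the limitations of the probability-based speech recognition methods in current use. Unlike existing methods, the proposed method follows the process by which humans perceive speech. It uses acoustic-phonetic and linguistic information, and applies speech signal processing techniques to extract the necessary information from the signal. The knowledge-based speech recognition system consists broadly of landmark detection, extraction of the distinctive features of speech, combination into phonemes and words using the distinctive features, and application of the pronunciation rules of each language. Research grounded in this theory has been carried out in part, but no study has addressed the system as a whole. This study therefore integrates existing research results and improves the performance of each stage in order to build a complete speech recognizer.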
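Research results
The knowledge-based speech recognition system is grounded in theories of how people perceive speech. The recognizer therefore obtains fine-grained features of the speech and acquires speech information from them. The overall system consists of speech landmark detection, distinctive feature extraction, phoneme estimation, word-level combination, and feedback with pronunciation rule application. The approach basically proceeds by detecting the speech landmarks that appear prominently in speech and extracting distinctive features at those points. To date, no research in Korea or abroad has carried this approach through to the final stage of speech recognition.
Over the course of the research through the third year, the landmark detection modules, which had been built independently, were integrated; distinctive features are detected from the extracted speech landmarks, and the work advanced to the point where most phonemes can be represented. The integrated system also overcame the unnecessary computation and the differences in decision procedures that can arise between independent modules, and a simple feedback system using vowels was implemented so that performance could be evaluated per phoneme and per word. With the system implemented through the final stage, performance did not reach that of current HMM-based speech recognition, but given that a variety of approaches remain possible in the individual modules and the feedback system, future research is promising.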
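Plan for using the research results
Viewed broadly, speech recognition is a highly usable part of the user interface, so there is already considerable demand in home automation research, in-vehicle voice control, and similar areas, and demand in the robotics industry is growing. Voice-driven automation is not only essential for general convenience facilities and future-oriented industries but is also a necessary technology from the standpoint of welfare for persons with disabilities.
The acoustic characteristics of phonemes obtained during the research can be applied to existing speech recognition systems, and they are expected to be applicable not only to speech recognition research but also to a wide range of speech and acoustics research. The parts that remained incomplete while building the overall system must be addressed in future research, and on that basis the work is expected to offer new theory to the speech recognition market.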

Abstract (English)

Purpose and contents
This research aims to construct a knowledge-based speech recognition system that overcomes the limitations of statistical speech recognition methods. In contrast to existing methods, the proposed method follows the human speech perception process. It applies speech signal processing to extract the necessary information from the signal and draws on acoustic-phonetic and linguistic knowledge. The system consists of a) landmark detection, b) distinctive feature extraction, c) combination of distinctive features into phonemes and words, and d) application of the pronunciation rules of each language. Parts of this approach have been studied, but no prior research has addressed an integrated system, and a working system has never been developed. The main purpose of this research is therefore to integrate existing results and improve each stage in order to build a complete speech recognition system.
Results
A knowledge-based speech recognition system is based on theories of human speech perception. The recognizer therefore extracts fine-grained speech features and derives information about the speech from them. The overall system consists of landmark detection, distinctive feature extraction, phoneme estimation, lexical access, and feedback with phonotactic/phonological rule application. The approach first detects the landmarks that stand out in the speech signal and then extracts distinctive features around them. To date, no knowledge-based system carried through to the final stage of speech recognition has been reported, either domestically or overseas.
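As a rough illustration of the landmark detection stage, the sketch below marks points where short-time energy rises or falls abruptly. It is a simplified stand-in, not the report's algorithm: the function names, the full-band energy (rather than the band energies the report uses), the 10 ms frame, the two-frame lag, and the 9 dB threshold are all assumptions chosen for the example.

```python
import math

def frame_energy_db(signal, frame_len, hop):
    """Short-time energy in dB for each frame (small floor avoids log of zero)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-10))
    return energies

def detect_landmarks(signal, sample_rate, frame_ms=10.0, lag=2, threshold_db=9.0):
    """Return (time_sec, sign) pairs; sign is +1 for an abrupt rise, -1 for a fall.

    All parameter defaults are illustrative assumptions, not values from the report.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    energy = frame_energy_db(signal, frame_len, frame_len)
    # Rate of change: energy difference between frames `lag` apart.
    delta = [energy[i + lag] - energy[i] for i in range(len(energy) - lag)]
    landmarks = []
    for i, d in enumerate(delta):
        left = abs(delta[i - 1]) if i > 0 else 0.0
        right = abs(delta[i + 1]) if i + 1 < len(delta) else 0.0
        # Keep only local extrema of |delta| that exceed the threshold.
        if abs(d) >= threshold_db and abs(d) >= left and abs(d) > right:
            # Place the landmark midway between the two compared frames.
            time_sec = (i + (lag + 1) / 2) * frame_ms / 1000
            landmarks.append((time_sec, 1 if d > 0 else -1))
    return landmarks
```

On a synthetic signal made of silence, a tone burst, and silence again, this sketch would be expected to report one rising landmark near the burst onset and one falling landmark near its offset. A detector closer to the report would presumably compute the energy in several frequency bands and combine the results.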
Through the third year of research, the previously independent landmark detection modules were consolidated into a single system, and distinctive features are extracted from the detected landmarks so that most phonemes can be represented. The consolidated system also eliminates the redundant computation and the inconsistent decision procedures that arose between the separate modules. In addition, a simple prototype feedback algorithm based on vowels was implemented, and performance was evaluated at the phoneme and word levels.
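To make the step from distinctive features to phonemes concrete, the following sketch picks the vowel whose feature bundle best agrees with a set of feature scores. It is a minimal sketch under stated assumptions: the table `VOWEL_FEATURES`, the four features, their ±1 values, and the function `estimate_phoneme` are hypothetical and are not taken from the report.

```python
# Hypothetical feature table for a few English vowels (ARPAbet labels).
# +1 means the feature is present, -1 absent; values are illustrative only.
VOWEL_FEATURES = {
    "iy": {"high": 1, "low": -1, "back": -1, "tense": 1},
    "ih": {"high": 1, "low": -1, "back": -1, "tense": -1},
    "eh": {"high": -1, "low": -1, "back": -1, "tense": -1},
    "ae": {"high": -1, "low": 1, "back": -1, "tense": -1},
    "aa": {"high": -1, "low": 1, "back": 1, "tense": 1},
    "uw": {"high": 1, "low": -1, "back": 1, "tense": 1},
    "uh": {"high": 1, "low": -1, "back": 1, "tense": -1},
}

def estimate_phoneme(scores, table=VOWEL_FEATURES):
    """Pick the phoneme whose feature bundle best agrees with the scores.

    `scores` maps a feature name to a value in [-1, 1], for example the
    output of a per-feature classifier; missing features count as 0.
    """
    def agreement(bundle):
        return sum(sign * scores.get(name, 0.0) for name, sign in bundle.items())
    return max(table, key=lambda phoneme: agreement(table[phoneme]))
```

For example, scores of `{"high": 0.9, "low": -0.8, "back": -0.7, "tense": 0.6}` select `"iy"`. Using soft scores rather than hard binary decisions is a design choice of this sketch; it lets an uncertain feature contribute less to the match.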
In the final results, performance is still below that of the widely used HMM-based speech recognition systems. However, feedback among the individual landmark detection modules and feedback schemes spanning multiple levels leave room for continued improvement, and further research is expected to yield performance that more closely follows human speech perception patterns.
Expected Contribution
From a wider viewpoint, speech recognition has numerous applications as a mode of user interface. Demand already exists in fields such as home automation and in-vehicle voice control, and demand in the robotics industry is also increasing.
Automation by voice is essential for everyday convenience and for future industries, and it is also needed as an assistive technology for persons with disabilities.
The acoustic features derived from the research can be applied to conventional speech recognition systems and, more widely, are expected to be applicable to other work in speech and acoustics. Completing the overall knowledge-based speech recognition system is also expected to provide additional methods for speech recognition research.

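Contents

General Researcher Support Program final report form
Contents
I. Summary of the research plan
1. Korean summary
II. Summary of the research results
1. Korean summary
2. English summary
III. Research content and results
1. Overview of the R&D project
2. Status of domestic and international technology development
3. Research conducted and results
4. Achievement of objectives and contribution to related fields
5. Plan for using the research results
6. Overseas science and technology information collected during the research
7. Representative research achievements of the principal investigator
8. References
9. Research outcomes
10. Other matters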

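Tables/Figures (30)

Table: Relationship between distinctive features and articulators
Table: Overall system flow
Table: Flowchart of the proposed landmark detection algorithm
Table: Band energy for vowel landmark detection and landmark detection results
Table: Example of 640~2800 Hz band energy and semivowel positions
Table: Feature vectors and classes for computing voicing probability and nasal probability
Table: Landmark detection results
Table: Performance evaluation by landmark type
Table: Example of a Gaussian mixture model
Table: Example of a support vector machine
Table: Distinctive features corresponding to each vowel
Table: Performance for each vowel position
Table: Performance of vowel tenseness, vowel position, and final vowel classification
Table: Semivowel classification experiment results
Table: Consonant types and voiced/voiceless classification by consonant type
Table: Classification modules by consonant structure and presence of vocal fold vibration
Table: F-ratio for each feature (bold: features used for classification)
Table: Performance of the stop consonant/fricative classification module
Table: F-ratio for each affricate feature (bold: features used for classification)
Table: Classification of stops and fricatives
Table: Classification performance of stop place, by place
Table: Classification performance of fricative place, by place
Table: Classification performance of nasal place
Table: Schematic of the phoneme matching system and the information it requires
Table: Phoneme-level recognition results for the experiments based on landmarks and TIMIT labels and for the experiment using HMM
Table: Word-level recognition results for the experiments based on landmarks and TIMIT labels
Table: Flowchart of knowledge-based speech recognition
Table: Schematic of the vowel feedback system
Table: Word-level recognition results of the landmark-based vowel feedback system
Table: Word-level recognition results of the TIMIT-label-based vowel feedback system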
