[Paper] On the Importance of Tonal Features for Speech Emotion Recognition

이정인; 강홍구

doi:10.5909/jbe.2013.18.5.713

Journal of Broadcast Engineering, vol. 18, no. 5, 2013, pp. 713-721

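이정인 (Department of Electrical and Electronic Engineering, Yonsei University), 강홍구 (Department of Electrical and Electronic Engineering, Yonsei University)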

Abstract (Korean version)

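This study describes tonal characteristics of speech, based on chroma features, for speech emotion recognition. Just as tonal information such as major and minor keys influences the mood of music, tonal information also influences how the emotion of speech is perceived. To analyze the relationship between emotion and tonal information, listening tests were conducted with signals resynthesized from chroma features, and the results confirmed that positive and negative emotions can be distinguished. Building on these perception tests, tonal features suited to speech were applied in emotion recognition experiments, and the results confirm that using tonal features improves emotion recognition performance.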

Abstract (English)

This paper describes the effectiveness of chroma-based tonal features for speech emotion recognition. Just as the tonality arising from major or minor keys affects the perception of musical mood, speech tonality affects the perception of the emotional state of spoken utterances. To justify this assertion about tonality and emotion, subjective listening tests are carried out with signals synthesized from chroma features, and they show that tonality contributes especially to the perception of negative emotions such as anger and sadness. In automatic emotion recognition tests, the modified chroma-based tonal features produce a noticeable improvement in accuracy when they supplement conventional log-frequency power coefficient (LFPC)-based spectral features.

AI-Generated Summary

* These sentences were identified automatically by AI, and some may be unsuitable; please use them with caution.

Problem Definition

In this study, we focus on tonal features, which have drawn less attention than other feature types in speech emotion recognition. When listening to music, humans perceive its mood from the dominant key, that is, from whether the dominant key is major or minor.
This paper describes the effectiveness of tonal features for speech emotion recognition. We were motivated by the concept of tonality as used for perceiving mood or emotion in music applications.

Proposed Method

A chroma feature is one of the most widely used features for representing tonality [12], but it must be modified for speech signals because the energy of speech is concentrated mainly in the low-frequency region. By analyzing the feature characteristics over every octave range, this study determines an appropriate frequency range for the chroma feature. Interestingly, the effective frequency range is closely related to the fundamental frequency of the human voice.
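As a rough illustration of restricting a chroma feature to a chosen frequency range, the sketch below folds the power spectrum of one frame into 12 pitch classes while keeping only FFT bins between `fmin` and `fmax`. The function name `chroma_frame`, the Hann window, the A4 = 440 Hz tuning, and the 55-880 Hz default range are assumptions for illustration only; the summary does not give the paper's actual octave range or chroma implementation.

```python
import numpy as np

def chroma_frame(frame, sr, fmin=55.0, fmax=880.0, fref=440.0):
    """Hypothetical 12-bin chroma for one frame, using only bins in [fmin, fmax).

    Index 0 is C and index 9 is A (MIDI numbering, A4 = MIDI 69 = fref Hz).
    The default range is a placeholder, not the paper's chosen range.
    """
    window = np.hanning(len(frame))
    power = np.abs(np.fft.rfft(frame * window)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    mask = (freqs >= fmin) & (freqs < fmax)
    # Map each kept bin to the nearest MIDI pitch, then to its pitch class
    midi = 69 + 12 * np.log2(freqs[mask] / fref)
    pitch_class = np.mod(np.round(midi).astype(int), 12)
    chroma = np.zeros(12)
    np.add.at(chroma, pitch_class, power[mask])
    total = chroma.sum()
    # Normalize so the vector describes the tonal distribution, not loudness
    return chroma / total if total > 0 else chroma
```

Applied to a 0.1 s frame of a 220 Hz sine sampled at 16 kHz, most of the weight falls in index 9 (pitch class A). Narrowing `fmin`/`fmax` toward the speaker's F0 region is one way to mimic the octave-range selection described above.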
Experiment 1 compares the modified chroma features with conventional features. Experiment 2 reports the recognition accuracy of a revised system in which the proposed tonal features are combined with LFPC features.
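The summary does not define the LFPC features. The sketch below assumes the general idea behind log-frequency power coefficients, in the spirit of Nwe et al. [9]: log energies in frequency bands whose widths grow geometrically with frequency. The function `lfpc_frame`, the 12 bands, and the 100 Hz to Nyquist band edges are hypothetical choices. They are not the configuration that yields the paper's 120 LFPC values.

```python
import numpy as np

def lfpc_frame(frame, sr, n_bands=12, fmin=100.0, fmax=None):
    """Hypothetical log band powers (dB) of one frame over geometric bands."""
    fmax = sr / 2 if fmax is None else fmax
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # Band edges grow by a constant ratio, so bandwidth increases with frequency
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    band_power = np.array([
        power[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    # Small floor avoids log10(0) for empty or silent bands
    return 10.0 * np.log10(band_power + 1e-10)
```

Per-frame vectors like these would still need to be summarized per utterance before classification. How the paper arrives at its 120 LFPC values is not stated in the summary.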
Many recent studies on emotion recognition tend to focus on enlarging the number and types of features and on finding optimal feature combinations. The proposed method is meaningful in that human perception is involved in the feature-selection process and the relationship between tonality and emotion is analyzed. The mutual information analysis shows that tonal features associated with the F0 frequency range carry reliable information about emotional states.

Data

To overcome the drawbacks of using tonal or F0-related features alone, the tonal and F0-related features are combined with LFPC. The combined set has 164 features: LFPC (120), CHR4 (24), and F0 (20). Experimental results are tabulated in Table 4.
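The 164-dimensional combined vector described above can be sketched as a simple concatenation with shape checks. The function name `combine_features` and the assumption that each block is already a fixed-length per-utterance vector are illustrative. Only the block sizes (120, 24, and 20) come from the text.

```python
import numpy as np

# Block sizes reported in the text: LFPC (120) + CHR4 (24) + F0 (20) = 164
FEATURE_DIMS = {"lfpc": 120, "chr4": 24, "f0": 20}

def combine_features(lfpc, chr4, f0):
    """Hypothetical helper: concatenate per-utterance feature blocks in a fixed order."""
    parts = {"lfpc": lfpc, "chr4": chr4, "f0": f0}
    for name, vec in parts.items():
        shape = np.asarray(vec).shape
        if shape != (FEATURE_DIMS[name],):
            raise ValueError(f"{name}: expected {FEATURE_DIMS[name]} values, got shape {shape}")
    # Dict order is insertion order, so the output is always LFPC, CHR4, F0
    return np.concatenate([np.asarray(parts[name], dtype=float) for name in FEATURE_DIMS])
```

Fixing the block order matters because the classifier sees only positions. Swapping blocks between training and testing would silently scramble the features.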
The speech material used in this study is the Korean Emotion Corpus developed by Kang et al. [16]. It covers four emotional states (joy, sadness, anger, and neutral) recorded by 15 actresses and 15 actors. Forty-five scripts per emotion were each recorded three times at a sampling rate of 16 kHz.
Signals from the neutral emotion class are used as references. The test materials are blinded, and trials are selected at random. Ten listeners took part in the test, which consisted of 30 sentences.
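The listening test used signals resynthesized from chroma features, but the synthesis method is not described in this summary. One common way to make a chroma vector audible is to sum octave-spaced sinusoids for each pitch class, each weighted by its chroma value. This produces a Shepard-tone-like sound. The sketch below assumes that approach, with equal weights across the hypothetical octaves 2 to 5. The name `synthesize_from_chroma` is illustrative and does not come from the authors' code.

```python
import numpy as np

def synthesize_from_chroma(chroma, sr=16000, duration=1.0, octaves=(2, 3, 4, 5), fref=440.0):
    """Hypothetical rendering of a 12-bin chroma vector as octave-spaced sinusoids.

    Index 0 is C; MIDI numbering with A4 = MIDI 69 = fref Hz.
    Every octave gets the same weight (a simplification).
    """
    t = np.arange(int(sr * duration)) / sr
    signal = np.zeros_like(t)
    for pc, weight in enumerate(chroma):
        if weight <= 0:
            continue
        for octave in octaves:
            midi = 12 * (octave + 1) + pc  # e.g. C4 -> 60, A3 -> 57
            freq = fref * 2.0 ** ((midi - 69) / 12)
            if freq < sr / 2:  # skip partials above Nyquist
                signal += weight * np.sin(2 * np.pi * freq * t)
    peak = np.max(np.abs(signal))
    # Peak-normalize so stimuli are comparable in level
    return signal / peak if peak > 0 else signal
```

Because only pitch-class energy survives, such a signal keeps the tonal content of an utterance while discarding most of its lexical and spectral-envelope cues. That is the property a chroma-based listening test relies on.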
The optimal cost parameter is obtained by 10-fold cross-validation over a parameter range of 2 to 1024. The training set contains 1000 utterances per emotion, none of which overlap with the test materials.

Theory/Model

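Because distributions such as p(x) and p(x|c) are difficult to compute directly from the collected data, the MI is estimated using a Gaussian mixture model (GMM) [15]. With K Gaussian components, the estimated p(x) is written as

$$p(\mathbf{x}) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),$$

where $w_k$, $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$ are the weight, mean vector, and covariance matrix of the $k$-th component.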
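The sketch below shows one way to estimate I(X; C) between a single scalar feature and the emotion labels using class-conditional GMMs. It fits a GMM p(x|c) for each class with EM, forms p(x) = Σ_c p(c) p(x|c), and averages log p(x|c) - log p(x) over the observed samples. This follows the general idea of GMM-based MI estimation in [15], but it is not the authors' code. The 1-D restriction, two components per class, 100 fixed EM iterations, and resubstitution averaging are all assumptions.

```python
import numpy as np

def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def _log_gauss(x, mu, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def fit_gmm_1d(x, k=2, n_iter=100, seed=0):
    """Fit a 1-D, k-component GMM to samples x with plain EM."""
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, np.var(x) + 1e-6)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        log_p = _log_gauss(x[:, None], mu, var) + np.log(w)
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and variances (with a variance floor)
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def gmm_logpdf(x, w, mu, var):
    return _logsumexp(_log_gauss(x[:, None], mu, var) + np.log(w), axis=1)

def mutual_information(x, labels, k=2):
    """Estimate I(X; C) in nats for a scalar feature x and discrete labels."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    priors = np.array([np.mean(labels == c) for c in classes])
    params = [fit_gmm_1d(x[labels == c], k) for c in classes]
    # log p(x_i | c) for every sample i and every class c
    log_cond = np.stack([gmm_logpdf(x, *p) for p in params], axis=1)
    # log p(x_i) = log sum_c p(c) p(x_i | c)
    log_px = _logsumexp(log_cond + np.log(priors), axis=1)
    own = log_cond[np.arange(len(x)), np.searchsorted(classes, labels)]
    # I(X; C) = E[log p(x | c) - log p(x)], averaged over the observed samples
    return float(np.mean(own - log_px))
```

For two well-separated, equally likely classes the estimate approaches log 2 ≈ 0.693 nats. For a feature unrelated to the labels it stays close to zero. Ranking features by this score is one way to check which chroma ranges carry emotional information.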
To implement a multi-class emotion recognition system, the LIBSVM tools [17] are used for training and testing.
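scikit-learn's `SVC` is built on LIBSVM, so the cost-parameter search can be sketched as below. The summary specifies only the LIBSVM tools, 10-fold cross-validation, and a cost range of 2 to 1024. The power-of-two grid (2^1 to 2^10), RBF kernel, feature standardization, stratified folds, and the helper name `train_emotion_svm` are assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Cost values 2, 4, ..., 1024 (power-of-two spacing is an assumption)
COST_GRID = [2.0 ** i for i in range(1, 11)]

def train_emotion_svm(features, labels, seed=0):
    """Hypothetical helper: pick C by 10-fold CV, then refit on all training data."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
    search = GridSearchCV(
        model,
        param_grid={"svc__C": COST_GRID},
        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=seed),
    )
    search.fit(features, labels)
    return search.best_estimator_, search.best_params_["svc__C"]
```

`SVC` handles the four emotion classes with one-vs-one voting, as LIBSVM does. The test utterances must stay out of `features` so that the chosen C is not tuned on them.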

Performance/Results

The combined features improve performance for both male and female speakers. Compared with the best performance in Experiment 2, the relative improvements are approximately 1% for female speakers and 3% for male speakers. Because Experiments 2 and 3 use different feature dimensions, this may not be an entirely fair comparison.
However, as Fig. 3 shows, increasing the feature dimension does not always improve performance. According to the recognition results, tonal and F0-related features provide reliable cues to emotional state from the glottal source of speech.
Experimental results show that combining the proposed chroma feature with LFPC increases recognition rates compared with LFPC+F0 or LFPC alone. In particular, the proposed feature combination significantly improves overall accuracy in the gender-dependent experiment.

References (18)

[1] R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. G. Taylor, "Emotion Recognition in Human Computer Interaction," IEEE Signal Processing Magazine, pp. 32-80, 2001.
[2] D. Ververidis and C. Kotropoulos, "Emotional speech recognition: Resources, features, and methods," Speech Communication, vol. 48(9), pp. 1162-1181, 2006.
[3] M. E. Ayadi, M. S. Kamel, and F. Karray, "Survey on speech emotion recognition: Features, classification schemes, and databases," Pattern Recognition, vol. 44, pp. 572-587, 2011.
[4] I. Murray and J. Arnott, "Toward the simulation of emotion in synthetic speech: A review of the literature of human vocal emotion," J. Acoust. Soc. Am., vol. 93(2), pp. 1097-1108, 1993.
[5] C. E. Williams and K. N. Stevens, "Emotion and speech: Some acoustical correlates," J. Acoust. Soc. Am., vol. 52(4), pp. 1238-1250, 1972.
[6] C. Gobl and A. N. Chasaide, "The role of voice quality in communicating emotion, mood and attitude," Speech Communication, vol. 40, pp. 189-212, 2003.
[7] M. Goudbeek and K. Scherer, "Beyond arousal: Valence and potency/control cues in the vocal expression of emotion," J. Acoust. Soc. Am., vol. 128, pp. 1322-1336, 2010.
[8] S. Yacoub, S. Simske, X. Lin, and J. Burns, "Recognition of Emotions in Interactive Voice Response System," Proc. Eurospeech 2003, Geneva, 2003.
[9] T. L. Nwe, S. W. Foo, et al., "Speech emotion recognition using hidden Markov models," Speech Communication, vol. 41(4), pp. 603-623, 2003.
[10] M. A. Bartsch and G. H. Wakefield, "Audio thumbnailing of popular music using chroma-based representation," IEEE Transactions on Multimedia, vol. 7(1), pp. 96-104, 2005.
[11] Y. E. Kim, E. M. Schmidt, R. Migneco, B. G. Morton, P. Richardson, J. Scott, J. A. Speck, and D. Turnbull, "Music emotion recognition: A state of the art review," Proc. 11th Int. Soc. Music Information Retrieval Conf. (ISMIR), pp. 255-266, 2010.
[12] M. Muller, F. Kurth, and M. Clausen, "Audio matching via chroma-based statistical features," Proc. 5th Int. Soc. Music Information Retrieval Conf. (ISMIR), pp. 288-295, 2005.
[13] T. F. Quatieri, Discrete-Time Speech Signal Processing, Prentice-Hall, NJ, 2002.
[14] H. Purwins, "Profiles of Pitch Classes: Circularity of Relative Pitch and Key: Experiments, Models, Computational Music Analysis, and Perspectives," Ph.D. dissertation, Berlin Univ. of Technol., Berlin, Germany, 2005.
[15] T. Lan, D. Erdogmus, U. Ozertem, and Y. Huang, "Estimating mutual information using Gaussian mixture model for feature ranking and selection," Proc. Int. Joint Conf. on Neural Networks, pp. 5034-5039, 2006.
[16] B.-S. Kang, "Text-independent emotion recognition algorithm using speech signal," M.S. thesis, Electrical and Electronic Engineering Department, Yonsei University, 2000.
[17] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, 27:1-26, 2011.
[18] P. Shen, Z. Changjun, and X. Chen, "Automatic Speech Emotion Recognition using Support Vector Machine," Proc. Int. Conf. on Electronic and Mechanical Engineering and Information Technology (EMEIT), vol. 2, pp. 621-625, 2011.
