This study investigates the relationship between voice quality measurements and emotional states, in addition to conventional prosodic and cepstral features. Open quotient, harmonics-to-noise ratio, spectral tilt, spectral sharpness, and band energy were analyzed as voice quality features, and prosodic features related to fundamental frequency and energy are also examined. ANOVA tests and Sequential Forward Selection are used to evaluate significance and verify performance. Classification experiments show that using the proposed features increases overall accuracy, and in particular, errors between happy and angry decrease. Results also show that adding voice quality features to conventional cepstral features leads to increase in performance.
Problem Definition
The features used in previous studies are insufficient to classify the valence of emotions, although the activation dimension is easily classified. This study focuses on the importance of voice quality in overcoming the limitations of prosodic features.
This letter aims to expand upon analyses of the relationship between voice quality measurements and emotions. There are three main objectives of this study.
This study investigates the relationship between voice quality and emotions and identifies useful feature measurements. The voice quality features considered are open quotient, harmonics-to-noise ratio, spectral tilt, spectral sharpness, and band energy.
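One of the voice quality measurements above, spectral tilt, can be illustrated as the least-squares slope of the log-magnitude spectrum against log frequency. This is a minimal sketch under assumed conventions (Hann windowing, slope in dB per octave), not the paper's exact estimator:

```python
import numpy as np

def spectral_tilt(frame, sr):
    """Estimate spectral tilt of one speech frame as the least-squares
    slope of the log-magnitude spectrum, in dB per octave."""
    # Magnitude spectrum of a Hann-windowed frame.
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    # Skip the DC bin; fit dB magnitude against log2 frequency.
    db = 20 * np.log10(spec[1:] + 1e-12)
    octaves = np.log2(freqs[1:])
    slope, _ = np.polyfit(octaves, db, 1)
    return slope  # more negative = steeper tilt (energy falls with frequency)
```

A frame with energy concentrated at low frequencies (e.g., a modal-voiced vowel) yields a clearly more negative slope than broadband noise, which is the contrast such a feature is meant to capture.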
Proposed Method
Second, we attempt to quantitatively identify useful measurements for classifying different emotions. The last objective is to apply these measurements, in various feature combinations, to improve the emotion classification rate.
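The feature-combination search mentioned above uses sequential forward selection (per the abstract). A minimal greedy sketch is shown below; the scoring callback stands in for whatever classification-accuracy criterion the study actually optimized:

```python
def sequential_forward_selection(score, n_features, k):
    """Greedy SFS: grow the selected subset one feature index at a time,
    at each step adding the feature that maximizes score(subset)."""
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

In practice `score` would wrap a cross-validated classifier; here any callable that maps a list of feature indices to a number works.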
The second experiment uses MFCCs and voice quality features within a frame-based scheme. Since spectral information is contained in MFCCs, spectral tilt and band energies are not used in this experiment.
Dataset
Utterances of five female and five male speakers, totalling 1486 utterances, are used as the analysis set. The rest of the data comprises 2969 utterances from 10 male and 10 female speakers.
The MFCC24 features comprise the 0th through 7th coefficients together with their first and second derivatives. The proposed feature set combines F0, OQ, and HNR features with the MFCCs.
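The 24-dimensional layout above (8 static coefficients, each with first and second derivatives) can be sketched as follows. `np.gradient` is used as a simple stand-in for whatever delta-regression window the authors actually used:

```python
import numpy as np

def add_deltas(static):
    """Append first and second time derivatives to a static feature matrix.

    static: (n_frames, n_coeffs) array of per-frame coefficients.
    Returns: (n_frames, 3 * n_coeffs) array [static, delta, delta-delta].
    """
    delta = np.gradient(static, axis=0)    # finite-difference over frames
    delta2 = np.gradient(delta, axis=0)
    return np.hstack([static, delta, delta2])
```

With 8 static MFCCs per frame this yields the 24-dimensional vector the text describes; voice quality features such as F0, OQ, and HNR would then be concatenated per frame alongside it.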
Theory/Model
The speech material used in this study is the Korean Emotion Corpus developed by Kang et al. [2]. It includes four emotional states (happy, sad, angry, and neutral) recorded in Korean by 15 actresses and 15 actors.
References (8)
R. Cowie, E. Douglas-Cowie, N. Tsapatsoulis, G. Votsis, S. Kollias, W. Fellenz, and J. G. Taylor, "Emotion Recognition in Human Computer Interaction," IEEE Signal Processing Magazine, pp. 32-80, 2001.
B.-S. Kang, "Text independent emotion recognition using speech signals," M.S. Thesis, Yonsei University, 2000.
I. Murray, J. Arnott, "Toward the simulation of emotion in synthetic speech: A review of the literature of human vocal emotion," J. Acoust. Soc. Am., vol. 93 (2), pp. 1097-1108, 1993.
H.-S. Kwak, S.-H. Kim, Y.-K. Kwak, "Emotion recognition using prosodic feature vector and Gaussian mixture model," Korean Soc. for Noise and Vibration Eng., pp. 762-765, 2002.
S. Yacoub, S. Simske, X. Lin, J. Burns, "Recognition of Emotions in Interactive Voice Response Systems," Proceedings of Eurospeech 2003, Geneva, 2003.
J.-Y. Choi, M. Hasegawa-Johnson, J. Cole, "Finding intonational boundaries using acoustic cues related to the voice source," J. Acoust. Soc. Am., vol. 118 (4), pp. 2579-2587, 2005.
G. de Krom, "A Cepstrum-based technique for determining a Harmonic-to-Noise ratio in speech signals," J. Speech Hearing Res. vol. 36, pp. 254-266, 1993.
P. Pudil, F. J. Ferri, J. Novovicova, J. Kittler, "Floating Search Methods for Feature Selection with Nonmonotonic Criterion Functions," Proceedings of the IEEE International Conference on Pattern Recognition, vol. 2, pp. 279-283, Jerusalem, 1994.