최소 단어 이상 선택하여야 합니다.
최대 10 단어까지만 선택 가능합니다.
다음과 같은 기능을 한번의 로그인으로 사용 할 수 있습니다.
NTIS 바로가기한국음향학회지= The journal of the acoustical society of Korea, v.40 no.5, 2021년, pp.515 - 522
박순찬 (부산대학교 전자공학과) , 김형순 (부산대학교 전자공학과)
It is hard to prepare sufficient training data for speech emotion recognition due to the difficulty of emotion labeling. In this paper, we apply transfer learning with large-scale training data for speech recognition on a transformer-based model to improve the performance of speech emotion recogniti...
H. Hu, M. Xu, and W. Wu, "GMM supervector based SVM with spectral features for speech emotion recognition," Proc. ICASSP. 413-416 (2007).
A. Stuhlsatz, C. Meyer, F. Eyben, T. Zielke, G. Meier, and B. Schuller, "Deep neural networks for acoustic emotion recognition: Raising the benchmarks," Proc. ICASSP. 5688-5691 (2011).
G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network," Proc. ICASSP. 5200-5204 (2016).
S. Mirsamadi, E. Barsoum, and C. Zhang, "Automatic speech emotion recognition using recurrent neural networks with local attention," Proc. ICASSP. 2227-2231 (2017).
J. Kim, G. Englebienne, K. P. Truong, and V. Eversu, "Towards speech emotion recognition "in the Wild" using aggregated corpora and deep multi-task learning," Proc. Interspeech, 1113-1117 (2017).
S. Yoon, S. Byun, and K. Jung, "Multimodal speech emotion recognition using audio and text," Proc. SLT. 112-118 (2018).
Z. Lu, L. Cao, Y. Zhang, C. Chiu, and J. Fan, "Speech sentiment analysis via pre-trained features from end-to-end ASR models," Proc. ICASSP. 7149-7153 (2020).
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, and L.Kaiser, "Attention is all you need," Proc. NIPS. 6000-6010 (2017).
J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," Proc. NAACL-HLT. 4171-4186 (2019).
A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, "Wav2vec 2.0: A framework for self-supervised learning of speech representations," Proc. NeurIPS. 12449-12460 (2020).
C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S.l Kim, J. N. Chang, S. Lee, and S. S. Narayanan, "IEMOCAP: interactive emotional dyadic motion capture database," Language Resources and Evaluation, 42, 335-359 (2008).
V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, "Librispeech: An ASR corpus based on public domain audio books," Proc. ICASSP. 5206-5210 (2015).
W. Chan, N. Jaitly, Q. Le, and O. Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," Proc. ICASSP. 4960-4964 (2016).
A. Graves, A. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," Proc. ICASSP. 6645-6649 (2013).
A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," Proc. ICML. 369-376 (2006).
S. Watanabe, T. Hori, S. Kim, J. R. Hershey, and T. Hayashi, "Hybrid CTC/attention architecture for end-to-end speech recognition," IEEE JSTSP. 11, 1240-1253 (2017).
T. Kudo and J. Richardson, "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing," Proc. EMNLP 66-71 (2018).
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "PyTorch: An imperative style, high-performance deep learning library," Proc. NeurIPS. 8024-8035 (2019).
*원문 PDF 파일 및 링크정보가 존재하지 않을 경우 KISTI DDS 시스템에서 제공하는 원문복사서비스를 사용할 수 있습니다.
오픈액세스 학술지에 출판된 논문
※ AI-Helper는 부적절한 답변을 할 수 있습니다.