[Paper] A study on training DenseNet-Recurrent Neural Network for sound event detection

Hyeonjin Cha (차현진); Sangwook Park (박상욱)

doi:10.7776/ask.2023.42.5.395


The Journal of the Acoustical Society of Korea, v.42 no.5, 2023, pp.395-401

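Hyeonjin Cha (Department of Electronic Engineering, Gangneung-Wonju National University), Sangwook Park (Department of Electronic Engineering, Gangneung-Wonju National University)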

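Abstract (translated from Korean)

Sound event detection (SED) identifies both the category and the time interval of sounds of interest in an audio signal, and is used in applications such as acoustic surveillance and monitoring systems. Various methods have recently been introduced through Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge, an international competition on acoustic signal analysis. To apply Dense Convolutional Networks (DenseNet), which have driven performance gains in many domains, to sound event detection, this study compares and analyzes how performance changes with the design parameters. In the experiments, a DenseRNN model is designed by combining DenseNet with Bottleneck and Compression (DenseNet-BC) with a bidirectional gated recurrent unit (Bi-GRU), a type of recurrent neural network (RNN), and the model is trained with the Mean Teacher model. Following the DCASE Task 4 evaluation protocol, the change in DenseRNN performance across design parameters is analyzed using the event-based F-score. The results show that performance improves as the complexity of DenseRNN increases, but levels off once the complexity reaches a certain point. They also show that the model is trained effectively when dropout is not applied during training.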
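The Mean Teacher training mentioned in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' code: the teacher's weights follow an exponential moving average (EMA) of the student's weights, and a consistency term penalizes disagreement between the two models' predictions. The function names, the EMA factor `alpha=0.999`, the mean-squared-error consistency term, and the sigmoid-shaped ramp-up of the consistency weight are all assumptions made for this example; the abstract does not state them.

```python
import math


def ema_update(teacher, student, alpha=0.999):
    """Move each teacher weight toward the matching student weight (EMA)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


def consistency_weight(step, rampup_steps, max_weight):
    """Assumed schedule: sigmoid-shaped ramp-up that reaches max_weight."""
    if rampup_steps <= 0:
        return max_weight
    phase = 1.0 - min(step, rampup_steps) / rampup_steps
    return max_weight * math.exp(-5.0 * phase * phase)


def total_loss(supervised_loss, student_probs, teacher_probs,
               step, rampup_steps, max_weight):
    """Supervised loss plus the weighted consistency term."""
    weight = consistency_weight(step, rampup_steps, max_weight)
    return supervised_loss + weight * consistency_loss(student_probs, teacher_probs)
```

In a real training loop the student would be updated by gradient descent on `total_loss`, and `ema_update` would be applied to the teacher after each step; only the student receives gradients.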

Abstract (English)

Sound Event Detection (SED) aims to identify not only the category but also the time interval of target sounds in an audio waveform. It is a critical technique in fields such as acoustic surveillance and monitoring systems. Recently, various models have been introduced through Detection and Classification of Acoustic Scenes and Events (DCASE) Task 4. This paper explores how to choose the design parameters of a DenseNet-based model; DenseNet has led to outstanding performance in other recognition systems. In the experiments, DenseRNN, the SED model, consists of DenseNet-BC and bidirectional Gated Recurrent Units (GRU), and is trained with the Mean Teacher model. Under the assessment protocol of DCASE Task 4, the model is evaluated with an event-based F-score as a function of parameters related to both model architecture and model training. The experimental results show that performance improves with model complexity and then saturates near its best value. In addition, DenseRNN is trained more effectively without dropout.
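To make the event-based F-score concrete, the following is a simplified sketch of how such a metric can be computed. It is not the official DCASE evaluation tool. It assumes events are `(label, onset, offset)` tuples in seconds, uses greedy one-to-one matching within each class, and takes a 200 ms onset collar with an offset tolerance of the larger of 200 ms or 20 % of the reference event length; these tolerance values and all function names are assumptions for illustration, since the abstract does not give them.

```python
def event_based_f_scores(reference, predicted, collar=0.2, offset_ratio=0.2):
    """Return {label: F-score} using greedy one-to-one matching per class."""
    labels = sorted({e[0] for e in reference} | {e[0] for e in predicted})
    scores = {}
    for label in labels:
        refs = [e for e in reference if e[0] == label]
        preds = [e for e in predicted if e[0] == label]
        matched = set()  # indices of reference events already matched
        tp = 0
        for _, p_on, p_off in preds:
            for i, (_, r_on, r_off) in enumerate(refs):
                if i in matched:
                    continue
                # Offset tolerance grows with the reference event length.
                offset_tol = max(collar, offset_ratio * (r_off - r_on))
                if abs(p_on - r_on) <= collar and abs(p_off - r_off) <= offset_tol:
                    matched.add(i)
                    tp += 1
                    break
        fp = len(preds) - tp  # predictions with no matching reference
        fn = len(refs) - tp   # references that were never matched
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def class_averaged_f_score(reference, predicted, **kwargs):
    """Macro average: mean of the per-class event-based F-scores."""
    scores = event_based_f_scores(reference, predicted, **kwargs)
    return sum(scores.values()) / len(scores) if scores else 0.0


reference = [("Speech", 1.0, 3.0), ("Dog", 5.0, 6.0)]
predicted = [("Speech", 1.1, 3.3), ("Dog", 5.5, 6.0)]
average = class_averaged_f_score(reference, predicted)
```

In this example the `Speech` prediction falls within both tolerances and counts as a true positive, while the `Dog` prediction starts 0.5 s late and is counted as both a false positive and a false negative, so the class average is 0.5.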

Tables/Figures (6)

Fig. 1. (Color available online) (a) Proposed DenseRNN structure for sound event detection and (b) bottleneck block.
Fig. 2. (Color available online) (a) The event-based class-averaged F-score and (b) the total number of parameters in DenseRNN, both as functions of growth rate and depth.
Fig. 3. (Color available online) The event-based class-averaged F-score as a function of dropout rate.
Fig. 4. (Color available online) The event-based class-averaged F-score as a function of (a) max consistency weight and (b) max learning rate.
Table 1. The event-based class-averaged F-score as a function of optimizers.
Table 2. The event-based class-averaged F-score as a function of weight initialization methods.
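Fig. 2 relates model size to growth rate and depth. The sketch below shows, under stated assumptions, why the parameter count of a DenseNet-BC grows with both. It counts convolution weights only (no batch normalization, bias, or recurrent layers) and assumes a single-channel input, three dense blocks, `(depth - 4) // 6` bottleneck layers per block, a 1x1 bottleneck with four times the growth rate in channels, and a compression factor of 0.5 at each transition. This layout follows the common DenseNet-BC configuration and is not taken from the paper, so the numbers it produces are not the paper's reported counts; the function name is hypothetical.

```python
def densenet_bc_conv_params(growth_rate, depth, in_channels=1, compression=0.5):
    """Count convolution weights in an assumed three-block DenseNet-BC."""
    num_blocks = 3                         # assumption: three dense blocks
    layers_per_block = (depth - 4) // 6    # assumption: bottleneck layers per block
    channels = 2 * growth_rate
    params = in_channels * channels * 3 * 3           # initial 3x3 convolution
    for block in range(num_blocks):
        for _ in range(layers_per_block):
            bottleneck = 4 * growth_rate
            params += channels * bottleneck            # 1x1 bottleneck convolution
            params += bottleneck * growth_rate * 3 * 3 # 3x3 convolution
            channels += growth_rate                    # concatenate new feature maps
        if block < num_blocks - 1:
            out_channels = int(channels * compression)
            params += channels * out_channels          # 1x1 transition convolution
            channels = out_channels
    return params, channels


# Compare a few settings of the two design parameters.
for k in (12, 24):
    for d in (40, 100):
        count, _ = densenet_bc_conv_params(k, d)
        print(f"growth_rate={k:2d} depth={d:3d} conv_params={count}")
```

Because every layer's input width grows by the growth rate, the count rises faster than linearly in depth and roughly quadratically in growth rate.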

