[Paper] Sound event detection model using self-training based on noisy student model (잡음 학생 모델 기반의 자가 학습을 활용한 음향 사건 검지)

김남균; 박창수; 김홍국; 허진욱; 임정은

doi:10.7776/ask.2021.40.5.479


The Journal of the Acoustical Society of Korea (한국음향학회지), vol. 40, no. 5, 2021, pp. 479-487

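김남균 (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology), 박창수 (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology), 김홍국 (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology), 허진욱 (AI Lab, Hanwha Techwin), 임정은 (AI Lab, Hanwha Techwin)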

Abstract (translated from Korean)

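This paper proposes a sound event detection (SED) method that uses self-training based on a noisy student model. The proposed SED model consists of two stages. In the first stage, a residual convolutional recurrent neural network (RCRNN) is trained and used to predict labels for the unlabeled dataset. In the second stage, a noisy student model with three types of noise is trained iteratively by self-training. The noisy student model is trained with feature noise (SpecAugment, mixup, and time-frequency shift), model noise (dropout), and label noise introduced through a semi-supervised loss function. The performance of the proposed SED model was evaluated on the validation set of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 4. In a comparison of event-based F1 scores against the baseline and the top-ranked model on the DCASE 2020 challenge dataset, the proposed SED model improved the F1 score over the top-ranked model by 4.6 % for the single model and by 3.4 % for the ensemble model.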
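The two-stage procedure described in the abstract can be sketched as a generic self-training loop. This is only an illustration under stated assumptions: the function names (`self_train`, `fit_threshold`, `predict_threshold`) are hypothetical, and a toy one-dimensional threshold classifier stands in for the RCRNN, which the record does not specify in enough detail to reproduce.

```python
import random

def self_train(labeled, unlabeled, fit, predict, add_noise, rounds=2):
    """Generic noisy-student style loop (illustrative sketch, not the paper's code).

    labeled:   list of (x, y) pairs
    unlabeled: list of x
    fit(pairs) -> model, predict(model, x) -> label, add_noise(x) -> noisy x
    """
    teacher = fit(labeled)  # stage 1: teacher trained on labeled data only
    for _ in range(rounds):  # stage 2: repeated student training
        # The teacher labels the unlabeled data without any noise.
        pseudo = [(x, predict(teacher, x)) for x in unlabeled]
        # The student sees noisy inputs for both real and pseudo labels.
        noisy = [(add_noise(x), y) for x, y in labeled + pseudo]
        teacher = fit(noisy)  # the trained student becomes the next teacher
    return teacher

# Toy stand-in model (assumption): a threshold halfway between class means.
def fit_threshold(pairs):
    pos = [x for x, y in pairs if y == 1]
    neg = [x for x, y in pairs if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

rng = random.Random(0)
model = self_train(
    labeled=[(0.0, 0), (1.0, 1)],
    unlabeled=[0.1, 0.2, 0.8, 0.9],
    fit=fit_threshold,
    predict=predict_threshold,
    add_noise=lambda x: x + rng.uniform(-0.05, 0.05),
)
```

The key design point the sketch tries to capture is the asymmetry: pseudo-labels come from a clean teacher pass, while the student is trained on perturbed inputs.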

Abstract

In this paper, we propose a Sound Event Detection (SED) model using self-training based on a noisy student model. The proposed SED model consists of two stages. In the first stage, a mean-teacher model based on a Residual Convolutional Recurrent Neural Network (RCRNN) is constructed to provide target labels for weakly labeled or unlabeled data. In the second stage, a self-training-based noisy student model is constructed by applying different noise types: feature noise (time-frequency shift, mixup, and SpecAugment) and dropout-based model noise. In addition, a semi-supervised loss function is applied to train the noisy student model, which acts as label noise injection. The performance of the proposed SED model is evaluated on the validation set of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 4. The experiments show that the single model and the ensemble model of the proposed noisy-student-based SED improve the F1-score by 4.6 % and 3.4 %, respectively, compared to the top-ranked model in DCASE 2020 Challenge Task 4.
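The three feature-noise types named in the abstract can be illustrated with a minimal pure-Python sketch on a spectrogram stored as a list of frames. The function names and parameters below are hypothetical, and the shift shown here covers only the time axis; the paper's exact settings are not given in this record. In an SED setting, frame-level labels would have to be shifted together with the features.

```python
def mixup(x1, y1, x2, y2, lam):
    """Mixup: blend two spectrograms and their label vectors with weight lam."""
    mixed_x = [[lam * a + (1.0 - lam) * b for a, b in zip(f1, f2)]
               for f1, f2 in zip(x1, x2)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return mixed_x, mixed_y

def time_shift(spec, shift):
    """Circularly shift the frames along the time axis by `shift` frames."""
    shift %= len(spec)
    return spec[-shift:] + spec[:-shift] if shift else list(spec)

def mask_block(spec, t0, t_width, f0, f_width, fill=0.0):
    """SpecAugment-style masking: blank one time band and one frequency band."""
    out = [list(frame) for frame in spec]  # copy so the input stays untouched
    for t, frame in enumerate(out):
        for f in range(len(frame)):
            if t0 <= t < t0 + t_width or f0 <= f < f0 + f_width:
                frame[f] = fill
    return out

# Tiny example: 3 frames x 2 frequency bins.
spec = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
shifted = time_shift(spec, 1)
masked = mask_block(spec, t0=0, t_width=1, f0=1, f_width=1)
```

In practice the mixing weight `lam` and the mask positions are drawn at random for each training example; they are passed explicitly here so the behaviour is easy to check.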

Tables/Figures (5)

Fig. 1. (Color available online) Training procedure of the proposed sound event detection model composed of the RCRNN-based mean-teacher model for predicting strong labels and the self-trained noisy student model with noise injections and a semi-supervised loss function.
Table 1. Network architecture of a residual convolutional neural network in the RCRNN used in the mean-teacher model.
Table 2. Comparison of F1-score and ERs between the top-ranked sound event detection model and the proposed RCRNN-based noisy student sound event detection model.
Table 3. Comparison of F1-score and ERs between the top-ranked ensemble sound event detection model and the proposed RCRNN-based noisy student ensemble sound event detection model.
Table 4. Ablation study for the proposed noisy student sound event detection model using an RCRNN-based teacher model with different types of noise injections.
