[Paper] Time-domain Sound Event Detection Algorithm Using Deep Neural Network

Bum-Jun Kim; Hyeongi Moon; Sung-wook Park; Youngho Jeong; Young-cheol Park

doi:10.5909/jbe.2019.24.3.472

Abstract

This paper proposes a time-domain sound event detection (SED) algorithm based on a deep neural network (DNN). The system feeds the network raw time-domain audio waveforms that have not been converted to the frequency domain. The overall architecture is a CRNN to which GLU, ResNet, and Squeeze-and-excitation blocks are applied, and it combines features extracted from several layers. Because training data with strong labels is difficult to obtain in practice, training uses a small amount of weakly labeled data together with a large amount of unlabeled data. To make effective use of the small labeled set, data augmentation methods such as time stretching, pitch change, dynamic range compression (DRC), and block mixing are applied. Pseudo-labels are attached to the unlabeled data to supplement the scarce training data. With the proposed network and data augmentation methods, sound event detection performance improves by about 6 % (in terms of F-score) compared with a CRNN trained in the conventional way.
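As an illustration of the pseudo-labeling idea mentioned in the abstract, the following is a minimal sketch and not the authors' procedure. It assumes a tagger has already produced clip-level class probabilities for the unlabeled clips; the function name `assign_pseudo_labels` and the parameters `threshold` and `min_confidence` are hypothetical choices made for this example.

```python
import numpy as np


def assign_pseudo_labels(probs, threshold=0.5, min_confidence=0.9):
    """Turn clip-level probabilities into weak pseudo-labels.

    probs: array of shape (n_clips, n_classes) with values in [0, 1],
           assumed to come from a previously trained tagging network.
    Returns the indices of the clips that were kept and their binary labels.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    # A class is marked present when its probability reaches the threshold.
    labels = (probs >= threshold).astype(int)
    # Per-clip confidence: the least certain class decision in that clip.
    confidence = np.maximum(probs, 1.0 - probs).min(axis=1)
    # Keep only clips whose every class decision is confident enough.
    keep = np.flatnonzero(confidence >= min_confidence)
    return keep, labels[keep]


if __name__ == "__main__":
    demo = [[0.95, 0.02], [0.60, 0.40], [0.05, 0.97]]
    kept, pseudo = assign_pseudo_labels(demo)
    print(kept, pseudo)
```

Discarding low-confidence clips is one common way to limit label noise; whether the paper filters clips in this way is not stated in the abstract.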


Tables/Figures (13)

Fig. 1. Block diagram of the proposed DNN structure: (a) DNN structure for pseudo label, (b) DNN structure for SED
Fig. 2. Block diagram of the 1D convolution layer with stride N
Fig. 3. Block diagram of the ResGLU-SE block
Fig. 4. A DRC curve example (above) and the DRC curves used (below)
Fig. 5. Block diagram of pseudo label applied training
Table 1. Number of clips per event (weak label)
Table 2. Ratio of data augmentation and resultant number of augmented clips
Fig. 6. Block diagram for overall structure
Fig. 7. Magnitude spectrum of the strided 1D convolutional layers: (a) using GLUs block, (b) using ResGLU-SE block
Fig. 8. Performance of the proposed algorithm per event
Table 3. Performance of sound event detection
Fig. 9. Spectrum of training data
Fig. 10. Ground truth and prediction results
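The captions of Fig. 2 and Fig. 3 name a strided 1D convolution layer and a ResGLU-SE block, but the figures themselves are not reproduced here. The sketch below shows one plausible way to combine those ingredients in NumPy: a 1D convolution, GLU gating, Squeeze-and-excitation channel scaling, and a residual connection. The exact wiring, the weight shapes, and the names `conv1d` and `res_glu_se_block` are assumptions made for illustration and may differ from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d(x, w, stride=1):
    """1D convolution (cross-correlation) over a raw waveform feature map.

    x: (in_ch, T); w: (out_ch, in_ch, k) with odd k.
    Zero padding of k // 2 keeps the length at T when stride == 1.
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # windows has shape (in_ch, T, k); striding subsamples the time axis.
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)
    windows = windows[:, ::stride, :]
    return np.einsum("oik,itk->ot", w, windows)


def res_glu_se_block(x, w_conv, w_down, w_up):
    """Assumed ResGLU-SE wiring: x + SE(GLU(conv(x))).

    x: (C, T); w_conv: (2C, C, k); w_down: (C_r, C); w_up: (C, C_r).
    """
    c = x.shape[0]
    h = conv1d(x, w_conv)                 # (2C, T)
    gated = h[:c] * sigmoid(h[c:])        # GLU: linear half gated by sigmoid half
    squeezed = gated.mean(axis=1)         # squeeze: global average over time
    scale = sigmoid(w_up @ np.maximum(w_down @ squeezed, 0.0))  # excitation
    return x + gated * scale[:, None]     # residual connection


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 16))
    front = conv1d(x, rng.standard_normal((8, 4, 3)), stride=4)
    out = res_glu_se_block(x, rng.standard_normal((8, 4, 3)),
                           rng.standard_normal((2, 4)),
                           rng.standard_normal((4, 2)))
    print(front.shape, out.shape)
```

The strided call shortens the time axis, which is how a learned front end can stand in for a fixed frequency transform, while the block itself keeps the shape unchanged so the residual sum is well defined.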

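Q&A

Keyword	Question	Answer extracted from the paper
	What methods were applied to use a small amount of training data effectively?	In addition, on the premise that training data with strong labels is difficult to obtain in practice, this study trained the network on a small amount of weakly labeled training data and a large amount of unlabeled training data. To use the small amount of training data effectively, data augmentation methods such as time stretching, pitch change, dynamic range compression, and block mixing were applied. Pseudo-labels were attached to the unlabeled data to supplement the scarce training data.
	What is sound event detection?	AI speakers that can hold spoken conversations are becoming widespread in homes, and these speakers attempt to analyze ambient sound along with speech to understand the situation and to provide improved services on that basis. Understanding the situation requires analyzing the incoming ambient sound signal to recognize which event occurs at a given time; this is called sound event detection (SED) [1]. With sound event detection, services such as rescue requests, emergency dispatch, and information retrieval [2] can be provided.
	What structure does the time-domain sound event detection algorithm using a deep neural network use?	The system uses time-domain audio data that has not been converted to the frequency domain as the input to the deep neural network. The overall architecture is a CRNN, to which GLU, ResNet, and Squeeze-and-excitation blocks are applied. A structure that jointly considers features extracted from several layers is also proposed.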
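Two of the augmentation methods named in the answers above, block mixing and dynamic range compression, can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the paper's implementation: block mixing is taken to mean summing two waveforms and taking the union of their weak labels, and the compressor is a static, sample-wise curve with no attack or release smoothing. The function names, `gain_b`, `threshold_db`, and `ratio` are hypothetical. Time stretching and pitch change are omitted because they normally rely on a resampling or phase-vocoder library.

```python
import numpy as np


def block_mix(wave_a, labels_a, wave_b, labels_b, gain_b=1.0):
    """Mix two clips and merge their weak (clip-level) labels.

    Waveforms are 1D float arrays; labels are binary vectors per class.
    The shorter length is used, and the mix is rescaled only if it clips.
    """
    n = min(len(wave_a), len(wave_b))
    mixed = np.asarray(wave_a[:n], dtype=float) + gain_b * np.asarray(wave_b[:n], dtype=float)
    peak = np.max(np.abs(mixed))
    if peak > 1.0:
        mixed = mixed / peak
    # An event is present in the mix if it is present in either source clip.
    merged = np.maximum(np.asarray(labels_a), np.asarray(labels_b))
    return mixed, merged


def compress_dynamic_range(wave, threshold_db=-20.0, ratio=4.0):
    """Static DRC: levels above the threshold grow at 1/ratio of the input rate."""
    wave = np.asarray(wave, dtype=float)
    level_db = 20.0 * np.log10(np.abs(wave) + 1e-10)
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    return wave * 10.0 ** (gain_db / 20.0)


if __name__ == "__main__":
    mixed, merged = block_mix([0.5, -0.5, 0.25], [1, 0, 0], [0.5, 0.5], [0, 1, 0])
    print(mixed, merged)
    print(compress_dynamic_range([1.0, 0.01]))
```

With the default settings, a full-scale sample (0 dB) is 20 dB over the threshold and comes out at -15 dB, while a sample at -40 dB passes through unchanged.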

References (14)

[1] A. Mesaros, T. Heittola, and T. Virtanen, "TUT database for acoustic scene classification and sound event detection," 2016 24th EUSIPCO, Hungary, Budapest, pp.1128-1132, August 2016.
[2] E. Wold, T. Blum, D. Keislar, and J. Wheaton, "Content-based classification, search, and retrieval of audio," IEEE Multimedia, Vol.3, No.3, pp.27-36, 1996.
[3] L. Deng, et al., "Recent advances in deep learning for speech research at Microsoft," In ICASSP, Vol. 26, pp. 64, May 2013.
[4] Seongkyu Mun, et al., "Generative adversarial network based acoustic scene training set augmentation and selection using SVM hyper-plane," Proceedings of DCASE, pp.93-97, 2017.
[5] Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, "Language modeling with gated convolutional networks," arXiv preprint arXiv:1612.08083, 2016.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Identity mappings in deep residual networks," In European Conference on Computer Vision (ECCV), Springer, pp.630-645, 2016.
[7] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," arXiv preprint arXiv:1709.01507, 2017.
[8] Hyeongi Moon, Joon Byun, Bum-Jun Kim, Shin-hyuk Jeon, Youngho Jeong, Young-cheol Park, and Sung-wook Park, "End-to-end CRNN Architectures for Weakly Supervised Sound Event Detection," DCASE 2018 Challenge, Sep. 2018.
[9] Tara N. Sainath, Ron J. Weiss, Andrew Senior, Kevin W. Wilson, and Oriol Vinyals, "Learning the speech front-end with raw waveform CLDNNs," Proceedings of INTERSPEECH, Germany, Dresden, September 2015.
[10] Yong Xu, Qiuqiang Kong, Wenwu Wang, and Mark D. Plumbley, "Large-scale weakly supervised audio classification using gated convolutional neural network," Proceedings of ICASSP, Canada, Calgary, pp.121-125, April 2018.
[11] Justin Salamon and Juan Pablo Bello, "Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification," IEEE Signal Processing Letters, pp.279-283, 2017.
[12] J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, "Audio set: An ontology and human-labeled dataset for audio events," Proceedings of ICASSP, USA, New Orleans, pp.776-780, March 2017.
[13] Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen, "Metrics for polyphonic sound event detection," Applied Sciences, 6.6: 162, 2016.
[14] Romain Serizel, Nicolas Turpault, Hamid Eghbal-Zadeh, and Ankit Parag Shah, "Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments," arXiv preprint arXiv:1807.10501, 2018.
