[Paper] CNN based Sound Event Detection Method using NMF Preprocessing in Background Noise Environment

Jang, Bumsuk; Lee, Sang-Hyun

doi:10.7236/ijasc.2020.9.2.20


International Journal of Advanced Smart Convergence, vol. 9, no. 2, 2020, pp. 20-27

Jang, Bumsuk (BS SOFT Co., LTD.) , Lee, Sang-Hyun (Department of Computer Engineering, Honam University)

Abstract

Sound event detection in real-world environments suffers from interference by non-stationary, time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF), and proposes a deep learning model that integrates a Convolutional Neural Network (CNN) with NMF. To improve the separation quality of the NMF, the method includes a noise update technique that learns and adapts to the characteristics of the current noise in real time. The noise update technique analyzes the sparsity and activity of the noise bases at the present time and decides whether to perform update training based on the noise candidate group obtained in every frame during the previous noise reduction stage. Noise basis ranks selected as candidates for update training are updated in real time with discriminative NMF training. The NMF preprocessing was applied to both a CNN and a Hidden Markov Model (HMM) to improve sound event detection performance. Because the improvement was more pronounced for the CNN, the method can be widely used with CNN-based sound detection algorithms.
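The abstract does not give the exact rule for choosing which noise bases to retrain. As a minimal sketch, assuming the per-frame noise activations are collected into a matrix H_noise (bases x frames), one plausible rule flags a basis when its share of the total activation is small (low activity) or when its activation is concentrated in very few frames (high Hoyer sparsity). The function names, the Hoyer measure, and both thresholds are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def hoyer_sparsity(x, eps=1e-12):
    """Hoyer sparsity in [0, 1]: about 0 for a flat vector, 1 for a single spike."""
    n = x.size
    l1 = np.abs(x).sum()
    if l1 == 0:
        return 0.0  # treat an all-zero row as non-sparse; it is caught by low activity
    l2 = np.sqrt((x ** 2).sum()) + eps
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)


def select_update_candidates(H_noise, activity_thresh=0.05, sparsity_thresh=0.8):
    """Flag noise bases (rows of H_noise) for update training.

    Hypothetical rule: a basis is a candidate if its share of the total
    activation is below activity_thresh, or if its activation over time is
    sparser than sparsity_thresh (Hoyer measure).
    """
    total = H_noise.sum() + 1e-12
    activity = H_noise.sum(axis=1) / total
    sparsity = np.array([hoyer_sparsity(row) for row in H_noise])
    flagged = (activity < activity_thresh) | (sparsity > sparsity_thresh)
    return np.flatnonzero(flagged), activity, sparsity


# Example: 4 noise bases observed over 50 frames.
H_noise = np.ones((4, 50))
H_noise[2] = 0.001        # basis 2 is barely used
H_noise[3] = 0.0
H_noise[3, 10] = 10.0     # basis 3 fires in a single frame
candidates, activity, sparsity = select_update_candidates(H_noise)
print(candidates)         # [2 3]
```

Here basis 2 is flagged by its low activity and basis 3 by its high sparsity; under this assumed rule, those columns of the noise basis matrix would be the ones passed on to update training.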


Tables/Figures (5)

Figure 1. Procedure of the conventional NMF-based noise reduction method.
Figure 2. Procedure of the proposed NMF-based adaptive noise sensing and reduction method.
Figure 3. Architectures of Convolutional Neural Networks.
Figure 4. Framework of the proposed sound event detection method based on non-negative matrix factorization (NMF).
Figure 5. Spectrograms for noise reduction: the left (a) and right (b) columns show spectrograms of the original sound data and the noise-reduced sound data, respectively. Comparison of SED performance between the original and noise-reduced data using HMM (c) and CNN (d).

Main Text Summary (AI-extracted)

Hypothesis

Reference [22] proposed four CNNs with different numbers of layers and pooling operators and found that the nine-layer CNN with a max-pooling operator achieved the best performance [7]. In this paper, we investigate whether, with the inclusion of NMF, a shallower CNN can produce comparable or even better results.
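The summary does not specify the shallow architecture. To make "layers" and "pooling operators" concrete, the sketch below is a hypothetical forward pass of a single convolution, ReLU, and 2x2 max-pooling block over a Mel spectrogram, written in plain NumPy. The kernel count, kernel size, and the global-average readout are arbitrary assumptions; a trained model would normally be built with a deep learning framework.

```python
import numpy as np


def conv2d_valid(x, kernels):
    """x: (H, W) input; kernels: (C, kh, kw). Returns (C, H-kh+1, W-kw+1)
    using 'valid' cross-correlation, as most CNN libraries do."""
    c, kh, kw = kernels.shape
    h, w = x.shape
    out = np.empty((c, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def max_pool2d(x, size=2):
    """Non-overlapping max pooling over the last two axes (edges trimmed)."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))


def shallow_cnn_features(mel_spec, kernels):
    """One conv -> ReLU -> max-pool block, then global average pooling,
    giving one feature per kernel (a hypothetical shallow front end)."""
    feat = np.maximum(conv2d_valid(mel_spec, kernels), 0.0)  # (8, 38, 98) below
    feat = max_pool2d(feat, 2)                               # (8, 19, 49) below
    return feat.mean(axis=(1, 2))


rng = np.random.default_rng(0)
mel_spec = rng.random((40, 100))         # hypothetical 40 Mel bands x 100 frames
kernels = rng.standard_normal((8, 3, 3))
features = shallow_cnn_features(mel_spec, kernels)
print(features.shape)                    # (8,)
```

Stacking more such blocks gives a deeper network; each 2x2 max-pool halves both the time and frequency resolution, which is one reason the choices of depth and pooling operator interact.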

Proposed Method

Figure 2 shows the procedure of the proposed NMF-based adaptive noise sensing and reduction method. As shown in the figure, the procedure is divided into three different processing stages: a priori NMF basis modeling, NMF-based adaptive noise sensing, and noise reduction.
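For the a priori basis modeling stage, a standard way to learn a basis matrix from clean training spectrograms is NMF with the multiplicative updates of Lee and Seung. The sketch below uses the squared Euclidean cost; the summary does not say which cost function the authors used, so this choice, the rank of 8, and the synthetic input are assumptions.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorise non-negative V (freq x frames) as V ~ W @ H with the
    multiplicative updates for the squared Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis spectra
    return W, H


# Hypothetical clean-event magnitude spectrogram: 257 bins x 100 frames.
rng = np.random.default_rng(1)
V_event = rng.random((257, 100))
W_event, H_event = nmf(V_event, rank=8)
rel_err = np.linalg.norm(V_event - W_event @ H_event) / np.linalg.norm(V_event)
print(W_event.shape, H_event.shape)  # (257, 8) (8, 100)
```

The columns of W_event play the role of the pre-trained event bases; the multiplicative form keeps every entry non-negative without an explicit projection step.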
Because the proposed method estimates the noise from the current noisy input signal, and this estimate also guides the derivation of both the frequency and statistical noise biases, the system can readily adapt to different and time-varying noise conditions. Nevertheless, to ensure performance, the sound events in the training set and in the development/evaluation set should come from the same distribution.
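The noise estimation from the current input can be sketched as semi-supervised NMF: the event bases learned a priori are held fixed, while noise bases and all activations are fitted to the current noisy spectrogram. This is a minimal sketch under a Euclidean cost with synthetic data; the function name and the noise rank are hypothetical, and the frequency and statistical noise biases that the authors derive from this estimate are not modelled here.

```python
import numpy as np


def semi_supervised_nmf(V, W_event, noise_rank, n_iter=200, eps=1e-10, seed=0):
    """Fit V ~ [W_event, W_noise] @ H, keeping the pre-trained event bases
    fixed and learning noise bases from the current noisy input."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    k_event = W_event.shape[1]
    W_noise = rng.random((n_freq, noise_rank)) + eps
    H = rng.random((k_event + noise_rank, n_frames)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_event, W_noise])
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        H_noise = H[k_event:]
        # Only the noise bases are updated; the event bases stay fixed.
        W_noise *= (V @ H_noise.T) / (W @ H @ H_noise.T + eps)
    return W_noise, H[:k_event], H[k_event:]


rng = np.random.default_rng(2)
W_event = rng.random((64, 4))            # stands in for pre-trained event bases
W_noise_true = rng.random((64, 4))       # unknown noise spectra
V_noisy = W_event @ rng.random((4, 80)) + W_noise_true @ rng.random((4, 80))
W_noise, H_event, H_noise = semi_supervised_nmf(V_noisy, W_event, noise_rank=4)
approx = W_event @ H_event + W_noise @ H_noise
print(np.linalg.norm(V_noisy - approx) / np.linalg.norm(V_noisy))
```

Because the noise bases are re-estimated from each new input, this kind of model can follow changing noise conditions, which is the property the paragraph above relies on.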
The training data consisted of the noise and scream data of the DCASE 2017 dataset; DCASE is the most widely known international SED challenge. Using this training data, we implemented the adaptive NMF and evaluated its noise-cancelling performance and its scream-detection (SED) performance on field data collected under realistic environmental conditions. Figure 5 compares spectrograms of the collected field data before and after noise attenuation with the adaptive NMF.
However, owing to the success of deep learning in other domains [15-18], deep learning is now the norm in SED development and has been shown to perform slightly better than established methods [1]. In this paper, we propose an adaptive noise reduction method for sound event detection based on NMF, and we demonstrate the resulting improvement in SED performance by applying NMF to HMM- and CNN-based detectors.
The detection method has two phases: a training phase and a test phase.
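In the test phase, a common way to turn separated NMF components into a noise-reduced spectrogram is a Wiener-style soft mask: each time-frequency bin keeps the fraction of its magnitude that the event part of the model explains. The sketch assumes activations have already been estimated with the bases fixed; the mask rule is a standard choice assumed here, not one the summary specifies, and the toy matrices are hypothetical.

```python
import numpy as np


def soft_mask_denoise(V, W_event, H_event, W_noise, H_noise, eps=1e-10):
    """Keep, in each time-frequency bin, the fraction of V that the event part
    of the NMF model explains (a Wiener-style soft mask)."""
    S_event = W_event @ H_event
    S_noise = W_noise @ H_noise
    mask = S_event / (S_event + S_noise + eps)   # values in [0, 1)
    return mask * V, mask


# Toy example: frequency bin 0 is pure event, bin 1 is pure noise.
V = np.array([[2.0, 2.0], [3.0, 3.0]])
W_event, H_event = np.array([[1.0], [0.0]]), np.array([[2.0, 2.0]])
W_noise, H_noise = np.array([[0.0], [1.0]]), np.array([[3.0, 3.0]])
denoised, mask = soft_mask_denoise(V, W_event, H_event, W_noise, H_noise)
print(np.round(denoised, 6))  # event bin kept, noise bin suppressed
```

In a full system the mask could be applied to the complex STFT and inverted, or the masked magnitudes could be passed directly to the detector; the summary does not say which the authors do.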
In this paper, an adaptive noise reduction method based on supervised and adaptive NMF is proposed for sound event detection in non-stationary background noise environments. The proposed adaptive strategies are guided both by prior knowledge of the sound events and by the results of noise estimation, which give the original NMF model additional discriminating ability. In particular, the weight of each frequency band is quantified as a trade-off between its contribution to constructing the target event class and its contribution to constructing the noise.
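The summary does not give the formula for the band weights. As a hypothetical illustration of a trade-off between event and noise contributions, the sketch below weights each band by the share of its template energy attributed to the event, rescales the weights to mean 1, and uses them in a weighted Euclidean cost when estimating activations with fixed bases. Both the weight formula and the function names are assumptions, not the authors' definitions.

```python
import numpy as np


def band_weights(event_template, noise_template, eps=1e-10):
    """Hypothetical per-band weight: the share of band energy attributed to
    the event rather than to noise, rescaled to have mean 1."""
    w = event_template / (event_template + noise_template + eps)
    return w / (w.mean() + eps)


def weighted_activation_update(V, W, H, w, n_iter=100, eps=1e-10):
    """Multiplicative updates of H for the weighted cost
    sum_f w_f * ||V[f] - (W @ H)[f]||^2, with W held fixed."""
    Wd = W * w[:, None]          # rows of W scaled by the band weights
    for _ in range(n_iter):
        H = H * (Wd.T @ V) / (Wd.T @ W @ H + eps)
    return H


event_template = np.array([4.0, 3.0, 1.0, 0.1])
noise_template = np.array([0.1, 1.0, 3.0, 4.0])
w = band_weights(event_template, noise_template)  # larger for event-dominated bands

rng = np.random.default_rng(3)
W = rng.random((4, 2))
V = W @ rng.random((2, 6))
H = weighted_activation_update(V, W, np.ones((2, 6)), w)
print(H.shape)  # (2, 6)
```

With this weighting, errors in event-dominated bands cost more than errors in noise-dominated bands, so the fit relies more on the bands where the event is strong.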
This paper proposes a supervised and adaptive NMF framework for sound event detection, as shown in Fig 1. The input audio signals are first processed via the short-time Fourier transform (STFT), and magnitude spectrograms are used for audio signal representation.
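The STFT front end can be sketched in plain NumPy. The frame length of 512, hop of 256, periodic Hann window, and absence of padding are assumptions chosen for illustration; in practice a library routine such as scipy.signal.stft would usually be used.

```python
import numpy as np


def magnitude_spectrogram(x, n_fft=512, hop=256):
    """Magnitude STFT of a 1-D signal, shape (n_fft // 2 + 1, n_frames)."""
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


sr = 16000                                  # assumed sampling rate
t = np.arange(sr) / sr                      # 1 second
x = np.sin(2 * np.pi * 1000.0 * t)          # 1 kHz test tone
S = magnitude_spectrogram(x)
print(S.shape)                              # (257, 61)
print(np.argmax(S.mean(axis=1)))            # 32, i.e. 32 * 16000 / 512 = 1000 Hz
```

The resulting non-negative matrix (frequency bins x frames) is the kind of input the NMF stages factorize.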

Theory/Model

In this system, training inputs are Mel-frequency scaled, because Mel-scaled spectra provide a reasonably good representation of a signal's spectral properties.
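Mel scaling is usually done by multiplying a magnitude spectrogram by a bank of triangular filters spaced evenly on the Mel scale. The sketch assumes the HTK-style formula mel = 2595 log10(1 + f / 700), 40 filters, a 16 kHz sampling rate, and a 512-point FFT; the summary gives none of these, and libraries such as librosa default to the slightly different Slaney formulation.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters (n_mels x n_fft//2+1) evenly spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))  # peak 1 at center
    return fb


fb = mel_filterbank(sr=16000, n_fft=512, n_mels=40)   # assumed parameters
S = np.random.default_rng(0).random((257, 61))         # stand-in magnitude spectrogram
S_mel = fb @ S                                         # Mel-scaled input
print(fb.shape, S_mel.shape)  # (40, 257) (40, 61)
```

Because the filters are non-negative, the Mel-scaled spectrogram stays non-negative and remains a valid NMF input.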

Future Work

In the present algorithm, a single average spectral template is extracted to represent each sound event class when determining noise biases, which limits the algorithm's ability to handle the diversity of characteristics within a sound class. Future work will address adapting the proposed approach to use multiple templates, or templates that account for the temporal dynamics of sound events. Another improvement would be to support real-time processing with a sliding window, which would make this work more promising for practical use.
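The current limitation and the proposed extension can be illustrated together. The sketch computes an average spectral template for a class as the mean of L1-normalised frame spectra (the normalisation is an assumption), then processes a spectrogram in overlapping windows of frames, estimating activations against the fixed template in each window. The window length, hop, and all function names are hypothetical; this is one way a sliding-window variant might look, not the authors' design.

```python
import numpy as np


def average_template(V_class, eps=1e-10):
    """Average spectral template of one sound class: mean of the
    L1-normalised magnitude spectra of its frames, shape (freq,)."""
    frames = V_class / (V_class.sum(axis=0, keepdims=True) + eps)
    return frames.mean(axis=1)


def sliding_windows(V, win, hop):
    """Yield (start, block) pairs of `win` consecutive frames every `hop`
    frames, as a streaming processor would receive them."""
    for start in range(0, V.shape[1] - win + 1, hop):
        yield start, V[:, start:start + win]


def activations_fixed_bases(V, W, n_iter=50, eps=1e-10):
    """Estimate H >= 0 with W fixed (Euclidean multiplicative updates)."""
    H = np.ones((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H


rng = np.random.default_rng(4)
V_stream = rng.random((64, 100))               # hypothetical incoming spectrogram
template = average_template(V_stream[:, :20])  # template from 20 "event" frames
W = template[:, None]                          # one basis: the class template
n_windows = 0
for start, block in sliding_windows(V_stream, win=10, hop=5):
    H = activations_fixed_bases(block, W)      # per-window activations
    n_windows += 1
print(n_windows)  # 19
```

Replacing the single template column in W with several columns per class would be the natural place to plug in the multiple-template idea.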

References (20)

E. Cakir, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen, "Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 25, no. 6, pp. 1291-1303, Jun. 2017. DOI: 10.1109/TASLP.2017.2690575
T. Hayashi, S. Watanabe, T. Toda, T. Hori, J. L. Roux, and K. Takeda, "Duration-Controlled LSTM for Polyphonic Sound Event Detection," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 25, no. 11, pp. 2059-2070, Nov. 2017. DOI: 10.1109/TASLP.2017.2740002
M. Crocco, M. Cristani, A. Trucco, and V. Murino, "Audio surveillance: A systematic review," ACM Comput. Surv., vol. 48, art. 52, 2016. DOI: 10.1145/2871183
R. V. Sharan and T. J. Moir, "An overview of applications and advancements in automatic sound recognition," Neurocomputing, vol. 200, pp. 22-34, 2016. DOI: 10.1016/j.neucom.2016.03.020
E. Cakir, G. Parascandolo, T. Heittola, H. Huttunen, and T. Virtanen, "Convolutional recurrent neural networks for polyphonic sound event detection," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 25, pp. 1291-1303, 2017. DOI: 10.1109/TASLP.2017.2690575
B. McFee, J. Salamon, and J. P. Bello, "Adaptive Pooling Operators for Weakly Labeled Sound Event Detection," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 26, no. 11, pp. 2180-2193, Apr. 2018.
S. Adavanne, P. Pertila, and T. Virtanen, "Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network," Detection and Classification of Acoustic Scenes and Events 2017, Munich, Germany, Nov. 2017, pp. 1-5. DOI: 10.1109/ICASSP.2017.7952260
J. Lu, "Mean Teacher Convolution System for DCASE 2018 Task 4," Detection and Classification of Acoustic Scenes and Events 2018, Shanghai, China, Jul. 2018, pp. 1-5.
D. Su, X. Wu, and L. Xu, "GMM-HMM acoustic model training by a two level procedure with Gaussian components determined by automatic model selection," 2010 IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), Dallas, TX, USA, Mar. 2010, pp. 4890-4893. DOI: 10.1109/ICASSP.2010.5495122
A. Mesaros, T. Heittola, A. Eronen, and T. Virtanen, "Acoustic Event Detection in Real Life," 18th European Signal Process. Conf., Aalborg, Denmark, Aug. 2010, pp. 1267-1271.
V. Bisot, S. Essid, and G. Richard, "Overlapping Sound Event Detection with Supervised Nonnegative Matrix Factorization," 2017 IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), New Orleans, LA, USA, Mar. 2017, pp. 31-35. DOI: 10.1109/ICASSP.2017.7951792
T. Komatsu, Y. Senda, and R. Kondo, "Acoustic Event Detection Based on Non-Negative Matrix Factorization With Mixtures of Local Dictionaries and Activation Aggregation," 2016 IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), Shanghai, China, Mar. 2016, pp. 2259-2263. DOI: 10.1109/ICASSP.2016.7472079
Z. Md. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani, "State-of-the-Art Deep Learning: Evolving Machine Intelligence Toward Tomorrow's Intelligent Network Traffic Control Systems," IEEE Commun. Surveys Tutorials, vol. 19, no. 4, pp. 2432-2455, 2017. DOI: 10.1109/COMST.2017.2707140
Z. Liu, Z. Jia, C. Vong, S. Bu, J. Han, and X. Tang, "Capturing High-Discriminative Fault Features for Electronics-Rich Analog System via Deep Learning," IEEE Trans. Indust. Inform., vol. 13, no. 3, pp. 1213-1226, Jun. 2017. DOI: 10.1109/TII.2017.2690940
M. He and D. He, "Deep Learning Based Approach for Bearing Fault Diagnosis," IEEE Trans. Indust. Applications, vol. 53, no. 3, pp. 3057-3065, Jun. 2017. DOI: 10.1109/TIA.2017.2661250
T. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, "PCANet: A Simple Deep Learning Baseline for Image Classification?," IEEE Trans. Image Process., vol. 24, no. 12, pp. 5017-5032, Dec. 2015. DOI: 10.1109/TIP.2015.2475625
A. Mesaros, T. Heittola, E. Benetos, P. Foster, M. Lagrange, T. Virtanen, and M. D. Plumbley, "Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge," IEEE/ACM Trans. Audio, Speech, and Language Process., vol. 26, no. 2, pp. 379-393, Feb. 2018. DOI: 10.1109/TASLP.2017.2778423
Q. Kong, Y. Cao, T. Iqbal, Y. Xu, W. Wang, and M. D. Plumbley, "Cross-task learning for audio-tagging, sound event detection and spatial localization: DCASE 2019 baseline systems," arXiv:1904.03476, pp. 1-5.
D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, Oct. 1999.
Y. Xie, Z. Liu, Z. Yao, and B. Dai, "Improved two-stage Wiener filter for robust speaker identification," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 310-313, Hong Kong, August 2006. DOI: 10.1109/ICPR.2006.696
