[Paper] Online blind source separation and dereverberation of speech based on a joint diagonalizability constraint

유호건; 김도희; 송민환; 박형민

doi:10.7776/ask.2021.40.5.503


The Journal of the Acoustical Society of Korea (한국음향학회지), v.40 no.5, 2021, pp. 503-514

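유호건 (Department of Electronic Engineering, Sogang University), 김도희 (Department of Electronic Engineering, Sogang University), 송민환 (Autonomous Intelligence IoT Research Center, Korea Electronics Technology Institute), 박형민 (Department of Electronic Engineering, Sogang University)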

Abstract (Korean, translated)

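Reverberation in signals tends to severely degrade the performance of blind source separation (BSS) systems, and the effect is especially pronounced in online systems. Recent studies have addressed this problem using joint matrix diagonalization. Building on that work, this paper improves the quality of the separated sources by adding a dereverberation stage to an online algorithm for underdetermined multi-speaker source separation in reverberant environments. The method was compared with an existing online algorithm in experiments on the WSJCAM0 database. Performance was evaluated with the signal-to-distortion ratio (SDR) and Perceptual Evaluation of Speech Quality (PESQ): relative to the existing algorithm, the average SDR improved from 1.23 dB to 3.76 dB and the average PESQ from 1.15 to 2.12.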
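The approach rests on the joint diagonalizability constraint: within one frequency bin, the spatial covariance matrices of all sources are assumed to be diagonalized simultaneously by a single matrix. The NumPy sketch below only illustrates that property in the simplest non-reverberant case, assuming a synthetic invertible mixing matrix `a` so that the common diagonalizer is simply its inverse. The matrix `a`, the diagonal source covariances, and the helper `offdiag_ratio` are hypothetical; the paper's diagonalizer $\hat{P}_f$ (Fig. 1) also covers dereverberation and is not reproduced here.

```python
import numpy as np


def offdiag_ratio(m: np.ndarray) -> float:
    """Fraction of the squared Frobenius norm lying off the main diagonal."""
    off = m - np.diag(np.diag(m))
    return float(np.linalg.norm(off) ** 2 / np.linalg.norm(m) ** 2)


rng = np.random.default_rng(0)
n_ch = 3  # hypothetical: 3 microphones and 3 sources in one frequency bin

# Hypothetical complex mixing matrix A (invertible with probability 1).
a = rng.standard_normal((n_ch, n_ch)) + 1j * rng.standard_normal((n_ch, n_ch))

# Spatial covariances R_k = A D_k A^H with nonnegative diagonal D_k:
# by construction they share the common diagonalizer P = A^{-1}.
covs = [a @ np.diag(rng.uniform(0.1, 1.0, n_ch)) @ a.conj().T for _ in range(n_ch)]

p = np.linalg.inv(a)  # joint diagonalizer for this bin
ratios = [(offdiag_ratio(r), offdiag_ratio(p @ r @ p.conj().T)) for r in covs]
for before, after in ratios:
    print(f"off-diagonal energy: before {before:.3f}, after {after:.1e}")
```

Before the transform each covariance has substantial off-diagonal energy; after applying the same `p` to every source, only round-off remains. Blind methods cannot use `inv(a)` directly, so they estimate such a diagonalizer from the mixture instead.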

Abstract (English)

Reverberation in speech signals tends to significantly degrade the performance of blind source separation (BSS) systems, and the degradation is especially severe in online systems. Methods based on joint diagonalizability constraints have recently been developed to tackle this problem. To improve the quality of separated speech in reverberant environments, this paper adds the proposed dereverberation method to an online BSS algorithm based on these constraints. In experiments on the WSJCAM0 corpus, the proposed method was compared with the existing online BSS algorithm. Evaluation with the signal-to-distortion ratio (SDR) and the Perceptual Evaluation of Speech Quality (PESQ) showed that, on average, SDR improved from 1.23 dB to 3.76 dB and PESQ from 1.15 to 2.12.
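The SDR figures above follow the BSS Eval definition of Vincent et al. (2006): the estimate is split into a target component, obtained by projecting it onto time-shifted copies of the true source (an allowed FIR distortion), and an error term containing everything else. The sketch below is a simplified single-source re-implementation for illustration only; the function name `bss_eval_sdr` and the default `n_taps=32` are assumptions (the reference toolbox allows a much longer filter), and it is not the evaluation code used in the paper.

```python
import numpy as np


def bss_eval_sdr(reference: np.ndarray, estimate: np.ndarray, n_taps: int = 32) -> float:
    """Simplified single-source BSS Eval SDR in dB.

    The target is the least-squares projection of `estimate` onto the span of
    `n_taps` delayed copies of `reference`; the remainder counts as distortion.
    """
    assert len(reference) == len(estimate), "signals must have equal length"
    n = len(estimate)
    # Column d holds the reference delayed by d samples (zero-padded at the start).
    basis = np.zeros((n, n_taps))
    for d in range(n_taps):
        basis[d:, d] = reference[: n - d]
    coeffs, *_ = np.linalg.lstsq(basis, estimate, rcond=None)
    target = basis @ coeffs
    error = estimate - target
    return float(10 * np.log10(np.sum(target**2) / np.sum(error**2)))


rng = np.random.default_rng(0)
s = rng.standard_normal(16000)  # stand-in for one second of a clean source
# Hypothetical "separated" output: short-filtered source plus residual noise.
est = np.convolve(s, [0.8, 0.3, -0.1])[: len(s)] + 0.1 * rng.standard_normal(len(s))
sdr_noisy = bss_eval_sdr(s, est)
sdr_clean = bss_eval_sdr(s, 0.5 * s)  # pure gain change: essentially no distortion
print(f"SDR with residual noise: {sdr_noisy:.1f} dB")
```

Because the short filter lies inside the allowed distortion subspace, only the additive noise lowers the score. For reference, the average gain reported in the abstract is 3.76 - 1.23 = 2.53 dB.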


Tables/Figures (7)

Table 1. Glossary and definitions of variables.
Fig. 1. Structure of the diagonalizer matrix $\hat{P}_f$ and the observed mixture $\hat{x}_{f,t}$.
Fig. 2. (Color available online) Recording conditions for the impulse responses simulated with the image method.
Table 2. Source separation performance in terms of SDR and PESQ as a function of reverberation time.
Fig. 3. Online source separation performance as a function of late reverberation and early reflections.
Fig. 4. Online source separation performance over time.
Fig. 5. (Color available online) Spectrograms of (a) a reverberant mixture, (b) a clean signal, and the separated signals obtained by (c) online IVA and (d) the proposed method.
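Fig. 5 compares against online IVA. As a rough picture of how such a frame-by-frame baseline works, the sketch below implements a generic online auxiliary-function IVA update in the style of Taniguchi et al. (2014): a forgetting-factor average of weighted covariances followed by the iterative-projection rule $w_{k,f} = (W_f V_{k,f})^{-1} e_k$. It assumes a spherical Laplacian source model, equal numbers of microphones and sources, and hypothetical names and values (`online_auxiva_step`, `alpha=0.96`, the identity regularizer). It is neither the authors' baseline configuration nor their proposed dereverberation method.

```python
import numpy as np


def online_auxiva_step(x, w, v, alpha=0.96, eps=1e-8):
    """Process one STFT frame with online AuxIVA (Laplacian source model).

    x: (F, M) mixture frame; w: (F, M, M) demixing matrices whose k-th row is
    w_{k,f}^H; v: (M, F, M, M) recursively averaged auxiliary covariances.
    w and v are updated in place; returns the separated frame y: (F, M).
    """
    n_freq, n_ch = x.shape
    y = np.einsum("fkm,fm->fk", w, x)  # y_{k,f} = w_{k,f}^H x_f
    r = np.sqrt(np.sum(np.abs(y) ** 2, axis=0)) + eps  # per-source norm over f
    xxh = np.einsum("fm,fn->fmn", x, x.conj())  # x_f x_f^H for every bin
    for k in range(n_ch):
        # Forgetting-factor update with the Laplacian weight G'(r)/r = 1/r.
        v[k] = alpha * v[k] + (1.0 - alpha) * xxh / r[k]
        e_k = np.eye(n_ch)[:, k]
        for f in range(n_freq):
            # w_k = (W_f V_k)^{-1} e_k, then scale so that w_k^H V_k w_k = 1.
            wk = np.linalg.solve(w[f] @ v[k, f], e_k)
            wk = wk / np.sqrt(np.real(wk.conj() @ v[k, f] @ wk))
            w[f, k] = wk.conj()  # store the row w_k^H
    return np.einsum("fkm,fm->fk", w, x)


# Demo on random data (hypothetical sizes; not a meaningful separation test).
rng = np.random.default_rng(0)
n_freq, n_ch = 4, 2
w = np.tile(np.eye(n_ch, dtype=complex), (n_freq, 1, 1))
# Small identity regularizer keeps W_f V_k invertible from the first frame.
v = np.tile(1e-2 * np.eye(n_ch, dtype=complex), (n_ch, n_freq, 1, 1))
for _ in range(50):
    x = rng.standard_normal((n_freq, n_ch)) + 1j * rng.standard_normal((n_freq, n_ch))
    y = online_auxiva_step(x, w, v)
print(y.shape)
```

The forgetting factor `alpha` trades tracking speed against estimation variance. In a reverberant room each $V_{k,f}$ must summarize a long, slowly decaying impulse response, which is one reason online separation suffers more from reverberation than batch processing.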

