[논문]Music/Voice Separation Based on Kernel Back-Fitting Using Weighted β-Order MMSE Estimation

Kim, Hyoung-Gook; Kim, Jin Young

doi:10.4218/etrij.16.0115.0256

Music/Voice Separation Based on Kernel Back-Fitting Using Weighted β-Order MMSE Estimation 원문보기

ETRI journal, v.38 no.3, 2016년, pp.510 - 517

Kim, Hyoung-Gook (Department of Electronics Convergence Engineering, Kwangwoon University) , Kim, Jin Young (Department of Electronics and Computer Engineering, Chonnam National University)

Abstract ▼ AI-Helper

Recent developments in the field of separation of mixed signals into music/voice components have attracted the attention of many researchers. Recently, iterative kernel back-fitting, also known as kernel additive modeling, was proposed to achieve good results for music/voice separation. To obtain minimum mean square error (MMSE) estimates of short-time Fourier transforms of sources, generalized spatial Wiener filtering (GW) is typically used. In this paper, we propose an advanced music/voice separation method that utilizes a generalized weighted ${\beta}$-order MMSE estimation (WbE) based on iterative kernel back-fitting (KBF). In the proposed method, WbE is used for the step of mixed music signal separation, while KBF permits kernel spectrogram model fitting at each iteration. Experimental results show that the proposed method achieves better separation performance than GW and existing Bayesian estimators.

주제어

AI 본문요약
AI-Helper

* AI 자동 식별 결과로 적합하지 않은 문장이 있을 수 있으니, 이용에 유의하시기 바랍니다.

제안 방법

From the complex spectrogram X of the input music signal, each complex spectrogram, S_V, S_H, and S_P, for the vocal, harmonic, and percussive components is estimated by each generalized WbE, G_V, G_H, and G_P, of decomposed spectral amplitude by singular value decomposition (SVD) for the vocal, harmonic, and percussive components, respectively. The WbE estimation gain, G_j, for each source j (= 0, 1, 2, … , J) is explained in detail in Algorithm 2.
In this paper, a generalized weighted β-order MMSE estimation (WbE) method based on kernel back-fitting (KBF) was proposed and evaluated for the separation of mixed signals into music/voice components.
In this paper, an advanced music/voice separation method is proposed, in which WbE and KBF are combined for improvement of the separation performance.
The proposed estimation method takes full advantage of both a generalized weighted β-order spectral amplitude estimator and an SVD-based subspace decomposition.

대상 데이터

For the first experiment, 150 full-length song tracks [23] were used (50 songs from the ccMixter database containing many different musical genres, 50 songs from a self-recording studio music database, and 50 songs from the MIR-1 K database), where all singing voices and music accompaniments were recorded separately. All of the song data were stored in PCM format with mono, 16-bit depth, and 44.
For the second performance comparison, the proposed algorithm, SVD-WbE-KAM, was compared with REPETSIM [26], RPCA [27], and SVD-GW-KAM. To evaluate the separation of background music and singing voice, 40 fulllength song tracks [24] were used (20 songs from the ccMixter database containing many different musical genres, and 20 songs from the MIR-1 K database). Figures 1 and 2 show boxplots of the SDR for the vocals and the music accompaniment, respectively.

성능/효과

The MMSE STSA with generalized gamma distribution (SVD-GA-KAM) outperforms the other six methods in terms of NSDR, and is slightly lower than the proposed method in terms of NSIR, since speech is well modeled by gamma distribution. However, the proposed method yields better performance than GW, LSA, bSA, and WE.
The proposed method has the following four advantages: (1) in the separation step, generalized WbE of the factorized spectral amplitude is used instead of GW for the KBF procedure to achieve better separation performances; (2) the perceptually weighted order αj and the SVD-based factorized spectral amplitude βj are adaptively calculated for effective WbE estimation performance; (3) in the back-fitting step, an SVD-based factorization procedure is applied to the power spectrogram filtered by median filter to achieve efficient compression before processing of the next source; and (4) using a back-fitting threshold, the KBF process can automatically be iteratively performed until convergence.

참고문헌 (27)

Z. Rafii and B. Pardo, "REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation," IEEE Trans. Audio, Speech, Language Process., vol. 21, no. 1, Jan. 2013, pp. 73-84.

상세보기
N.C. Maddage, C. Xu, and Y. Wang, "Singer Identification Based on Vocal and Instrumental Models," Proc. Int. Conf. Pattern Recogn., Cambridge, UK, Aug. 23-26, 2004, pp. 375-378.
M. Ryynanen and A. Klapuri, "Transcription of the Singing Melody in Polyphonic Music," Int. Conf. Music Inf. Retrieval, Victoria, Canada, Oct. 8-12, 2006, pp. 222-227.
S. Marchand et al., "DReaM: A Novel System for Joint Source Separation and Multi-track Coding," 133rd AES Conv., San Francisco, CA, USA, Oct. 26-29, 2012.
J. Nikunen, T. Virtanen, and M. Vilermo, "Multichannel Audio Upmixing Based on Non-negative Tensor Factorization Representation," IEEE Workshop Appl. Signal Process. Audio Acoust., New Paltz, NY, USA, Oct. 16-19, 2011, pp. 33-36.
U. Simsekli, Y.K. Yilmaz, and A.T. Cemgil, "Score Guided Audio Restoration via Generalized Coupled Tensor Factorisation," IEEE Int. Conf. Acoust., Speech Signal Process., Kyoto, Japan, Mar. 25-30, 2012, pp. 5369-5372.
J.L. Durrieu, B. David, and G. Richard, "A Musically Motivated Mid-level Representation for Pitch Estimation and Musical Audio Source Separation," IEEE J. Sel. Topics Signal Process., vol. 5, no. 6, Oct. 2011, pp. 1180-1191.

상세보기
C.L. Hsu and J.S.R. Jang, "On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 2, Feb. 2010, pp. 310-319.

상세보기
T. Virtanen, A. Mesaros, and M. Ryynanen, "Combining Pitch-Based Inference and Non-negative Spectrogram Factorization in Separating Vocals from Polyphonic Music," ISCA Tutorial Res. Workshop Statistical Perceptual Audition, Brisbane, Australia, Sept. 21, 2008, pp. 17-22.
A. Liutkus et al., "Kernel Additive Models for Source Separation," IEEE Trans. Signal Process., vol. 62, no. 16, Aug. 2014, pp. 4298-4310.

상세보기
D. Fitzgerald, "Harmonic/Percussive Separation Using Median Filtering," Int. Conf. Digital Audio Effects, Graz, Austria, Sept. 6-10, 2010, pp. 1-4.
Z. Rafii and B. Pardo, "A Simple Music/Voice Separation Method Based on the Extraction of the Repeating Musical Structure," IEEE Int. Conf. Acoust., Speech Signal Process., Prague, Czech Republic, May 22-27, 2011, pp. 221-224.
A. Liutkus et al., "Adaptive Filtering for Music/Voice Separation Exploiting the Repeating Musical Structure," IEEE Int. Conf. Acoust., Speech Signal Process., Kyoto, Japan, Mar. 25-30, 2012, pp. 53-56.
Z. Rafii and B. Pardo, "Music/Voice Separation Using the Similarity Matrix," Int. Conf. Music Inf. Retrieval, Porto, Portugal, Oct. 8-12, 2012, pp. 583-588.
O. Yilmaz and S. Rickard, "Blind Separation of Speech Mixtures via Time-Frequency Masking," IEEE Trans. Signal Process., vol. 52, no. 7, July 2004, pp. 1830-1847.

상세보기
Y. Ephraim and D. Malah, "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, Dec. 1984, pp. 1109-1121.

상세보기
E. Plourde and B. Champagne, "Auditory-Based Spectral Amplitude Estimators for Speech Enhancement," IEEE Trans. Audio, Speech, Language Process., vol. 16, no. 8, Nov. 2008, pp. 1614-1623.

상세보기
C.H. You, S.N. Koh, and S. Rahardja, " ${\beta}$ -Order MMSE Spectral Amplitude Estimation for Speech Enhancement," IEEE Trans. Speech, Audio Process., vol. 13, no. 4, July. 2005, pp. 475-486.

상세보기
F. Deng, F. Bao, and C.-C. Bao, "Speech Enhancement Using Generalized ${\beta}$ -Order Spectral Amplitude Estimator," Speech Commun., vol. 59, Apr. 2014, pp. 55-68.

상세보기
C.H. You, S.N. Koh, and S. Rahardja, "Masking-Based ${\beta}$ -Order MMSE Speech Enhancement," Speech Commun., vol. 48, no. 1, Jan. 2006, pp. 57-70.

상세보기
C.H. You, S.N. Koh, and S. Rahardja, "Improved Adaptive ${\beta}$ - Order MMSE Speech Enhancement," APSIPA Ann. Summit Conf., Sapporo, Japan, Oct. 4-7, 2009, pp. 797-800.
D.D. Greenwood, "A Cochlear Frequency-Position Function for Several Species-29 Years Later," J. Acoust. Soc. America, vol. 87, no. 6, July 1990, pp. 2592-2605.

상세보기
Multimedia Technology Laboratory homepage, Accessed Nov. 20, 2015. http://imsp.kw.ac.kr/Research.html
E. Vincent, R. Gribonval, and C. Fevotte, "Performance Measurement in Blind Audio Source Separation," IEEE Trans. Audio, Speech, Language Process., vol. 14, no. 4, July 2006, pp. 1462-1469.

상세보기
R.C. Hendriks et al., "Minimum Mean-Square Error Amplitude Estimators for Speech Enhancement under the Generalized Gamma Distribution," Int. Workshop Acoust. Echo Noise Contr., Paris, France, Sept. 12-14, 2006, pp. 1-4.
Z. Rafii, A. Liutkus, and B. Pardo, "REPET for Background/Foreground Separation in Audio," in Blind Source Separation: Advances in Theory, Algorithms and Appl., Berlin, Germany: Springer, 2014, pp. 395-411.
P.S. Huang et al., "Singing-Voice Separation from Monaural Recordings Using Robust Principal Component Analysis," IEEE Int. Conf. Acoust., Speech Signal Process., Kyoto, Japan, Mar. 25-30, 2012, pp. 57-60.

저자의 다른 논문 :

LOADING...

내보내기 구분	파일저장 인쇄 메일전송
구성항목	기본정보 상세정보 관리번호, 논문명, 저널/프로시딩명, 저자 , 발행년, 권, 호, 시작페이지, 끝페이지, 발행기관 관리번호, 논문명, 대등논문명, 저자 , 저널/프로시딩명, 발행기관, 발행년, 발행언어, 권, 호, 시작페이지, 끝페이지, ISBN, ISSN, 주제분야, 키워드, 초록(한글), 초록(영문), 저자(소속기관)
저장형식	Text(ASCII format) Excel format RefWorks Direct Export RIS format (for Reference Manager, ProCite, EndNote), Scholar's Aids, Mendeley
메일정보	받는사람 (필수) @ 보내는사람 (선택) @ 제목 내용 KISTI 검색결과 이메일 서비스
안내	총 건의 자료가 검색되었습니다. 다운받으실 자료의 인덱스를 입력하세요. (1-10,000) 검색결과의 순서대로 최대 10,000건 까지 다운로드가 가능합니다. 데이타가 많을 경우 속도가 느려질 수 있습니다.(최대 2~3분 소요) 다운로드 파일은 UTF-8 형태로 저장됩니다. 파일의 내용이 제대로 보이지 않을실 때는 웹브라우저 상단의 보기 -> 인코딩 -> 자동선택 여부를 확인하십시오. ~ Text(ASCII format) Excel format

연합인증