The increasing number of music files on smartphones and MP3 players makes it difficult to find the files that people want. Query-by-Singing/Humming (QbSH) systems have therefore been developed to retrieve music from a user's humming or singing without requiring detailed information such as the title or singer of a song. Most previous research on QbSH has been conducted using musical instrument digital interface (MIDI) files as reference songs. However, producing MIDI files is a time-consuming process, and more and more music files are published as the music market develops. Consequently, using the more common MPEG-1 audio layer 3 (MP3) files as reference songs is considered an alternative. However, there is little previous research on QbSH with MP3 files, because an MP3 file has a different waveform from the humming/singing query due to background music and multiple (polyphonic) melodies. To overcome these problems, we propose a new QbSH method using MP3 files on a mobile device. This research is novel in four ways. First, it is the first research on QbSH using MP3 files as reference songs. Second, the start and end positions of the MP3 file to be matched are estimated by a multi-layered perceptron (MLP) prior to matching with the humming/singing query file. Third, for more accurate results, four MLPs are used, which produce the start and end positions for the dynamic time warping (DTW) matching algorithm and for the chroma-based DTW algorithm, respectively. Fourth, the two matching scores from the DTW and chroma-based DTW algorithms are combined using the PRODUCT rule, through which a higher matching accuracy is obtained. Experimental results with the AFA MP3 database show that the accuracy of the proposed method (Top 1 accuracy of 98%, with an MRR of 0.989) is much higher than that of other methods. We also showed the effectiveness of the proposed system on a consumer mobile device.
Proposed Method
After this normalization step, each processed feature is fed to four separate neural networks, which are trained to estimate four positions in the MP3 file, namely the start and end positions for matching by the DTW and chroma-based DTW algorithms. Based on these four positions, the two DTW algorithms calculate the distances between the pitch data of the humming/singing file and those of the reference files.
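The windowed matching described above can be sketched as follows. This is a minimal textbook DTW between two pitch sequences, restricted to the segment bounded by the MLP-estimated start/end positions; the function names, toy pitch values, and window indices are illustrative, not taken from the paper.

```python
# Minimal DTW distance between a query pitch sequence and the segment of a
# reference pitch sequence bounded by (hypothetical) MLP-estimated positions.

def dtw_distance(query, reference):
    n, m = len(query), len(reference)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Match only the estimated window instead of the whole reference song.
query = [60, 62, 64, 62]
reference = [55, 57, 60, 62, 64, 62, 59]
start, end = 2, 6  # hypothetical MLP-estimated window
print(dtw_distance(query, reference[start:end]))  # 0.0: window matches exactly
```

Restricting the match to the estimated window is what avoids scanning the whole reference file, which is the speed/accuracy advantage the text attributes to the MLP-based position estimation.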
The proposed method achieved an MRR of 0.989 and a Top 1 accuracy of 98%, higher than other methods. By matching only at the start and end positions estimated by the MLPs, the matching accuracy of the proposed method was greatly improved compared to conventional DTW and chroma-based DTW matching, which perform matching over the whole reference file. Although the MRR of the proposed method based on the PRODUCT rule is the same as that based on the SUM rule, the Top 1 accuracy with the PRODUCT rule is slightly higher.
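For readers unfamiliar with the two metrics quoted above, the following sketch shows how mean reciprocal rank (MRR) and Top 1 accuracy are computed from ranked retrieval results. The ranked lists and song IDs here are toy data, not the paper's results.

```python
# MRR and Top-1 accuracy over a set of ranked retrieval results.

def mean_reciprocal_rank(ranked_lists, ground_truth):
    """ranked_lists[i] is the ordered list of song IDs returned for
    query i; ground_truth[i] is the correct song ID."""
    total = 0.0
    for ranking, truth in zip(ranked_lists, ground_truth):
        rank = ranking.index(truth) + 1  # 1-based position of the correct song
        total += 1.0 / rank
    return total / len(ranked_lists)

def top1_accuracy(ranked_lists, ground_truth):
    hits = sum(1 for ranking, truth in zip(ranked_lists, ground_truth)
               if ranking[0] == truth)
    return hits / len(ranked_lists)

# Three toy queries: correct song ranked 1st, 1st, and 2nd.
rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "A", "B"]]
truths = ["A", "B", "A"]
print(mean_reciprocal_rank(rankings, truths))  # (1 + 1 + 0.5) / 3
print(top1_accuracy(rankings, truths))         # 2 / 3
```

An MRR of 0.989 with Top 1 accuracy of 98% thus means the correct song is almost always ranked first, and very near the top otherwise.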
Previous work used score-level fusion of two classifiers, a quantized binary (QB) code-based LS algorithm and a pitch-based DTW algorithm [4]. An enhanced version of the QbSH system was then developed by combining five classifiers: pitch-based linear scaling (LS), pitch-based DTW, QB code-based LS, local maximum and minimum point-based LS, and pitch distribution feature-based LS [5]. Phiwma et al. proposed a query-by-humming method based on a distance space [14].
For example, the pitch values 13 and 14 are regarded as the same as 1 and 2, respectively, although they can actually differ from 1 and 2. To solve this problem, we also used conventional DTW and combined the two scores from the conventional DTW and chroma-based DTW to enhance the matching accuracy.
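The octave ambiguity described above comes from the chroma folding itself: pitch values an octave apart collapse onto the same chroma class. A minimal sketch, assuming a 1-based 12-class wrap (the indexing convention is illustrative):

```python
# Chroma folding: pitches an octave apart map to the same chroma class,
# so 13 and 14 become indistinguishable from 1 and 2.

def to_chroma(pitch):
    return (pitch - 1) % 12 + 1  # 1..12 -> 1..12, 13 -> 1, 14 -> 2, ...

print(to_chroma(13))  # 1
print(to_chroma(14))  # 2
print(to_chroma(2))   # 2 -- same chroma as 14, though an octave apart
```

Because distinct pitches can share one chroma class, the chroma-based DTW score alone can rank a wrong song highly, which is why its score is combined with that of conventional (unfolded) DTW.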
An MLP is an artificial neural network that can map sets of input data to a set of desired output values after being trained with a supervised learning technique known as the back-propagation algorithm [18]. In this method, the errors of the feed-forward network are calculated and propagated back from the output nodes to the inputs in order to appropriately adjust the weights in each layer of the network. Several kernel functions are popularly used; in this research, the performance with the hyperbolic tangent function of (3) is compared to that with the log-sigmoid function (see Table 1).
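The two kernel functions compared in the text can be illustrated with a single feed-forward pass through one hidden layer. The weights and inputs below are made up for illustration; this is not the paper's trained network, only a sketch of how the choice of kernel changes the output.

```python
# Forward pass of a tiny MLP with either hyperbolic tangent or
# log-sigmoid as the hidden-layer kernel function.
import math

def tanh(x):
    return math.tanh(x)

def log_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(inputs, w_hidden, w_out, act):
    # Hidden activations: kernel applied to each weighted sum of the inputs.
    hidden = [act(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    # Linear output node combining the hidden activations.
    return sum(w * h for w, h in zip(w_out, hidden))

inputs = [0.5, -0.2]                  # e.g. normalized pitch features
w_hidden = [[0.1, 0.4], [-0.3, 0.2]]  # hypothetical hidden-layer weights
w_out = [0.7, -0.5]                   # hypothetical output weights
print(mlp_forward(inputs, w_hidden, w_out, tanh))
print(mlp_forward(inputs, w_hidden, w_out, log_sigmoid))
```

Back-propagation then adjusts `w_hidden` and `w_out` to reduce the error between this output and the desired start/end position.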
In this study, we proposed a new method of implementing a QbSH system using MLP, DTW, and chroma-based DTW algorithms. The pitch data of the humming/singing used for matching were obtained by STA-based pitch extractors and normalization methods.
With these four MLPs, we evaluated the overall performance of the QbSH system in the testing phase. In this phase, the performance was measured with a total of 450 MP3 files, including 350 additional files that were not hummed or sung (not used for training), in order to enhance the confidence level of the experiment.
Dataset
As shown in Table 1, the model's performance was compared according to the number of hidden nodes and output nodes of the MLP. Experimental results gave the best training performance when four MLPs were used and each MLP had one output, as shown in Fig. 1. For the training of an MLP, the desired output must be available; here the desired outputs are the start or end positions in the reference song for DTW and chroma-based DTW, as shown in Fig. 3.
This database consists of 1,000 input query files corresponding to 100 reference songs stored as MP3 files [20]. The 1,000 queries, comprising 298 humming and 702 singing recordings, were obtained from 32 volunteers (18 men and 14 women) [20]. The volunteers were asked to select the part of the song that they wanted to sing or hum, and their query data were recorded with durations of about 12 seconds.
Data Processing
In the training phase, we performed various experiments on different MLP network models, varying the number of MLPs, the numbers of input, hidden, and output nodes, and the kernel functions, using the back-propagation algorithm. The training accuracy of each MLP model was evaluated based on the MSE criterion. Table 1 shows the best training results for the various numbers of MLPs, input, hidden, and output nodes, and kernel functions.
Theory/Model
The network weights are obtained after the MLP training by the back-propagation algorithm. With the four trained MLPs, we can obtain the start and end positions in the reference files for DTW and chroma-based DTW matching, as shown in Fig.
By using the four MLPs, the start and end positions in the MP3 reference song are estimated for DTW and chroma-based DTW matching.
Fig. 1 gives an overview of the proposed system. The pitch information is extracted from each humming/singing file by a musical note estimation method using spectro-temporal autocorrelation (STA) [4][5][22][23]. After this step, the extracted pitch features are preprocessed as follows [4][5][22][23].
The two distances (scores) calculated by DTW and chroma-based DTW were combined by score-level fusion. Experimental results showed that the accuracy of score fusion with the PRODUCT rule (which calculates the final score by multiplying the two scores) was the highest, as shown in Table 2.
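The PRODUCT rule can be sketched as below. The min-max normalization step is an assumption (a common preprocessing step before score fusion, not spelled out in this summary), and the toy distances are illustrative.

```python
# Score-level fusion via the PRODUCT rule: the final score for each
# candidate song is the product of its two (normalized) DTW distances.

def min_max_normalize(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def product_fusion(dtw_scores, chroma_scores):
    a = min_max_normalize(dtw_scores)
    b = min_max_normalize(chroma_scores)
    return [x * y for x, y in zip(a, b)]

# Toy distances for three candidate reference songs (lower = better match).
dtw = [0.2, 0.9, 0.5]
chroma = [0.1, 0.8, 0.7]
fused = product_fusion(dtw, chroma)
best = min(range(len(fused)), key=fused.__getitem__)
print(fused)
print(best)  # 0: song 0 gets the smallest fused distance
```

Multiplying the scores rewards candidates that both matchers agree on: a song must score well under both DTW and chroma-based DTW to obtain a small fused distance, which is consistent with the accuracy gain reported in Table 2.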
Performance/Effects
Experimental results showed that the proposed method worked effectively on the AFA MP3 database with high accuracy.
Future Work
In future work, we plan to investigate methods of enhancing the accuracy by combining MLPs and hidden Markov models (HMMs), using various QbSH databases.
References (23)
R. Typke, F. Wiering, and R. C. Veltkamp, "A survey of music information retrieval systems," in Proc. of 6th International Conference on Music Information Retrieval, pp. 153-160, September 11-15, 2005. http://ismir2005.ismir.net/proceedings/1020.pdf
X. Wu, M. Li, J. Liu, J. Yang, and Y. Yan, "A top-down approach to melody match in pitch contour for query by humming," in Proc. of International Symposium on Chinese Spoken Language Processing, vol. 2, pp. 669-680, December 13-16, 2006. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.1802
K. Kim, K. R. Park, S.-J. Park, S.-P. Lee, and M. Y. Kim, "Robust query-by-singing/humming system against background noise environments," IEEE Trans. Consumer Electron., vol. 57, no. 2, pp. 720-725, May 2011.
G. P. Nam, K. R. Park, S.-J. Park, S.-P. Lee, and M.-Y. Kim, "A new query-by-humming system based on the score level fusion of two classifiers," Int. J. Commun. Syst., vol. 25, issue 6, pp. 717-733, June 2012.
G. P. Nam, T. T. T. Luong, H. H. Nam, K. R. Park, and S.-J. Park, "Intelligent query by humming system based on score level fusion of multiple classifiers," EURASIP J. Adv. Signal Process., vol. 2011:21, pp. 1-11, July 2011.
A. Kornstadt, "Themefinder: a web-based melodic search tool," Computing in Musicology, MIT Press, 1998, vol. 11, pp. 231-236. http://www.ccarh.org/publications/books/cm/vol/11/contents.html
S. Blackburn and D. DeRoure, "A tool for content based navigation of music," in Proc. of ACM International Conference on Multimedia, pp. 361-368, September 12-16, 1998.
R. Typke, P. Giannopoulos, R. C. Veltkamp, F. Wiering, and R. V. Oostrum, "Using transportation distances for measuring melodic similarity," in Proc. of International Conference on Music Information Retrieval, pp. 107-114, October 26-30, 2003. http://ismir2003.ismir.net/papers/Typke.PDF
J.-S. R. Jang and M.-Y. Gao, "A query-by-singing system based on dynamic programming," in Proc. of International Workshop on Intelligent Systems Resolutions, pp. 85-89, December 11-12, 2000. http://ir.lib.nthu.edu.tw/bitstream/987654321/17662/1/2030226030026.pdf
L. Prechelt and R. Typke, "An interface for melody input," ACM Trans. Computer-Human Interact., vol. 8, no. 2, pp. 133-149, June 2001.
M. Ryynanen and A. Klapuri, "Query by humming of MIDI and audio using locality sensitive hashing," in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2249-2252, March 31-April 4, 2008.
J.-S. R. Jang and H.-R. Lee, "A general framework of progressive filtering and its application to query by singing/humming," IEEE Trans. Audio Speech Lang. Process., vol. 16, no. 2, pp. 350-358, Feb. 2008.
S.-P. Heo, M. Suzuki, A. Ito, and S. Makino, "An effective music information retrieval method using three-dimensional continuous DP," IEEE Trans. Multimedia, vol. 8, no. 3, pp. 633- 639, June 2006.
N. Phiwma and P. Sanguansat, "A novel method for query-by-humming using distance space," in Proc. of International Conference on Pervasive Computing Signal Processing and Applications, pp. 841-845, September 17-19, 2010.
K. Lemström and E. Ukkonen, "Including interval encoding into edit distance based music comparison and retrieval," in Proc. of Symposium on Creative and Cultural Aspects and Applications of AI and Cognitive Science, pp. 53-60, April 17-20, 2000. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6339
M. Mongeau and D. Sankoff, "Comparison of musical sequences," Computers and the Humanities, vol. 24, no. 3, pp. 161-175, June 1990.
A. Kotsifakos, P. Papapetrou, J. Hollmen, and D. Gunopulos, "A subsequence matching with gaps-range-tolerances framework: a query-by-humming application," in Proc. of the VLDB Endowment, vol. 4, no. 11, pp. 761-771, 2011. http://www.vldb.org/pvldb/vol4/p761-kotsifakos.pdf
M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996. http://dl.acm.org/citation.cfm?id=249049
M. A. Bartsch and G. H. Wakefield, "Audio thumbnailing of popular music using chroma-based representations," IEEE Trans. Multimedia, vol. 7, no. 1, pp. 96-104, Feb. 2005.
D. Jang, C.-J. Song, S. Shin, S.-J. Park, S.-J. Jang, and S.-P. Lee, "Implementation of a matching engine for a practical query-by-singing/humming system," in Proc. of IEEE International Symposium on Signal Processing and Information Technology, pp. 258-263, December 14-17, 2011.
M. Müller, Information Retrieval for Music and Motion, Springer, 2007.
G. P. Nam and K. R. Park, "Multi-classifier based query-by-singing/humming system on mobile device," Multimedia Systems, in submission.
G. P. Nam and K. R. Park, "Fast Query-by-Singing/Humming System that Combines Linear Scaling and Quantized Dynamic Time Warping Algorithm," KSII Transactions on Internet and Information Systems, in submission.