The increasing number of music files on smartphones and MP3 players makes it difficult to find the files that people want. Query-by-Singing/Humming (QbSH) systems have therefore been developed to retrieve music from a user's humming or singing without requiring detailed information such as the title or singer of a song. Most previous research on QbSH has been conducted using musical instrument digital interface (MIDI) files as reference songs. However, producing MIDI files is a time-consuming process, and more and more music files are published as the music market develops. Consequently, using the more common MPEG-1 audio layer 3 (MP3) files as reference songs is considered an alternative. However, there is little previous research on QbSH with MP3 files, because an MP3 file has a different waveform from the humming/singing query due to background music and multiple (polyphonic) melodies. To overcome these problems, we propose a new QbSH method using MP3 files on a mobile device. This research is novel in four ways. First, it is the first research on QbSH using MP3 files as reference songs. Second, the start and end positions of the MP3 file to be matched are estimated by a multi-layered perceptron (MLP) prior to matching with the humming/singing query file. Third, for more accurate results, four MLPs are used, which produce the start and end positions for the dynamic time warping (DTW) matching algorithm and for the chroma-based DTW algorithm, respectively. Fourth, the two matching scores from the DTW and chroma-based DTW algorithms are combined using the PRODUCT rule, through which a higher matching accuracy is obtained. Experimental results with the AFA MP3 database show that the accuracy of the proposed method (Top 1 accuracy of 98%, with an MRR of 0.989) is much higher than that of other methods. We also showed the effectiveness of the proposed system on a consumer mobile device.
Proposed Method
After this normalization step, each processed feature is fed to four separate neural networks, which are trained to estimate four positions in the MP3 file, namely the start and end positions for matching by the DTW and chroma-based DTW algorithms. Based on these four positions, the two DTW algorithms calculate the distances between the pitch data of the humming/singing file and those of the reference files.
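The windowed matching described above can be sketched as follows. This is a minimal textbook DTW between two pitch sequences, restricted to the segment bounded by the MLP-estimated start/end positions; the function names, toy pitch values, and window indices are illustrative, not taken from the paper.

```python
# Minimal DTW distance between a query pitch sequence and the segment of a
# reference pitch sequence bounded by (hypothetical) MLP-estimated positions.

def dtw_distance(query, reference):
    n, m = len(query), len(reference)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# Match only the estimated window instead of the whole reference song.
query = [60, 62, 64, 62]
reference = [55, 57, 60, 62, 64, 62, 59]
start, end = 2, 6  # hypothetical MLP-estimated window
print(dtw_distance(query, reference[start:end]))  # 0.0: window matches exactly
```

Restricting the match to the estimated window is what avoids scanning the whole reference file, which is the speed/accuracy advantage the text attributes to the MLP-based position estimation.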
The proposed method achieved an MRR of 0.989 and a Top 1 accuracy of 98%, higher than other methods. By matching only at the start and end positions estimated by the MLPs, the matching accuracy of the proposed method was greatly improved compared to conventional DTW and chroma-based DTW matching, which perform matching over the whole reference file. Although the MRR of the proposed method based on the PRODUCT rule is the same as that based on the SUM rule, the Top 1 accuracy with the PRODUCT rule is slightly higher.
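For readers unfamiliar with the two metrics quoted above, the following sketch shows how mean reciprocal rank (MRR) and Top 1 accuracy are computed from ranked retrieval results. The ranked lists and song IDs here are toy data, not the paper's results.

```python
# MRR and Top-1 accuracy over a set of ranked retrieval results.

def mean_reciprocal_rank(ranked_lists, ground_truth):
    """ranked_lists[i] is the ordered list of song IDs returned for
    query i; ground_truth[i] is the correct song ID."""
    total = 0.0
    for ranking, truth in zip(ranked_lists, ground_truth):
        rank = ranking.index(truth) + 1  # 1-based position of the correct song
        total += 1.0 / rank
    return total / len(ranked_lists)

def top1_accuracy(ranked_lists, ground_truth):
    hits = sum(1 for ranking, truth in zip(ranked_lists, ground_truth)
               if ranking[0] == truth)
    return hits / len(ranked_lists)

# Three toy queries: correct song ranked 1st, 1st, and 2nd.
rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "A", "B"]]
truths = ["A", "B", "A"]
print(mean_reciprocal_rank(rankings, truths))  # (1 + 1 + 0.5) / 3
print(top1_accuracy(rankings, truths))         # 2 / 3
```

An MRR of 0.989 with Top 1 accuracy of 98% thus means the correct song is almost always ranked first, and very near the top otherwise.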
Previous work used score-level fusion of two classifiers, a quantized binary (QB) code-based LS algorithm and a pitch-based DTW algorithm [4]. An enhanced version of the QbSH system was then developed by combining five classifiers: pitch-based linear scaling (LS), pitch-based DTW, QB code-based LS, local maximum and minimum point-based LS, and pitch distribution feature-based LS [5]. Phiwma et al. proposed a query-by-humming method based on a distance space [14].
For example, the pitch values 13 and 14 are regarded as the same as 1 and 2, respectively, although they can actually differ from 1 and 2. To solve this problem, we also used conventional DTW and combined the two scores from the conventional DTW and chroma-based DTW to enhance the matching accuracy.
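The octave ambiguity described above comes from the chroma folding itself: pitch values an octave apart collapse onto the same chroma class. A minimal sketch, assuming a 1-based 12-class wrap (the indexing convention is illustrative):

```python
# Chroma folding: pitches an octave apart map to the same chroma class,
# so 13 and 14 become indistinguishable from 1 and 2.

def to_chroma(pitch):
    return (pitch - 1) % 12 + 1  # 1..12 -> 1..12, 13 -> 1, 14 -> 2, ...

print(to_chroma(13))  # 1
print(to_chroma(14))  # 2
print(to_chroma(2))   # 2 -- same chroma as 14, though an octave apart
```

Because distinct pitches can share one chroma class, the chroma-based DTW score alone can rank a wrong song highly, which is why its score is combined with that of conventional (unfolded) DTW.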
An MLP is an artificial neural network that can map sets of input data to a set of desired output values after being trained with a supervised learning technique known as the back-propagation algorithm [18]. In this method, the errors of the feed-forward network are calculated and propagated back from the output nodes to the inputs in order to appropriately adjust the weights in each layer of the network. Several kernel functions are popularly used; in this research, the performance with the hyperbolic tangent function of (3) is compared to that with the log-sigmoid function (see Table 1).
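The two kernel functions compared in the text can be illustrated with a single feed-forward pass through one hidden layer. The weights and inputs below are made up for illustration; this is not the paper's trained network, only a sketch of how the choice of kernel changes the output.

```python
# Forward pass of a tiny MLP with either hyperbolic tangent or
# log-sigmoid as the hidden-layer kernel function.
import math

def tanh(x):
    return math.tanh(x)

def log_sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(inputs, w_hidden, w_out, act):
    # Hidden activations: kernel applied to each weighted sum of the inputs.
    hidden = [act(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    # Linear output node combining the hidden activations.
    return sum(w * h for w, h in zip(w_out, hidden))

inputs = [0.5, -0.2]                  # e.g. normalized pitch features
w_hidden = [[0.1, 0.4], [-0.3, 0.2]]  # hypothetical hidden-layer weights
w_out = [0.7, -0.5]                   # hypothetical output weights
print(mlp_forward(inputs, w_hidden, w_out, tanh))
print(mlp_forward(inputs, w_hidden, w_out, log_sigmoid))
```

Back-propagation then adjusts `w_hidden` and `w_out` to reduce the error between this output and the desired start/end position.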
In this study, we proposed a new method of implementing a QbSH system using MLP, DTW, and chroma-based DTW algorithms. The pitch data of the humming/singing used for matching were obtained by STA-based pitch extractors and normalization methods.
With these four MLPs, we evaluated the overall performance of the QbSH system in the testing phase. In this phase, the performance was measured with a total of 450 MP3 files, including 350 additional files that were not hummed or sung (not used for training), in order to enhance the confidence level of the experiment.
Dataset
As shown in Table 1, the model's performance was compared according to the number of hidden nodes and output nodes of the MLP. Experimental results gave the best training performance when four MLPs were used and each MLP had one output, as shown in Fig. 1. For the training of an MLP, the desired output must be available; here the desired outputs are the start or end positions in the reference song for DTW and chroma-based DTW, as shown in Fig. 3.
This database consists of 1,000 input query files corresponding to 100 reference songs stored as MP3 files [20]. The 1,000 queries, comprising 298 humming and 702 singing recordings, were obtained from 32 volunteers (18 men and 14 women) [20]. The volunteers were asked to select the part of the song that they wanted to sing or hum, and their query data were recorded with durations of about 12 seconds.
Data Processing
In the training phase, we performed various experiments on different MLP network models, varying the number of MLPs, the numbers of input, hidden, and output nodes, and the kernel functions, using the back-propagation algorithm. The training accuracy of each MLP model was evaluated based on the MSE criterion. Table 1 shows the best training results for the various numbers of MLPs, input, hidden, and output nodes, and kernel functions.
Theory/Model
The network weights are obtained after the MLP training by the back-propagation algorithm. With the four trained MLPs, we can obtain the start and end positions in the reference files for DTW and chroma-based DTW matching, as shown in Fig.
By using the four MLPs, the start and end positions in the MP3 reference song are estimated for DTW and chroma-based DTW matching.
Fig. 1 gives an overview of the proposed system. The pitch information is extracted from each humming/singing file by a musical note estimation method using spectro-temporal autocorrelation (STA) [4][5][22][23]. After this step, the extracted pitch features are preprocessed as follows [4][5][22][23].
The two distances (scores) calculated by DTW and chroma-based DTW were combined by score-level fusion. Experimental results showed that the accuracy of score fusion with the PRODUCT rule (which calculates the final score by multiplying the two scores) was the highest, as shown in Table 2.
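The PRODUCT rule can be sketched as below. The min-max normalization step is an assumption (a common preprocessing step before score fusion, not spelled out in this summary), and the toy distances are illustrative.

```python
# Score-level fusion via the PRODUCT rule: the final score for each
# candidate song is the product of its two (normalized) DTW distances.

def min_max_normalize(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def product_fusion(dtw_scores, chroma_scores):
    a = min_max_normalize(dtw_scores)
    b = min_max_normalize(chroma_scores)
    return [x * y for x, y in zip(a, b)]

# Toy distances for three candidate reference songs (lower = better match).
dtw = [0.2, 0.9, 0.5]
chroma = [0.1, 0.8, 0.7]
fused = product_fusion(dtw, chroma)
best = min(range(len(fused)), key=fused.__getitem__)
print(fused)
print(best)  # 0: song 0 gets the smallest fused distance
```

Multiplying the scores rewards candidates that both matchers agree on: a song must score well under both DTW and chroma-based DTW to obtain a small fused distance, which is consistent with the accuracy gain reported in Table 2.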
Performance/Effects
Experimental results showed that the proposed method worked effectively on the AFA MP3 database with high accuracy.
Future Work
In future work, we plan to investigate methods of enhancing the accuracy by combining MLPs and hidden Markov models (HMMs), using various QbSH databases.
References (23)
R. Typke, F. Wiering, and R. C. Veltkamp, "A survey of music information retrieval systems," in Proc. of 6th International Conference on Music Information Retrieval, pp. 153-160, September 11-15, 2005. http://ismir2005.ismir.net/proceedings/1020.pdf
X. Wu, M. Li, J. Liu, J. Yang, and Y. Yan, "A top-down approach to melody match in pitch contour for query by humming," in Proc. of International Symposium on Chinese Spoken Language Processing, vol. 2, pp. 669-680, December 13-16, 2006. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.1802
K. Kim, K. R. Park, S.-J. Park, S.-P. Lee, and M. Y. Kim, "Robust query-by-singing/humming system against background noise environments," IEEE Trans. Consumer Electron., vol. 57, no. 2, pp. 720-725, May 2011.
G. P. Nam, K. R. Park, S.-J. Park, S.-P. Lee, and M.-Y. Kim, "A new query-by-humming system based on the score level fusion of two classifiers," Int. J. Commun. Syst., vol. 25, issue 6, pp. 717-733, June 2012.
G. P. Nam, T. T. T. Luong, H. H. Nam, K. R. Park, and S.-J. Park, "Intelligent query by humming system based on score level fusion of multiple classifiers," EURASIP J. Adv. Signal Process., vol. 2011:21, pp. 1-11, July 2011.
A. Kornstadt, "Themefinder: a web-based melodic search tool," Computing in Musicology, MIT Press, 1998, vol. 11, pp. 231-236. http://www.ccarh.org/publications/books/cm/vol/11/contents.html
S. Blackburn and D. DeRoure, "A tool for content based navigation of music," in Proc. of ACM International Conference on Multimedia, pp. 361-368, September 12-16, 1998.
R. Typke, P. Giannopoulos, R. C. Veltkamp, F. Wiering, and R. V. Oostrum, "Using transportation distances for measuring melodic similarity," in Proc. of International Conference on Music Information Retrieval, pp. 107-114, October 26-30, 2003. http://ismir2003.ismir.net/papers/Typke.PDF
J.-S. R. Jang and M.-Y. Gao, "A query-by-singing system based on dynamic programming," in Proc. of International Workshop on Intelligent Systems Resolutions, pp. 85-89, December 11-12, 2000. http://ir.lib.nthu.edu.tw/bitstream/987654321/17662/1/2030226030026.pdf
L. Prechelt and R. Typke, "An interface for melody input," ACM Trans. Computer-Human Interact., vol. 8, no. 2, pp. 133-149, June 2001.
M. Ryynanen and A. Klapuri, "Query by humming of MIDI and audio using locality sensitive hashing," in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2249-2252, March 31-April 4, 2008.
J.-S. R. Jang and H.-R. Lee, "A general framework of progressive filtering and its application to query by singing/humming," IEEE Trans. Audio Speech Lang. Process., vol. 16, no. 2, pp. 350-358, Feb. 2008.
S.-P. Heo, M. Suzuki, A. Ito, and S. Makino, "An effective music information retrieval method using three-dimensional continuous DP," IEEE Trans. Multimedia, vol. 8, no. 3, pp. 633- 639, June 2006.
N. Phiwma and P. Sanguansat, "A novel method for query-by-humming using distance space," in Proc. of International Conference on Pervasive Computing Signal Processing and Applications, pp. 841-845, September 17-19, 2010.
K. Lemström and E. Ukkonen, "Including interval encoding into edit distance based music comparison and retrieval," in Proc. of Symposium on Creative and Cultural Aspects and Applications of AI and Cognitive Science, pp. 53-60, April 17-20, 2000. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6339
M. Mongeau and D. Sankoff, "Comparison of musical sequences," Computers and the Humanities, vol. 24, no. 3, pp. 161-175, June 1990.
A. Kotsifakos, P. Papapetrou, J. Hollmen, and D. Gunopulos, "A subsequence matching with gaps-range-tolerances framework: a query-by-humming application," in Proc. of the VLDB Endowment, vol. 4, no. 11, pp. 761-771, 2011. http://www.vldb.org/pvldb/vol4/p761-kotsifakos.pdf
M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996. http://dl.acm.org/citation.cfm?id=249049
M. A. Bartsch and G. H. Wakefield, "Audio thumbnailing of popular music using chroma-based representations," IEEE Trans. Multimedia, vol. 7, no. 1, pp. 96-104, Feb. 2005.
D. Jang, C.-J. Song, S. Shin, S.-J. Park, S.-J. Jang, and S.-P. Lee, "Implementation of a matching engine for a practical query-by-singing/humming system," in Proc. of IEEE International Symposium on Signal Processing and Information Technology, pp. 258-263, December 14-17, 2011.
M. Müller, Information Retrieval for Music and Motion, Springer, 2007.
G. P. Nam and K. R. Park, "Multi-classifier based query-by-singing/humming system on mobile device," Multimedia Systems, in submission.
G. P. Nam and K. R. Park, "Fast Query-by-Singing/Humming System that Combines Linear Scaling and Quantized Dynamic Time Warping Algorithm," KSII Transactions on Internet and Information Systems, in submission.