
A Novel Query-by-Singing/Humming Method by Estimating Matching Positions Based on Multi-layered Perceptron

KSII Transactions on Internet and Information Systems (TIIS), vol. 7, no. 7, 2013, pp. 1657-1670

Pham, Tuyen Danh (Division of Electronics and Electrical Engineering, Dongguk University); Nam, Gi Pyo (Division of Electronics and Electrical Engineering, Dongguk University); Shin, Kwang Yong (Division of Electronics and Electrical Engineering, Dongguk University); Park, Kang Ryoung (Division of Electronics and Electrical Engineering, Dongguk University)

Abstract

The increase in the number of music files on smart phones and MP3 players makes it difficult to find the music files which people want. So, Query-by-Singing/Humming (QbSH) systems have been developed to retrieve music from a user's humming or singing without having to know detailed information about t...


AI Summary

* Note: the sentences below were identified automatically by AI and may include passages that are not a good fit; please use with caution.

Proposed Method

  • After this normalization step, each processed feature is applied to four separate neural networks, which are trained to estimate four positions in the MP3 file, namely the start and end positions used for matching by the DTW and chroma-based DTW algorithms. Based on these four positions, the two DTW algorithms calculate the distances between the pitch data of the humming/singing file and those of the reference files (a sketch of this matching step is given after this list).
  • Experimental results showed an MRR of 0.989 and a Top 1 accuracy of 98%, higher than those of other methods. By matching only at the start and end positions estimated by the MLPs, the matching accuracy of the proposed method was greatly improved compared to conventional DTW and chroma-based DTW matching, which perform matching over the whole reference file. Although the MRR of the proposed method based on the PRODUCT rule is the same as that based on the SUM rule, its Top 1 accuracy is slightly higher than that of the SUM rule (the MRR and Top 1 metrics are illustrated after this list).
  • Nam et al. used the method of score-level fusion of two classifiers, which are a quantized binary (QB) code-based LS algorithm and a pitch-based DTW algorithm [4]. In addition, they developed an enhanced version of the QbSH system by combining five classifiers, which are pitch-based linear scaling (LS), pitch-based DTW, QB code-based LS, local maximum and minimum point-based LS, and pitch distribution feature-based LS [5]. In the method proposed by Phiwma et al. [14], ...
  • For example, the pitch values of 13 and 14 are regarded as being the same as those of 1 and 2, respectively, although they can be different from those of 1 and 2. In order to solve this problem, we also used conventional DTW, and combined the two scores from conventional DTW and chroma-based DTW to enhance the matching accuracy (see the chroma folding in the sketch after this list).
  • An MLP is an artificial neural network with the ability to map sets of input data to a set of desired output values after being trained by a supervised learning technique known as the back-propagation algorithm [18]. In this method, the errors of the feed-forward network are calculated and propagated back from the output nodes to the inputs in order to appropriately adjust the weights in each layer of the network. There are several popularly used kernel functions; in this research, the performance with the hyperbolic tangent function of Eq. (3) is compared to that with the log-sigmoid function (see Table 1).
  • In this study, we proposed a new method of implementing a QbSH system, using MLP, DTW and chroma-based DTW algorithms. The pitch data of humming/singing for matching was obtained by STA-based pitch extractors and normalization methods.
  • With these four MLPs, we evaluated the overall performance of the QbSH system in the testing phase. In this phase, the performance was measured with a total of 450 MP3 files, including an additional 350 files that were not hummed or sung (i.e., not used for training), in order to enhance the confidence level of the experiment.
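A minimal sketch (in Python) of the matching step described above, assuming the pitch sequences are plain lists of semitone values and that the four start/end indices have already been estimated by the trained MLPs. The helper names (chroma_wrap, dtw_distance, match_scores) are illustrative, not the authors' implementation.

    # Illustrative sketch, not the authors' code: DTW and chroma-based DTW matching
    # restricted to the reference segment whose start/end indices come from the MLPs.

    def chroma_wrap(pitch):
        # The chroma representation folds pitches into one octave (12 pitch classes),
        # so e.g. 13 and 14 map to 1 and 2; octave information is lost, which is
        # why conventional DTW is also kept.
        return [p % 12 for p in pitch]

    def dtw_distance(query, reference):
        # Standard dynamic time warping distance between two 1-D pitch sequences.
        n, m = len(query), len(reference)
        inf = float("inf")
        d = [[inf] * (m + 1) for _ in range(n + 1)]
        d[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(query[i - 1] - reference[j - 1])
                d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
        return d[n][m]

    def match_scores(query_pitch, ref_pitch, positions):
        # positions: (dtw_start, dtw_end, chroma_start, chroma_end) from the four MLPs.
        dtw_start, dtw_end, ch_start, ch_end = positions
        dtw_score = dtw_distance(query_pitch, ref_pitch[dtw_start:dtw_end])
        chroma_score = dtw_distance(chroma_wrap(query_pitch),
                                    chroma_wrap(ref_pitch[ch_start:ch_end]))
        return dtw_score, chroma_score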
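For reference, the two retrieval metrics quoted above have standard definitions: MRR is the mean of 1/rank of the correct reference song over all queries, and Top 1 accuracy is the fraction of queries whose correct song is ranked first. A minimal sketch, assuming a list of 1-based ranks is available:

    def mean_reciprocal_rank(ranks):
        # ranks: 1-based rank of the correct reference song for each query
        return sum(1.0 / r for r in ranks) / len(ranks)

    def top1_accuracy(ranks):
        # fraction of queries whose correct song is ranked first
        return sum(1 for r in ranks if r == 1) / len(ranks)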

Dataset

  • As shown in Table 1, the model's performance was compared according to the number of hidden nodes and output nodes of the MLP. Experimental results gave the training performance for the case where four MLPs were used, each with one output, as shown in Fig. 1. For the training of an MLP, the desired outputs must be available; here, the desired outputs are the start or end positions in the reference song for DTW and chroma-based DTW, as shown in Fig.
  • This database consists of 1,000 input query files corresponding to 100 reference songs stored as MP3 files [20]. The 1,000 query data, comprising 298 humming and 702 singing recordings, were obtained from 32 volunteers (18 men and 14 women) [20]. The volunteers were asked to select the part of the song that they wanted to sing or hum, and their query data were recorded with durations of about 12 seconds.

Data Processing

  • In the training phase, we performed various experiments on different MLP network models, varying the number of MLPs, the numbers of input, hidden, and output nodes, and the kernel functions, using the back-propagation algorithm. The training accuracy of each MLP model was evaluated based on the MSE criterion. Table 1 shows the best training results over these configurations (a training sketch is given below).
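A minimal sketch of this kind of training experiment, assuming the normalized pitch features and the ground-truth start/end positions are already available as arrays. It uses scikit-learn's MLPRegressor as a stand-in for the authors' back-propagation training; the function and variable names (train_position_mlp, X, y_start) are hypothetical.

    # Illustrative sketch, not the authors' code: train one position-estimating MLP and
    # report its training MSE, so different node counts and kernels can be compared.
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import mean_squared_error

    def train_position_mlp(features, positions, hidden_nodes=20, activation="tanh"):
        # activation="tanh" is the hyperbolic tangent kernel; "logistic" is log-sigmoid.
        mlp = MLPRegressor(hidden_layer_sizes=(hidden_nodes,),
                           activation=activation,
                           solver="sgd",          # gradient descent via back-propagation
                           learning_rate_init=0.01,
                           max_iter=2000,
                           random_state=0)
        mlp.fit(features, positions)
        mse = mean_squared_error(positions, mlp.predict(features))
        return mlp, mse

    # Hypothetical usage: X holds the normalized pitch features of the queries,
    # y_start the ground-truth start positions in the reference MP3 files.
    # mlp_start, mse_tanh = train_position_mlp(X, y_start, activation="tanh")
    # _, mse_logsig = train_position_mlp(X, y_start, activation="logistic")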

Theory/Model

  • The network weights are obtained after the MLP training by using the back-propagation algorithm. With the four trained MLPs, we can obtain the start and end positions in the reference files for DTW and chroma-based DTW matching, as shown in Fig.
  • By using the four MLPs, the start and end positions in the MP3 reference song are estimated for DTW and chroma-based DTW matching.
  • Fig. 1 gives an overview of the proposed system. The pitch information is extracted from each humming/singing file by a musical note estimation method using spectro-temporal autocorrelation (STA) [4][5][22][23]. After this step, the extracted pitch features are preprocessed as follows [4][5][22][23].
  • The two distances (scores) calculated by DTW and chroma-based DTW were combined by a score-level fusion method. Experimental results showed that the accuracy of score fusion with the PRODUCT rule (which computes the final score by multiplying the two scores) was the highest, as shown in Table 2 (a fusion sketch is given after this list).
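A small sketch of the score-level fusion step, assuming the DTW and chroma-based DTW distances for each reference song are already computed and that a smaller fused score means a better match. The fuse_scores and rank_references helpers are illustrative names; the SUM-rule branch assumes the two scores are normalized to comparable ranges, a prerequisite not detailed here.

    # Illustrative sketch, not the authors' code: combine the two DTW scores per
    # reference song by the SUM or PRODUCT rule and rank the candidates.

    def fuse_scores(dtw_score, chroma_score, rule="product"):
        if rule == "sum":
            return dtw_score + chroma_score
        # PRODUCT rule: multiply the two scores (reported above as the most accurate).
        return dtw_score * chroma_score

    def rank_references(scores, rule="product"):
        # scores: {song_id: (dtw_score, chroma_score)}; smaller distance = better match
        fused = {sid: fuse_scores(s1, s2, rule) for sid, (s1, s2) in scores.items()}
        return sorted(fused, key=fused.get)  # song ids ordered from best to worst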

References (23)

  1. R. Typke, F. Wiering, and R. C. Veltkamp, "A survey of music information retrieval systems," in Proc. of 6th International Conference on Music Information Retrieval, pp. 153-160, September 11-15, 2005. http://ismir2005.ismir.net/proceedings/1020.pdf 

  2. X. Wu, M. Li, J. Liu, J. Yang, and Y. Yan, "A top-down approach to melody match in pitch contour for query by humming," in Proc. of International Symposium on Chinese Spoken Language Processing, vol. 2, pp. 669-680, December 13-16, 2006. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.1802

  3. K. Kim, K. R. Park, S.-J. Park, S.-P. Lee, and M. Y. Kim, "Robust query-by-singing/humming system against background noise environments," IEEE Trans. Consumer Electron., vol. 57, no. 2, pp. 720-725, May 2011. 

  4. G. P. Nam, K. R. Park, S.-J. Park, S.-P. Lee, and M.-Y. Kim, "A new query-by-humming system based on the score level fusion of two classifiers," Int. J. Commun. Syst., vol. 25, issue 6, pp. 717-733, June 2012. 

  5. G. P. Nam, T. T. T. Luong, H. H. Nam, K. R. Park, and S.-J. Park, "Intelligent query by humming system based on score level fusion of multiple classifiers," EURASIP J. Adv. Signal Process., vol. 2011:21, pp. 1-11, July 2011. 

  6. A. Kornstadt, "Themefinder: a web-based melodic search tool," Computing in Musicology, MIT Press, 1998, vol. 11, pp. 231-236. http://www.ccarh.org/publications/books/cm/vol/11/contents.html 

  7. S. Blackburn and D. DeRoure, "A tool for content based navigation of music," in Proc. of ACM International Conference on Multimedia, pp. 361-368, September 12-16, 1998. 

  8. R. Typke, P. Giannopoulos, R. C. Veltkamp, F. Wiering, and R. V. Oostrum, "Using transportation distances for measuring melodic similarity," in Proc. of International Conference on Music Information Retrieval, pp. 107-114, October 26-30, 2003. http://ismir2003.ismir.net/papers/Typke.PDF 

  9. J.-S. R. Jang and M.-Y. Gao, "A query-by-singing system based on dynamic programming," in Proc. of International Workshop on Intelligent Systems Resolutions, pp. 85-89, December 11-12, 2000. http://ir.lib.nthu.edu.tw/bitstream/987654321/17662/1/2030226030026.pdf 

  10. L. Prechelt and R. Typke, "An interface for melody input," ACM Trans. Computer-Human Interact., vol. 8, no. 2, pp. 133-149, June 2001. 

  11. M. Ryynanen and A. Klapuri, "Query by humming of MIDI and audio using locality sensitive hashing," in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2249-2252, March 31-April 4, 2008. 

  12. J.-S. R. Jang and H.-R. Lee, "A general framework of progressive filtering and its application to query by singing/humming," IEEE Trans. Audio Speech Lang. Process., vol. 16, no. 2, pp. 350-358, Feb. 2008. 

  13. S.-P. Heo, M. Suzuki, A. Ito, and S. Makino, "An effective music information retrieval method using three-dimensional continuous DP," IEEE Trans. Multimedia, vol. 8, no. 3, pp. 633-639, June 2006. 

  14. N. Phiwma and P. Sanguansat, "A novel method for query-by-humming using distance space," in Proc. of International Conference on Pervasive Computing Signal Processing and Applications, pp. 841-845, September 17-19, 2010. 

  15. K. Lemstrom and E. Ukkonen, "Including interval encoding into edit distance based music comparison and retrieval," in Proc. of Symposium on Creative and Cultural Aspects and Applications of AI and Cognitive Science, pp. 53-60, April 17-20, 2000. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.6339

  16. M. Mongeau and D. Sankoff, "Comparison of musical sequences," Computers and the Humanities, vol. 24, no. 3, pp. 161-175, June 1990. 

  17. A. Kotsifakos, P. Papapetrou, J. Hollmen, and D. Gunopulos, "A subsequence matching with gaps-range-tolerances framework: a query-by-humming application," in Proc. of the VLDB Endowment, vol. 4, no. 11, pp. 761-771, 2011. http://www.vldb.org/pvldb/vol4/p761-kotsifakos.pdf 

  18. M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996. http://dl.acm.org/citation.cfm?id=249049

  19. M. A. Bartsch and G. H. Wakefield, "Audio thumbnailing of popular music using chroma-based representations," IEEE Trans. Multimedia, vol. 7, no. 1, pp. 96-104, Feb. 2005. 

  20. D. Jang, C.-J. Song, S. Shin, S.-J. Park, S.-J. Jang, and S.-P. Lee, "Implementation of a matching engine for a practical query-by-singing/humming system," in Proc. of IEEE International Symposium on Signal Processing and Information Technology, pp. 258-263, December 14-17, 2011. 

  21. M. Muller, Information Retrieval for Music and Motion, Springer, 2007. 

  22. G. P. Nam and K. R. Park, "Multi-classifier based query-by-singing/humming system on mobile device," Multimedia Systems, in submission. 

  23. G. P. Nam and K. R. Park, "Fast Query-by-Singing/Humming System that Combines Linear Scaling and Quantized Dynamic Time Warping Algorithm," KSII Transactions on Internet and Information Systems, in submission. 
