A Musical Genre Classification Method Based on the Octave-Band Order Statistics (옥타브밴드 순서 통계량에 기반한 음악 장르 분류)

서진수

doi:10.7776/ask.2014.33.1.081

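Abstract (Korean, translated)

This paper studies a music classifier based on order statistics taken along the frequency and time directions over the octave bands of a music signal. To represent the harmonic and dynamic (loud/soft) structure of music, the octave-band order statistics of the power spectrum are used. In experiments on two widely used music datasets, the octave-band order statistics improved genre classification accuracy by 2.61 % and 8.9 %, respectively, over the conventional MFCC and octave-band spectral contrast features. The experimental results show that octave-band order statistics are well suited to musical genre classification.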

Abstract

This paper presents a study on the effectiveness of using spectral and temporal octave-band order statistics for musical genre classification. To represent the relative disposition of the harmonic and non-harmonic components, we use the octave-band order statistics of the power spectral distribution. Experiments were performed on two widely used music datasets; compared with the mel-frequency cepstral coefficients (MFCC) and the octave-band spectral contrast (OSC), the octave-band order statistics improved genre classification accuracy by 2.61 % on one dataset and by 8.9 % on the other. These results show that octave-band order statistics are promising for musical genre classification.

AI summary of the main text

* The sentences below were extracted automatically by AI and some may be out of place; please use them with care.

Proposed method

The octave-band spectral contrast (OSC) was originally proposed specifically for musical genre classification; it describes the difference between the maximum and the minimum of the power spectrum within each octave-scale subband. As an extension of OSC, this paper investigates other distributional characteristics of the power spectrum within the octave-scale subbands. In particular, we propose a musical genre classification method based on octave-band order statistics such as the median, the quartiles, the minimum, and the maximum.
The short-time spectral features in the proposed method are based on the distributional characteristics of the octave-scale subbands. According to the results reported in ^[6], the octave-scale subbands contain enough information to distinguish the genres of music signals.
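A minimal sketch of the per-frame spectral order statistics, assuming a Hann-windowed rFFT power spectrum and a hypothetical octave layout starting at 100 Hz ([0, 100), [100, 200), [200, 400), ... up to Nyquist). The band edges, `f_low`, the sample rate, and the helper names `octave_band_edges` and `spectral_order_statistics` are illustrative assumptions, not the paper's exact configuration. The percentiles (0, 25, 50, 75, 100) give the minimum, two quartiles, median, and maximum, and the OSC-style contrast is the maximum minus the minimum of each band.

```python
import numpy as np

def octave_band_edges(n_fft, sample_rate, f_low=100.0):
    """rFFT bin indices delimiting octave-scale subbands (hypothetical layout).

    Bands: [0, f_low), [f_low, 2*f_low), [2*f_low, 4*f_low), ..., last band up to Nyquist.
    Assumes f_low is large enough that no band ends up empty.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    nyquist = sample_rate / 2.0
    edges_hz = [0.0, f_low]
    while edges_hz[-1] * 2.0 < nyquist:
        edges_hz.append(edges_hz[-1] * 2.0)
    edges = np.searchsorted(freqs, edges_hz)   # first bin at or above each edge
    return np.append(edges, len(freqs))        # close the last band at Nyquist

def spectral_order_statistics(power, edges, percentiles=(0, 25, 50, 75, 100)):
    """Order statistics of one frame's power spectrum inside each octave band.

    Returns an array of shape (n_bands, n_stats); the default percentiles are
    min, lower quartile, median, upper quartile, max.
    """
    stats = [np.percentile(power[lo:hi], percentiles)
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.stack(stats)

sample_rate, n_fft = 22050, 1024               # hypothetical analysis settings
rng = np.random.default_rng(0)
frame = rng.standard_normal(n_fft)             # stand-in for one audio frame
power = np.abs(np.fft.rfft(frame * np.hanning(n_fft))) ** 2

edges = octave_band_edges(n_fft, sample_rate)
sos = spectral_order_statistics(power, edges)
osc_like = sos[:, -1] - sos[:, 0]              # max - min per band, the contrast OSC describes
frame_feature = sos.ravel()                    # frame-level feature vector
print(sos.shape)                               # (8, 5)
```

The values are taken on the linear power spectrum here; whether a log compression is applied before taking order statistics is not stated in this summary, and it would change the feature values but not their ordering.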
Most previous works use the mean and the standard deviation of the frame-level features within a segment as the segment-level feature. In this paper, we extend these temporal integration methods from the mean and standard deviation to order statistics: the same types of summary order statistics listed in Table 1 are applied to the temporal integration of frame-level features.
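The contrast between the mean/standard-deviation baseline and order-statistic temporal integration can be sketched as follows. This assumes the temporal statistics are the same five percentiles used for the spectral side; the exact definitions of TOS₁ to TOS₅ in Table 1 are not reproduced here, and the function names and the synthetic 258 × 40 feature matrix are hypothetical. The example injects one extreme frame (for instance a drum hit) to show that the mean moves while the median barely does.

```python
import numpy as np

def temporal_mean_std(frame_feats):
    """Mean/std baseline: per-dimension mean and standard deviation over a segment's frames."""
    return np.concatenate([frame_feats.mean(axis=0), frame_feats.std(axis=0)])

def temporal_order_statistics(frame_feats, percentiles=(0, 25, 50, 75, 100)):
    """Order-statistic integration: per-dimension percentiles over time.

    Output is grouped by statistic: all minima first, then all lower quartiles, and so on.
    """
    return np.percentile(frame_feats, percentiles, axis=0).ravel()

rng = np.random.default_rng(0)
clean = rng.random((258, 40))        # hypothetical: 258 frames x 40 frame-level dimensions
noisy = clean.copy()
noisy[10] += 100.0                   # a single transient frame

d = clean.shape[1]
tms_clean, tms_noisy = temporal_mean_std(clean), temporal_mean_std(noisy)
tos_clean, tos_noisy = temporal_order_statistics(clean), temporal_order_statistics(noisy)
print(tms_noisy.shape, tos_noisy.shape)          # (80,) (200,)

mean_shift = np.abs(tms_noisy[:d] - tms_clean[:d])                     # about 100 / 258 per dim
median_shift = np.abs(tos_noisy[2 * d:3 * d] - tos_clean[2 * d:3 * d])  # at most one order-statistic step
```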
The extracted short-time features were temporally integrated over six-second segments. A linear SVM classifier was then trained and tested on the segment-level features. The genre of each music clip was determined by majority voting over the classification results of the segments in that clip.
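The segment-then-vote pipeline described above can be sketched as below. It assumes non-overlapping six-second segments and a hypothetical analysis setting of 22050 Hz with a 512-sample hop; the summary does not say whether segments overlap or what hop size was used. The helper names and the segment predictions are made up for illustration.

```python
from collections import Counter
import numpy as np

def split_into_segments(frame_feats, frames_per_segment):
    """Cut a clip's frame-level features into non-overlapping segments, dropping a short tail."""
    n_segments = len(frame_feats) // frames_per_segment
    return [frame_feats[k * frames_per_segment:(k + 1) * frames_per_segment]
            for k in range(n_segments)]

def majority_vote(segment_labels):
    """Clip-level genre = most frequent segment label; ties go to the label seen first."""
    return Counter(segment_labels).most_common(1)[0][0]

sample_rate, hop = 22050, 512                       # hypothetical analysis settings
frames_per_segment = int(6.0 * sample_rate / hop)   # six seconds of frames -> 258
clip_frames = np.zeros((1292, 40))                  # roughly a 30 s clip, 40 dims per frame
segments = split_into_segments(clip_frames, frames_per_segment)
print(len(segments), segments[0].shape)             # 5 (258, 40)

segment_predictions = ["rock", "metal", "rock", "rock", "blues"]  # hypothetical classifier output
print(majority_vote(segment_predictions))           # rock
```

Each segment would be summarized by a temporal integration step before classification; voting only needs the per-segment labels.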
The classification results on the ISMIR2004 and GTZAN datasets are given in Tables 2 and 3, respectively. The octave-band order statistics SOS₁, SOS₂, SOS₃, SOS₄, and SOS₅ in Table 1 were used in combination with the temporal integration methods TOS₁, TOS₂, TOS₃, TOS₄, and TOS₅. For comparison with previous spectral descriptors, the 12th-order MFCC and the OSC were also included in the test.
We note that it has been reported^[2,9] that using an SVM with an RBF kernel can further improve classification accuracy by about 5 %. Since extensive benchmark testing is not the aim of this paper, we focus on demonstrating the validity of octave-band order statistics for the musical genre classification task. With SOS₅ and TOS₅, the classification accuracy of the proposed method exceeded 84 % on both datasets (Tables 2 and 3), which is among the best results reported so far with a linear SVM classifier, even though the proposed method is less complicated than the other approaches it was compared with.

Datasets

The second music dataset (abbreviated as GTZAN) is the one used by George Tzanetakis in his work.^[1] It consists of 1000 songs over ten genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, and rock. For the ISMIR2004 dataset, half of the songs were used for training and the other half for testing. For the GTZAN dataset, 10-fold cross-validation was used to obtain the classification accuracy.
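One way to set up the GTZAN 10-fold protocol is sketched below. The summary does not say whether the folds were stratified by genre, so the stratification, the seed, and the helper name `stratified_fold_ids` are assumptions; the 100-songs-per-genre layout is that of GTZAN. Folds are assigned per song rather than per segment, so that segments of a test song never appear in training.

```python
import numpy as np

def stratified_fold_ids(labels, n_folds=10, seed=0):
    """Give every song a fold id in [0, n_folds) so each fold gets an equal share of every genre."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(len(labels), dtype=int)
    for genre in np.unique(labels):
        idx = np.flatnonzero(labels == genre)
        rng.shuffle(idx)                         # random order within the genre
        fold_ids[idx] = np.arange(len(idx)) % n_folds
    return fold_ids

genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]
labels = np.repeat(genres, 100)                  # GTZAN layout: 100 songs per genre
fold_ids = stratified_fold_ids(labels)

for k in range(10):
    test_songs = np.flatnonzero(fold_ids == k)
    train_songs = np.flatnonzero(fold_ids != k)
    # segment the train_songs, fit the classifier, then classify and vote on test_songs
```

The reported accuracy would then be averaged over the ten test folds.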

Theory/Model

Any state-of-the-art statistical classifier, such as nearest neighbor, a Gaussian mixture model, or a support vector machine (SVM), can be used to train the genre models on the segment-level features. In this paper, the linear SVM classifier is used because of its simplicity and reasonably high classification accuracy. As a final step, the classification results from all segments in a music clip are aggregated, typically by the majority voting rule.
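The summary does not name the SVM solver, so the sketch below swaps in a small, dependency-free stand-in: a one-vs-rest linear SVM trained with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. It is not the paper's implementation; in practice a library such as LIBLINEAR would typically be used. The function names, `lam`, `epochs`, and the toy three-class data are hypothetical, and for simplicity the bias is regularized together with the weights.

```python
import numpy as np

def train_linear_svm_ovr(X, y, n_classes, lam=0.01, epochs=20, seed=0):
    """One-vs-rest linear SVM via Pegasos-style SGD on the regularized hinge loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])           # constant column acts as the bias
    W = np.zeros((n_classes, d + 1))                # one weight row per genre
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                   # Pegasos step size
            targets = np.where(np.arange(n_classes) == y[i], 1.0, -1.0)
            margins = targets * (W @ Xb[i])         # margins under the current weights
            W *= 1.0 - eta * lam                    # regularization shrink
            violated = margins < 1.0                # hinge-loss subgradient is nonzero here
            W[violated] += eta * np.outer(targets[violated], Xb[i])
    return W

def predict(W, X):
    """Pick the class whose one-vs-rest score is largest."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(Xb @ W.T, axis=1)

rng = np.random.default_rng(1)
centers = np.array([[5.0, 0.0], [-5.0, 0.0], [0.0, 5.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + 0.5 * rng.standard_normal((150, 2))   # toy stand-in for segment features
W = train_linear_svm_ovr(X, y, n_classes=3)
acc = np.mean(predict(W, X) == y)
```

Because order statistics of power spectra can span several orders of magnitude, standardizing each feature dimension before a linear SVM is a common precaution, though the summary does not say whether the paper did so.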
To compare temporal integration methods, the temporal mean and standard deviation (abbreviated as TMS) were also included in the test; their results are likewise listed in Tables 2 and 3.

Performance/Effects

The maximum, minimum, median, and two quartiles are together denoted by SOS₅. Using more percentiles could make the summary statistics more descriptive; in practice, however, adding percentiles beyond SOS₅ improved classification performance only marginally. The five most important ordinal summary statistics in SOS₅ were enough to provide state-of-the-art musical genre classification accuracy.
The best classification accuracy on the ISMIR2004 dataset was 84.5 %, achieved by both the SOS₃ + TOS₅ and the SOS₅ + TOS₃ combinations. The best classification accuracy on the GTZAN dataset was 87.1 %, achieved by the combination of SOS₅ and TOS₅. From a practical point of view, SOS₃ and TOS₃ were quite effective with a moderate feature dimensionality.
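The dimensionality trade-off mentioned above can be made concrete with a small calculation. It assumes that the subscript in SOSₖ and TOSₖ counts the statistics (confirmed for SOS₅, assumed for the others), that TMS contributes two values per dimension, and a hypothetical count of 8 octave bands; none of these numbers comes from the paper's tables.

```python
def segment_feature_dim(n_bands, n_spectral_stats, n_temporal_stats):
    """Length of one segment-level vector: each (band, spectral statistic) pair is a
    frame-level dimension, and each of those is summarized by n_temporal_stats values."""
    return n_bands * n_spectral_stats * n_temporal_stats

N_BANDS = 8  # hypothetical octave-band count, not taken from the paper
combos = {"SOS3+TOS3": (3, 3), "SOS3+TOS5": (3, 5), "SOS5+TOS3": (5, 3),
          "SOS5+TOS5": (5, 5), "SOS5+TMS": (5, 2)}
for name, (k_s, k_t) in combos.items():
    print(name, segment_feature_dim(N_BANDS, k_s, k_t))
# SOS3+TOS3 72, SOS3+TOS5 120, SOS5+TOS3 120, SOS5+TOS5 200, SOS5+TMS 80
```

Under these assumptions SOS₅ + TOS₅ is almost three times the size of SOS₃ + TOS₃, and the two combinations that tied on ISMIR2004 also have equal length.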
The performance of the proposed method was experimentally compared with that of the MFCC and OSC on both the ISMIR2004 and the GTZAN datasets. The performance gain obtained by using the octave-band order statistics over the MFCC and OSC was 2.61 % for ISMIR2004 and 8.9 % for GTZAN.

References (9)

[1] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech and Audio Process. 10, 293-302 (2002).

[2] Y. Panagakis, C. Kotropoulos, and G. Arce, "Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification," IEEE Trans. Audio Speech Lang. Process. 18, 576-588 (2010).

[3] S.-C. Lim, S.-J. Jang, S.-P. Lee, and M. Y. Kim, "Music genre classification system using decorrelated filter bank," (in Korean) J. Acoust. Soc. Kr. 30, 100-106 (2011).

[4] A. Meng, P. Ahrendt, J. Larsen, and L. Hansen, "Temporal feature integration for music genre classification," IEEE Trans. Audio Speech Lang. Process. 15, 1654-1664 (2007).

[5] E. Pampalk, A. Flexer, and G. Widmer, "Improvements of audio-based music similarity and genre classification," in Proc. ISMIR-2005, 634-637 (2005).

[6] D. Jiang, L. Lu, H. Zhang, J. Tao, and L. Cai, "Music type classification by spectral contrast feature," in Proc. ICME-2002, 113-116 (2002).

[7] P. Loizou and O. Poroy, "Minimum spectral contrast needed for vowel identification by normal-hearing and cochlear implant listeners," J. Acoust. Soc. Am. 110, 1619-1627 (2001).

[8] J. Seo and S. Lee, "Higher-order moments for musical genre classification," Signal Processing 91, 2154-2157 (2011).

[9] S.-C. Lim, J.-S. Lee, S.-J. Jang, S.-P. Lee, and M. Kim, "Music-genre classification system based on spectro-temporal features and feature selection," IEEE Trans. Consum. Electron. 58, 1262-1268 (2012).
