Sound is an essential element of living beings for building relationships and communicating. This thesis used three categories of sounds as sample scenarios for classification, chosen for their importance to human health, learning, human interaction, and the environment: snoring sounds, speech emotion sounds, and environmental sounds. This study proposes a CNN model for sound classification using a multi-feature extraction approach. The extracted features were used to form a new dataset that served as input to the CNN. Experiments were carried out on three datasets: a snoring/non-snoring dataset, the Berlin EmoDB for speech emotion, and the ESC-50 environmental sound dataset. The results showed accuracies of 99.7% for snoring sounds, 99.1% for speech emotion sounds, and 98.67% for environmental sounds.
Sound can be described as a pressure wave caused by the vibration of an object. It is a vital aspect of human life, enabling relationships and communication. Sound has found use in many key areas of life, for example speech, security, health, music, critical surveillance, language studies, and building engineering, to mention just a few. The sound recognition process involves three phases: signal pre-processing, feature extraction, and classification. During signal pre-processing, the input signal is divided into segments that are then used for feature extraction. The extracted features reduce the size of the data and form a new feature dataset that serves as input to a classifier. This thesis used three categories of sounds as sample scenarios for classification, chosen for their importance to human health, learning, human interaction, and the environment: snoring sounds, speech emotion sounds, and environmental sounds. Snoring is a phenomenon in humans associated with Obstructive Sleep Apnea (OSA). It therefore poses health risks (hypertension and myocardial infarction) to the snorer and can also induce hearing loss in a person sleeping beside the snorer. Polysomnography, the prominent method for diagnosing OSA, is very expensive. An alternative is to develop a machine learning model that classifies snoring sounds and subsequently use that model as a means of diagnosing snorers. Speech is an important aspect of human interaction through which relationships and emotions are expressed. Speech emotions include anger, happiness, sadness, fear, disgust, and a neutral state. The ability to learn and recognize human emotions from speech has become an area of interest in the fields of human-machine interaction and machine learning.
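The pre-processing phase described above (dividing the input signal into segments before feature extraction) is not detailed in the abstract; as an illustration only, a minimal numpy sketch of segmenting a signal into overlapping frames might look like this (the frame length and hop size here are assumed values, not the thesis's parameters):

```python
import numpy as np

def frame_signal(signal, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping segments (the pre-processing step).

    Each frame is frame_len samples long; consecutive frames start hop
    samples apart, so adjacent frames overlap by frame_len - hop samples.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440.0 * t)

frames = frame_signal(sig)
print(frames.shape)  # (30, 1024): 30 half-overlapping 1024-sample frames
```

Each row of the resulting matrix is one segment, ready to be passed to the feature-extraction phase.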
Recently, environmental sound classification has attracted considerable attention in the research community and has found application in smart Internet of Things (IoT) devices, for instance in scene analysis and machine hearing. Classifying sounds with deep learning approaches alone does not always yield models good enough for future use, because some of the salient features needed to sufficiently distinguish these sounds, and thereby enable accurate classification, are not well captured during training. Therefore, in this study, a Convolutional Neural Network (CNN) model for sound classification using a multi-feature extraction approach is proposed. The extracted features include the Short-Time Fourier Transform (STFT), Root Mean Square (RMS) energy, spectral centroid, spectral bandwidth, spectral rolloff, zero-crossing rate, and Mel-Frequency Cepstral Coefficients (MFCCs). These features were combined to form a new dataset that served as input to the CNN. Experiments were carried out on three datasets: a snoring/non-snoring dataset, the Berlin EmoDB for speech emotion, and the ESC-50 environmental sound dataset. The results showed accuracies of 99.7% for snoring sounds, 99.1% for speech emotion sounds, and 98.67% for environmental sounds.