Sound is an essential element of living beings for building relationships and communicating. This thesis used three categories of sounds as sample scenarios for classification, chosen for their importance to human health, learning, human interaction, and the environment: snoring sounds, speech emotion sounds, and environmental sounds. This study proposes a CNN model for sound classification using a multi-feature extraction approach. The extracted features were used to form a new dataset that served as input to the CNN. Experiments were carried out on three datasets: a snoring/non-snoring dataset, the Berlin EmoDB for speech emotion, and the ESC-50 environmental sound dataset. The results showed accuracies of 99.7% for snoring sounds, 99.1% for speech emotion sounds, and 98.67% for environmental sounds.
Sound can be described as a pressure wave caused by the vibration of an object. It is a vital aspect of human life, enabling relationships and communication. Sound has found use in many key areas of life, for example speech, security, health, music, critical surveillance, language studies, and building engineering, to mention just a few. The sound recognition process involves three phases: signal pre-processing, feature extraction, and classification. During signal pre-processing, the input signal is divided into segments that are then used for feature extraction. The extracted features reduce the size of the data and form a new feature dataset that serves as input to a classifier. This thesis used three categories of sounds as sample scenarios for classification, chosen for their importance to human health, learning, human interaction, and the environment: snoring sounds, speech emotion sounds, and environmental sounds. Snoring is a phenomenon in humans associated with Obstructive Sleep Apnea (OSA). It therefore poses health risks (hypertension and myocardial infarction) to the snorer and can also induce hearing loss in a person sleeping beside the snorer. Polysomnography, the prominent method for diagnosing OSA, is very expensive. An alternative is to develop a machine learning model that classifies snoring sounds and subsequently use that model as a means of diagnosing snorers. Speech is an important aspect of human interaction through which relationships and emotions are expressed. Speech emotions include anger, happiness, sadness, fear, disgust, and a neutral state. The ability to learn and recognize human emotions from speech has become an area of interest in the fields of human-machine interaction and machine learning.
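The pre-processing phase described above (dividing the input signal into segments before feature extraction) is not detailed in the abstract; as an illustration only, a minimal numpy sketch of segmenting a signal into overlapping frames might look like this (the frame length and hop size here are assumed values, not the thesis's parameters):

```python
import numpy as np

def frame_signal(signal, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping segments (the pre-processing step).

    Each frame is frame_len samples long; consecutive frames start hop
    samples apart, so adjacent frames overlap by frame_len - hop samples.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440.0 * t)

frames = frame_signal(sig)
print(frames.shape)  # (30, 1024): 30 half-overlapping 1024-sample frames
```

Each row of the resulting matrix is one segment, ready to be passed to the feature-extraction phase.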
Recently, environmental sound classification has attracted considerable attention in the research community and has found application in smart Internet of Things (IoT) devices, for instance in scene analysis and machine hearing. Classifying sounds with deep learning approaches alone does not always yield models good enough for future use, because some of the salient features needed to sufficiently distinguish these sounds, and thereby enable accurate classification, are not well captured during training. Therefore, in this study, a Convolutional Neural Network (CNN) model for sound classification using a multi-feature extraction approach is proposed. The extracted features include the Short-Time Fourier Transform (STFT), Root Mean Square (RMS) energy, spectral centroid, spectral bandwidth, spectral rolloff, zero-crossing rate, and Mel-Frequency Cepstral Coefficients (MFCCs). These features were combined to form a new dataset that served as input to the CNN. Experiments were carried out on three datasets: a snoring/non-snoring dataset, the Berlin EmoDB for speech emotion, and the ESC-50 environmental sound dataset. The results showed accuracies of 99.7% for snoring sounds, 99.1% for speech emotion sounds, and 98.67% for environmental sounds.