[Paper] MCycleGAN: CycleGAN-based deep learning model for child speech extraction from noisy speech

손수락; 심규정; 정이나

doi:10.7472/jksii.2023.24.3.1

Abstract

The most important capability for a baby-care robot is recognizing the baby's condition. Because babies express their condition mainly through the pattern of their cries, classifying a baby's condition from its voice is an active research area. Most of these studies have worked with clean baby voice recordings that contain no noise. Baby voice data collected in real environments, however, is likely to contain noise, so the noise in the voice data must be processed. This paper proposes MCycle GAN (Multiple Cycle Generative Adversarial Net), a Cycle GAN-based deep learning model for noise processing. MCycle GAN arranges multiple cycles within the existing Cycle GAN structure for more precise noise processing. Many generators are trained adversarially against a small number of discriminators, which improves the discriminators' discrimination performance and in turn forces the generators to produce more precise forged data to deceive them. In the experiments, MCycle GAN required more training time than Cycle GAN, but both the discriminators' discrimination performance and the generators' forged-data generation performance improved. With too many cycles, however, the performance gain was small relative to the increase in training time.

Tables/Figures

Figure 1. The structure of a cycle
Figure 2. The structure of MCycle GAN (4 cycles)
Figure 3. Training log of Cycle GAN
Figure 4. Training log of MCycle GAN (2 cycles)
Figure 5. Training log of MCycle GAN (4 cycles)
Table 1. Learning model setup for performance evaluation
Table 2. Performance metrics for each model

AI summary of the main text

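Problem definition

This paper proposes a method that uses MCycle GAN (Multiple-Cycle Generative Adversarial Network) to extract a baby's voice from speech that contains noise. MCycle GAN trains multiple cycles in place of the single cycle of the existing Cycle GAN, and was devised to strengthen the discriminator's discrimination ability.
The paper designs MCycle GAN as an AI model that extracts only the baby's voice from noisy baby voice data. To treat audio like an image, each recording is converted to a spectrogram, and the model is trained on those images.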
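The spectrogram step can be illustrated with a minimal sketch. The summary does not state the transform parameters or the tool the authors used, so the frame length, hop size, Hann window, log compression, and the function names `magnitude_spectrogram` and `to_image` are all assumptions made for illustration, and a synthetic tone stands in for a real recording.

```python
import numpy as np


def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return a (freq_bins, frames) log-magnitude spectrogram of a 1-D signal.

    frame_len and hop are illustrative values, not the paper's settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)   # shape: (frames, frame_len // 2 + 1)
    log_mag = np.log1p(np.abs(spectrum))     # compress the dynamic range
    return log_mag.T                         # image-like layout: frequency x time


def to_image(spec):
    """Min-max scale a spectrogram to [0, 1] so an image model can consume it."""
    lo, hi = spec.min(), spec.max()
    return (spec - lo) / (hi - lo) if hi > lo else np.zeros_like(spec)


# Synthetic stand-in for a recording: a 440 Hz tone sampled at 8 kHz for 1 s.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
image = to_image(magnitude_spectrogram(tone))
```

The resulting 2-D array can be saved or batched like any grayscale image, which is what lets an image-to-image model such as Cycle GAN operate on audio.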

Proposed method

The paper proposes MCycle GAN (Multiple-Cycle Generative Adversarial Network) for extracting a baby's voice from noisy speech. MCycle GAN trains multiple cycles in place of Cycle GAN's single cycle in order to strengthen the discriminator's discrimination ability. Because removing noise from a noisy baby voice should leave the shape of the original voice intact, the model is based on Cycle GAN, which has the property of preserving the form of the original data.
Audio is converted to spectrograms so that it can be treated like an image, and the MCycle GAN structure is designed and implemented so that several cycles are trained at the same time. After training, the generator that takes a noisy baby sound (Noisy Baby Sound) as input and produces a fake clean baby sound (Fake Clean Baby Sound) is extracted and used as the final noise-processing model. The extracted generator is the best-performing one among the generators of all cycles.
The experiments evaluate and compare models with 1, 2, and 4 cycles. The performance metrics are the error against the original data, the time required for training, and the discriminator's loss function, which is used to gauge the discriminator's performance.
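The "pick the best-performing generator" step can be sketched as follows. This is only an illustration under stated assumptions: the summary says the error against the original data is one of the metrics but does not name the error measure, so mean absolute error is assumed here, and a paired clean reference is assumed to exist for each noisy evaluation input. The helper names and the toy "generators" are hypothetical.

```python
def mean_abs_error(a, b):
    """Mean absolute error between two equally sized flat sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_best_generator(generators, noisy_inputs, clean_targets):
    """Return (index, score) of the generator with the lowest average error."""
    scores = []
    for gen in generators:
        errors = [mean_abs_error(gen(x), y)
                  for x, y in zip(noisy_inputs, clean_targets)]
        scores.append(sum(errors) / len(errors))
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy stand-ins: each "generator" subtracts a different constant noise estimate.
generators = [lambda x, b=b: [v - b for v in x] for b in (0.0, 0.5, 1.0)]
noisy_inputs = [[1.5, 2.5, 3.5], [0.5, 1.5, 2.5]]
clean_targets = [[1.0, 2.0, 3.0], [0.0, 1.0, 2.0]]
best_index, best_score = select_best_generator(
    generators, noisy_inputs, clean_targets)
```

In a real setup each entry of `generators` would be the trained noisy-to-clean generator of one cycle, and the inputs would be spectrogram arrays rather than short lists.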

Target data

Training data set: domain Y consists of 1,500 spectrograms of noisy baby voice, and domain X consists of 450 spectrograms of clean baby voice.
Test data set: domain Y consists of 500 spectrograms of noisy baby voice, and domain X consists of 50 spectrograms of clean baby voice.
During training, MCycle GAN drew 400 samples at random from each of the training domains, Y and X.
The implementation language is Python 3, and the main libraries are TensorFlow 2.9.2 and Keras. Colab Pro Plus (OS: Linux-5.
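The random draw of 400 items per training domain could look like the sketch below. Whether the draw was made once or repeated, and whether it was without replacement, is not stated in the summary, so sampling without replacement is an assumption. The file names and the function `sample_training_subset` are hypothetical placeholders.

```python
import random


def sample_training_subset(domain_items, k=400, seed=None):
    """Draw k items without replacement from one domain's training set."""
    rng = random.Random(seed)
    return rng.sample(domain_items, k)


# Hypothetical file names standing in for the spectrogram images.
domain_y = [f"noisy_{i:04d}.png" for i in range(1500)]  # noisy baby voice
domain_x = [f"clean_{i:04d}.png" for i in range(450)]   # clean baby voice

batch_y = sample_training_subset(domain_y, seed=0)
batch_x = sample_training_subset(domain_x, seed=1)
```

Drawing the same number from both domains balances the two sides even though the clean domain is much smaller than the noisy one.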

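Theory/model

MCycle GAN trains multiple cycles in place of Cycle GAN's single cycle to strengthen the discriminator's discrimination ability, and the final generator is selected from the best-performing of the trained cycles. Because removing noise from a noisy baby voice should leave the shape of the original voice intact, the model is based on Cycle GAN, which has the property of preserving the form of the original data. Cycle GAN can also use training data in which the two domains are unpaired, so data for the two domains, clean baby voice and noisy baby voice, can be collected separately and used for training.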
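How several cycles might train against shared discriminators can be sketched with the usual Cycle GAN loss terms. Everything specific here is an assumption, not the paper's confirmed formulation: a least-squares adversarial loss, an L1 cycle-consistency loss with weight `lam=10.0`, and one clean-domain discriminator shared by every cycle (one reading of "many generators, few discriminators"). The toy lambdas merely stand in for neural networks.

```python
import numpy as np


def cycle_consistency_loss(real, reconstructed):
    """L1 distance between an input and its round-trip reconstruction."""
    return float(np.mean(np.abs(real - reconstructed)))


def lsgan_generator_loss(disc_score_on_fake):
    """Least-squares adversarial loss: the generator wants scores near 1."""
    return float(np.mean((disc_score_on_fake - 1.0) ** 2))


def lsgan_discriminator_loss(score_real, scores_fake):
    """Real samples should score 1; fakes from every cycle should score 0."""
    real_term = np.mean((score_real - 1.0) ** 2)
    fake_term = np.mean([np.mean(s ** 2) for s in scores_fake])
    return float(0.5 * (real_term + fake_term))


def generator_objective(noisy, denoise, renoise, disc_clean, lam=10.0):
    """Adversarial plus weighted cycle loss for one cycle (noisy -> clean -> noisy)."""
    fake_clean = denoise(noisy)
    adv = lsgan_generator_loss(disc_clean(fake_clean))
    cyc = cycle_consistency_loss(noisy, renoise(fake_clean))
    return adv + lam * cyc


# Toy stand-ins for networks; each cycle is a (denoise, renoise) generator pair.
noisy = np.full((2, 2), 1.5)
clean = np.full((2, 2), 1.0)
disc_clean = lambda x: np.full_like(x, 0.5)      # shared, undecided discriminator
cycles = [
    (lambda x: x - 0.5, lambda x: x + 0.5),      # round trip is exact
    (lambda x: x - 0.25, lambda x: x + 0.5),     # round trip overshoots by 0.25
]

losses = [generator_objective(noisy, d, r, disc_clean) for d, r in cycles]
d_loss = lsgan_discriminator_loss(
    disc_clean(clean), [disc_clean(d(noisy)) for d, _ in cycles])
```

Under this reading the shared discriminator sees fakes from every cycle in each update, which is one plausible mechanism for the stronger discriminator the summary describes; the cycle term is what penalizes a generator that alters the underlying voice.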

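Performance/effect

In the experiments, MCycle GAN showed lower error and lower discriminator loss in noise processing than GAN and Cycle GAN. This means that MCycle GAN processes the noise while preserving the shape of the original baby sound, and that the noise-processed data differs little enough from the clean baby voice domain to deceive the discriminator.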

Future work

By training on a different training data set, MCycle GAN can perform noise processing not only for baby voice but also for other specific target voices. Further research is needed on shortening the training time by distributing the multiple cycles of MCycle GAN for parallel processing. The authors also plan to train the MCycle GAN model with more epochs and more data.
