[Thesis] Design and Implementation of Deep Learning Based Song Generation and Voice Synthesis Pipeline for Virtual Singer Production in Metaverse Environment

전이슬

Design and Implementation of Deep Learning Based Song Generation and Voice Synthesis Pipeline for Virtual Singer Production in Metaverse Environment

전이슬 (Department of Computer Engineering, Graduate School, Dongseo University; domestic master's thesis)

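Abstract (translated from the Korean)

The pandemic accelerated the adoption of the metaverse, a virtual world, going beyond contactless modes of communication. "Avatars" have again drawn attention as a means of self-expression in the metaverse, and virtual humans with human-like lifestyles have emerged and are active on the boundary between the real and the virtual. However, making a virtual human dance and sing naturally still requires considerable human effort: a person must be recorded dancing and singing, and an expert skilled in synthesis technology must then correct or modify the result in detail. In this thesis, we design and implement a pipeline in which the theme song to be sung by a virtual human, a character in a metaverse musical, is written and composed by artificial intelligence, and the character's singing performance is produced without human intervention using singing voice synthesis and Audio2Face technology. To generate music that suits the mood of the play, we analyze the musical's script to build a "visual sound direction map (Sound Map)", and on that basis collect audio data and train a Magenta model. In addition, considering that a musical expresses its characters and conveys its story through song, we collect lyric data matching each character's MBTI tendencies and train a GPT-2 (Generative Pre-trained Transformer-2) model to generate the musical's theme song. We then use the MLP Singer model, which outperforms existing SVS (Singing Voice Synthesis) models in synthesis quality and speed, to synthesize a voice singing the AI-written and AI-composed theme song. Finally, after facial emotion parameter adjustment and preprocessing, the synthesized singing voice is passed through Audio2Face's deep neural network to produce a facial animation in which the virtual human sings while conveying the mood of the theme song. The proposed pipeline makes it possible to overcome limitations of AI-based production of recorded music and performance content and to confirm the potential of media intelligence.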

Abstract

The pandemic has accelerated the adoption of the metaverse, going beyond contactless communication. Avatars have regained attention as a means of self-expression within the metaverse, and virtual humans with lifestyles similar to those of real people have emerged, operating on the border between the real and the virtual. However, making a virtual human dance and sing naturally still requires a significant amount of human involvement: a person's dancing and singing must be recorded, and an expert skilled in synthesis technology must then carefully calibrate or modify the result. In this paper, we design and implement a pipeline that uses artificial intelligence to write and compose a theme song to be sung by a virtual human character in a metaverse musical, and that produces the singing performance without human intervention using singing voice synthesis and Audio2Face technology.
To create music suited to the mood of the musical, we first analyzed the musical script and created a visual sound production map (Sound Map). This map was used to collect an audio dataset and train the Magenta model. Second, considering that in a musical the characters are expressed and the story is told through songs, we gathered lyrics data matching the characters' MBTI (Myers-Briggs Type Indicator) tendencies and trained the GPT-2 (Generative Pre-trained Transformer-2) model to create a musical theme song. The MLP Singer model, which surpasses existing SVS (Singing Voice Synthesis) models in synthesis quality and speed, was then used to synthesize a voice singing the theme song generated by AI lyric writing and composition. Finally, after the facial emotion parameters were adjusted and preprocessing was carried out, the synthesized singing voice was applied to Audio2Face's deep neural network to create a facial animation of the virtual human singing the theme song in a way that matches its mood.
Consequently, the pipeline proposed in this paper overcomes limitations of producing music and performance content with artificial intelligence and demonstrates the potential of media intelligence.
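
The order of the stages described in the abstract can be sketched as a small data-flow table. This is only an illustrative sketch: the stage names, the input and output keys, and the `run_pipeline` helper are hypothetical labels chosen here, not identifiers from the thesis, and no call into Magenta, GPT-2, MLP Singer, or Audio2Face is made. Each stage is a placeholder that records which inputs it consumed.

```python
from typing import Dict, List, Tuple

# (stage name, inputs it needs, output it produces).
# All names are hypothetical labels for the steps listed in the abstract;
# none of them is an API of Magenta, GPT-2, MLP Singer, or Audio2Face.
STAGES: List[Tuple[str, Tuple[str, ...], str]] = [
    ("analyze_script", ("script",), "sound_map"),
    ("compose_melody", ("sound_map",), "melody"),            # Magenta in the thesis
    ("write_lyrics", ("character_mbti",), "lyrics"),         # GPT-2 in the thesis
    ("synthesize_voice", ("melody", "lyrics"), "singing_audio"),  # MLP Singer
    ("animate_face", ("singing_audio", "emotion_params"), "face_animation"),  # Audio2Face
]


def run_pipeline(inputs: Dict[str, str]) -> Dict[str, str]:
    """Run the stages in order, checking that each one has its inputs."""
    ctx = dict(inputs)  # copy so the caller's dict is not modified
    for name, needs, produces in STAGES:
        missing = [key for key in needs if key not in ctx]
        if missing:
            raise KeyError(f"{name} is missing inputs: {missing}")
        # Placeholder result: a real stage would invoke its trained model here.
        ctx[produces] = f"{name}({', '.join(needs)})"
    return ctx
```

Calling `run_pipeline` with a script, a character MBTI label, and emotion parameters returns a dictionary holding every intermediate artifact, which makes the dependency of the final face animation on the earlier stages explicit.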

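Thesis information

Author	전이슬
Degree-granting institution	Graduate School, Dongseo University
Degree type	Master's (domestic)
Department	Computer Engineering
Advisor	문미경
Year of publication	2023
Total pages	66
Keywords	artificial intelligence; metaverse; virtual singer; deep learning; AI lyric writing; AI composition; voice synthesis; SVS
Language	kor
Full-text URL	http://www.riss.kr/link?id=T16856392&outLink=K
Source	Korea Education and Research Information Service (KERIS)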
