[Paper] Group-based speaker embeddings for text-independent speaker verification

정영문; 엄영식; 이영현; 김회린

doi:10.7776/ask.2021.40.5.496


The Journal of the Acoustical Society of Korea (한국음향학회지), v.40, no.5, 2021, pp. 496-502

정영문 (School of Electrical Engineering, KAIST), 엄영식 (School of Electrical Engineering, KAIST), 이영현 (School of Electrical Engineering, KAIST), 김회린 (School of Electrical Engineering, KAIST)

Abstract

Deep speaker embeddings have recently been widely used in text-independent speaker verification, where they outperform the traditional i-vector approach. To improve the deep speaker embedding approach, we propose group-based speaker embedding, a novel method that incorporates speaker group information. All speakers in the training data are clustered into a predefined number of groups in an unsupervised manner, and each group is represented by a fixed-length group embedding. A Group Decision Network (GDN) outputs a weight for each group. An aggregated group embedding is then computed as the weighted sum of the group embeddings, using these group weights. Finally, the group-based speaker embedding is obtained by adding the aggregated group embedding to the deep speaker embedding. Incorporating group information in this way reduces the search space of speaker identities that the embedding must cover, so the embedding can flexibly represent a large number of speakers. Experiments on the VoxCeleb1 database show that the proposed approach improves on previous approaches.
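The aggregation step in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions. It is not the authors' implementation.

- The GDN is assumed to be a single linear layer followed by a softmax over the K groups. The abstract does not specify its architecture.
- The GDN is assumed to take the deep speaker embedding as its input.
- The group embeddings are treated as given fixed-length vectors.
- All function names are hypothetical.

Addition is the combination described in the abstract. Concatenation, which Table 1 also compares, is included as an alternative mode.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over the group logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gdn_weights(x: Vector, w_gdn: Matrix) -> Vector:
    """Hypothetical GDN: one linear layer (K x D) plus softmax, giving K group weights."""
    return softmax(matvec(w_gdn, x))


def aggregated_group_embedding(weights: Vector, group_embs: Matrix) -> Vector:
    """Weighted sum of the K group embeddings, each of length D."""
    dim = len(group_embs[0])
    return [sum(w * g[d] for w, g in zip(weights, group_embs)) for d in range(dim)]


def group_based_embedding(speaker_emb: Vector, w_gdn: Matrix,
                          group_embs: Matrix, mode: str = "add") -> Vector:
    """Combine the deep speaker embedding with the aggregated group embedding."""
    weights = gdn_weights(speaker_emb, w_gdn)
    agg = aggregated_group_embedding(weights, group_embs)
    if mode == "add":     # combination described in the abstract (same dimension D)
        return [s + a for s, a in zip(speaker_emb, agg)]
    if mode == "concat":  # alternative ablated in Table 1 (dimension 2D)
        return speaker_emb + agg
    raise ValueError(f"unknown mode: {mode}")


# Toy usage: K = 2 groups, D = 2. Zero GDN weights give uniform group weights [0.5, 0.5].
emb = group_based_embedding([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]],
                            [[2.0, 0.0], [0.0, 4.0]])
print(emb)  # [2.0, 2.0]
```

Addition keeps the embedding dimension unchanged. This is why the sketch requires the group embeddings to share the speaker embedding's size D. Concatenation relaxes that constraint but doubles the output dimension.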


Tables/Figures (3)

Fig. 1. (Color available online) Overall structure of the proposed method. FC: fully-connected layer, GDN: group decision network.
Table 1. Ablation results of the proposed method (EER %). Grs: groups, Naive: naive labeling, SDL: self-distributed labeling, Add.: addition, Concat.: concatenation.
Table 2. Performance comparison with state-of-the-art systems in terms of EER (%). ASM: A-Softmax, SAP: self-attentive pooling, SPE: spatial pyramid encoding, SM: softmax, ASP: attentive statistics pooling.
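Both tables report the equal error rate (EER). EER is the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR).

The sketch below shows one common way to estimate EER from verification-trial scores. It is only an illustration, not the paper's evaluation script. The function name `compute_eer` is hypothetical. The method sweeps every observed score as a threshold and returns the mean of FAR and FRR where they are closest; this is one approximation among several.

```python
from typing import List, Tuple


def compute_eer(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (eer, threshold). labels: 1 = target (same speaker), 0 = non-target."""
    n_target = sum(1 for lab in labels if lab == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need both target and non-target trials")
    best_gap, best = float("inf"), (0.0, 0.0)
    # A trial is accepted when score >= threshold; try each observed score.
    for thr in sorted(set(scores)):
        far = sum(1 for s, lab in zip(scores, labels)
                  if lab == 0 and s >= thr) / n_nontarget
        frr = sum(1 for s, lab in zip(scores, labels)
                  if lab == 1 and s < thr) / n_target
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best = gap, ((far + frr) / 2, thr)
    return best


# Toy trial list: three target and three non-target scores.
eer, thr = compute_eer([0.9, 0.8, 0.7, 0.6, 0.4, 0.3], [1, 1, 0, 1, 0, 0])
print(f"EER = {100 * eer:.2f}% at threshold {thr}")  # EER = 33.33% at threshold 0.7
```

This brute-force sweep is quadratic in the number of trials. For large trial lists, sorting the scores once and accumulating the error counts is the usual speed-up.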

