[Paper] Lip-reading System based on Bayesian Classifier

김성우; 차경애; 박세현

doi:10.9723/jksiis.2020.25.4.009

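Abstract (translated from the Korean)

A pronunciation recognition system that excludes audio information and uses only visual information can be applied to a variety of customized services. This paper develops a system that recognizes lip shapes with a Bayesian classifier to distinguish Korean vowels. Feature vectors were extracted from the lip shapes in facial images and applied to the designed machine learning model; in experiments, the recognition rate was 94% for the pronunciation of 'ㅏ', and the average recognition rate was about 84%. These rates were also higher than those obtained with a CNN tested for comparison. The results indicate that Bayesian classification using feature values designed from lip-region landmarks can be more efficient when only a small amount of training data is available. The method can therefore be used to develop applications for constrained hardware such as mobile devices.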

Abstract

Pronunciation recognition systems that use only video information and ignore voice information can be applied to various customized services. In this paper, we develop a system that applies a Bayesian classifier to distinguish Korean vowels via lip shapes in images. We extract feature vectors from the lip shapes of facial images and apply them to the designed machine learning model. Our experiments show that the system's recognition rate is 94% for the pronunciation of 'A', and the system's average recognition rate is approximately 84%, which is higher than that of the CNN tested for comparison. Our results show that our Bayesian classification method with feature values from lip region landmarks is efficient on a small training set. Therefore, it can be used for application development on limited hardware such as mobile devices.

Tables/Figures (13)

Fig. 2 Lip Shape Detection
Fig. 3 Feature Attribute Definition from Face Object Landmark
Fig. 1 System Structure Diagram
Fig. 4 Sample Input Images
Table 1 Measurement Values of Vowels' Pronunciation (unit: px)
Table 2 Statistical Values of x1 Attribute for Each Vowels' pronunciation (unit: px)
Fig. 5 Recognition Results from a Video
Table 3 Recognition Rate according to Number of Training Images
Table 4 Confusion Matrix of Recognition Results
Fig. 6 Lip Shape Image Generation Processes
Fig. 7 CNN Layer
Table 5 Applied Functions to CNN
Table 6 Comparison of CNN and Proposed Bayesian Classifier

AI-Generated Summary of the Main Text

* The sentences below were identified automatically by AI and may include unsuitable ones; use them with care.

Proposed Method

Fig. 1 shows the processes of our system. Building on a previously designed machine learning (ML) model (Kim et al., 2019), this paper presents an implementation of the proposed system and an experimental analysis that verifies the extracted data.
The implemented system detects a human face in real-time images and distinguishes five Korean vowel pronunciations according to lip shape.
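As a rough illustration of turning detected face landmarks into a lip-shape feature vector, the sketch below measures three mouth distances in pixels. It assumes the widely used 68-point facial landmark layout (mouth points 48 to 67); the source does not state which layout or which distances the authors use, so the function name `lip_features`, the chosen indices, and the three features are hypothetical.

```python
from math import hypot

# Indices follow the common 68-point facial landmark layout (mouth = 48..67).
# That the paper uses this layout and these points is an assumption.
LEFT_CORNER, RIGHT_CORNER = 48, 54
OUTER_TOP, OUTER_BOTTOM = 51, 57
INNER_TOP, INNER_BOTTOM = 62, 66


def distance(p, q):
    """Euclidean distance between two (x, y) points, in pixels."""
    return hypot(p[0] - q[0], p[1] - q[1])


def lip_features(landmarks):
    """Return a hypothetical lip-shape feature vector [width, outer, inner].

    landmarks: sequence of 68 (x, y) tuples for one detected face.
    """
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmark points")
    width = distance(landmarks[LEFT_CORNER], landmarks[RIGHT_CORNER])
    outer_height = distance(landmarks[OUTER_TOP], landmarks[OUTER_BOTTOM])
    inner_height = distance(landmarks[INNER_TOP], landmarks[INNER_BOTTOM])
    return [width, outer_height, inner_height]


def make_synthetic_landmarks():
    """Build made-up landmarks; only the six mouth points used above are set."""
    points = [(0.0, 0.0)] * 68
    points[LEFT_CORNER] = (100.0, 200.0)
    points[RIGHT_CORNER] = (160.0, 200.0)
    points[OUTER_TOP] = (130.0, 185.0)
    points[OUTER_BOTTOM] = (130.0, 215.0)
    points[INNER_TOP] = (130.0, 195.0)
    points[INNER_BOTTOM] = (130.0, 205.0)
    return points


features = lip_features(make_synthetic_landmarks())
print(features)
```

In a real pipeline the landmark list would come from a face landmark detector rather than from `make_synthetic_landmarks`, and the distances would probably need normalizing by face size so that they are comparable across images.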
We design an effective feature vector for recognizing Korean vowels and apply it to a Bayesian learning model (Choi et al., 2001; Oh, 2008). The experimental results show the advantage of the proposed method.
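To make the idea of Bayesian classification over a feature vector concrete, here is a minimal sketch of a Gaussian naive Bayes classifier. It is not the authors' model: the Gaussian class-conditional density, the feature-independence assumption, the function names, the labels, and the toy measurements are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    total = len(samples)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (n / total, means, variances)
    return model


def log_posteriors(model, x):
    """Return the unnormalized log posterior log P(c) + sum log N(x_i | c)."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Pick the class with the highest posterior (the Bayes decision rule)."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)


# Made-up [mouth width, mouth height] measurements in px for two classes.
train_x = [[60.0, 30.0], [62.0, 32.0], [58.0, 29.0],
           [40.0, 12.0], [42.0, 14.0], [39.0, 11.0]]
train_y = ["wide_open", "wide_open", "wide_open",
           "rounded", "rounded", "rounded"]
model = fit(train_x, train_y)
print(predict(model, [61.0, 31.0]), predict(model, [41.0, 13.0]))
```

Because the model stores only a prior, a mean, and a variance per class and feature, it can be estimated from a handful of examples, which is consistent with the abstract's point about small training sets.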

Target Data

The input image set contains 120 images for each of the 5 pronunciations, for a total of 600 images of men and women between the ages of 20 and 50. Fig.
Because the input images are grayscale, only one channel is used. Of the 600 images, 500 were used as training data and the remaining 100 as test data.
We also examine whether the recognition rate increases with the number of training images. The training sets contain 10, 20, 40, and 60 images per pronunciation, that is, 50, 100, 200, and 300 images in total across the five vowels, respectively. The test set consists of 50 images for each pronunciation.
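The set sizes in this experiment follow from simple per-class arithmetic, which the sketch below reproduces with a per-class split. The class labels `v1` to `v5`, the image identifiers, and the choice to hold out the same test images for every training size are assumptions; the source does not describe how the images were sampled.

```python
import random

CLASSES = ["v1", "v2", "v3", "v4", "v5"]  # placeholder names for the 5 vowels
IMAGES_PER_CLASS = 120
TEST_PER_CLASS = 50


def split_per_class(images_by_class, n_train, n_test, seed=0):
    """Take n_train images from the front and n_test from the back per class."""
    rng = random.Random(seed)
    train, test = [], []
    for label, images in images_by_class.items():
        if n_train + n_test > len(images):
            raise ValueError("not enough images for class " + label)
        shuffled = list(images)
        rng.shuffle(shuffled)
        train += [(name, label) for name in shuffled[:n_train]]
        # Taking the test images from the end keeps them disjoint from training.
        test += [(name, label) for name in shuffled[-n_test:]]
    return train, test


# Hypothetical image identifiers standing in for the 600 input images.
images_by_class = {
    label: [f"{label}_{i:03d}" for i in range(IMAGES_PER_CLASS)]
    for label in CLASSES
}

train_sizes = {}
for n_train in (10, 20, 40, 60):
    train, test = split_per_class(images_by_class, n_train, TEST_PER_CLASS)
    train_sizes[n_train] = len(train)

print(train_sizes, len(test))
```

With 120 images per class, the largest setting uses 60 training and 50 test images per class, so 10 images per class remain unused.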
To measure how often each pronunciation was misrecognized, 123 images were randomly selected from the 600 images. Table 4 presents the recognition results.
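A confusion matrix of this kind can be tallied from pairs of true and predicted labels, as in the minimal sketch below. The labels and predictions are invented for illustration and are not the values reported in Table 4; the function names are likewise hypothetical.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def recognition_rates(matrix, classes):
    """Per-class recognition rate: diagonal count divided by the row total."""
    rates = {}
    for i, (label, row) in enumerate(zip(classes, matrix)):
        total = sum(row)
        rates[label] = row[i] / total if total else None
    return rates


# Invented labels for three classes; not data from the paper.
classes = ["a", "o", "u"]
true_labels = ["a", "a", "a", "o", "o", "u"]
predicted = ["a", "a", "o", "o", "u", "u"]

matrix = confusion_matrix(true_labels, predicted, classes)
rates = recognition_rates(matrix, classes)
print(matrix)
print(rates)
```

The off-diagonal entries in each row show which pronunciations a given vowel is mistaken for, which is the ratio of incorrect recognitions that the text refers to.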

References (15)

Choi, J. H., Kim, J. B., Kim, D. G., and Rim, K. W. (2001). Bayesian Model for Probabilistic Unsupervised Learning,
Cetingul, H. E., Erzin, E., Yemez, Y., and Tekalp, A. M. (2006). Multimodal Speaker/Speech Recognition using Lip Motion, Lip Texture and Audio, Signal Processing, 86(12), 3549-3558.
Chung, J. S., and Zisserman, A. (2016). Lip Reading in the Wild, Asian Conference on Computer Vision, Springer, Cham.
Dlib C++ Library (2002). General Purpose Cross-platform Software Library, http://dlib.net/ (Accessed on Aug. 10th, 2020).
Gyu, S. M., Pham, T. T., Kim, J. Y., and Taek, H. S. (2009). A Study on Lip Detection based on Eye Localization for Visual Speech Recognition in Mobile Environment, International Journal of Fuzzy Logic and Intelligent Systems, 19(4), 478-484.
Hwang, W. (2017). Research Trends in Deep Learning Based Face Detection, Landmark Detection and Face Recognition, Broadcasting and Media Magazine, 22(4), 41-49.
Kim, Y. K., Lee, H. S., and Kim, M. H. (2014). Lip Reading Method using Bool Matrix and SVM, Proceedings of 2014 Conference on Korea, HCI, pp. 179-182.
Kim, Y. K., Lim, J. G., and Kim, M. H. (2016). Lip Reading Method using CNN for Utterance Period Detection, Journal of Digital Convergence, 14(8), 233-243.
Kim, D., Choi, S., and Kwak, S. (2018). Deep Learning Based Fake Face Detection, Journal of the Korea Industrial Information Systems Research, 23(5), 9-17.
Kim, S., Cha, K., and Park, S. (2019). Recognition of Korean Vowels using Bayesian Classification with Mouth Shape, Journal of Korea Multimedia Society, 22(8), 852-859.
Lee, S., Lee, Y., Hong, H., Yun, B., and Han, M. (2002). Audio-visual Integration based Multi-modal Speech Recognition System, Proceedings of KIPS Fall Conference, 707-710.
Lim, D. Y., Kim, S. G., and Chong, K. T. (2018). Development of a Real-time Lip Recognition for Improving English Pronunciation using Deep Learning, Journal of Institute of Control, Robotics and Systems, 24(4), 327-333.
Oh, I. S. (2008). Pattern Recognition, Kyobobook.
Viola, P., and Jones, M. (2001). Rapid Object Detection using a Boosted Cascade of Simple Features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001(1), 511-518.
Xianoyi, Y. (2017). Lipreading Recognition of English Vowels using Convolutional Neural Network and Recurrent Neural Network, Master's Thesis, Chonbuk National University, Korea.

