[Paper] Detection of video editing points using facial keypoints

나요셉; 김진호; 박종혁

doi:10.13088/jiis.2023.29.4.015


Journal of Intelligence and Information Systems (지능정보연구), v.29 no.4, 2023, pp.15-30

나요셉, 김진호, 박종혁 (all: Department of AI Big Data Convergence Management, College of Business Administration, Kookmin University)

Abstract

Services that apply artificial intelligence (AI) have recently been emerging in the media field as well. However, video editing, which involves finding edit points and splicing footage together, is still done mostly by hand and consumes considerable time and human resources. This study therefore proposes a methodology that uses Video Swin Transformer to detect video edit points according to whether the person in the video is speaking. To this end, the proposed architecture first detects facial keypoints through Face Alignment, so that the spatio-temporal changes of the face associated with speaking in the input video are reflected in the model. The Video Swin Transformer-based model proposed in this study then classifies the behavior of the person in the video. Specifically, the feature map that Video Swin Transformer generates from the video data is combined with the facial keypoints detected through Face Alignment, and the result is passed through convolution layers to detect whether the person is speaking. In experiments, the proposed edit point detection model using facial keypoints recorded a classification performance of 89.17%, compared with 87.46% for the model without facial keypoints, an improvement of 1.71 percentage points.
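The abstract does not say how the per-clip speaking classification is turned into edit points. The following is a minimal sketch of one plausible post-processing step, assuming each fixed-length clip receives one speaking / not-speaking label and that an edit point is placed wherever the label changes. The function `find_edit_points`, the parameter `clip_seconds`, and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def find_edit_points(speaking: List[bool], clip_seconds: float) -> List[Tuple[float, str]]:
    """Return (timestamp, kind) pairs where the predicted speaking state changes.

    `speaking[i]` is the assumed classifier output for the i-th consecutive
    clip, and `clip_seconds` is the assumed duration of each clip.
    """
    points: List[Tuple[float, str]] = []
    for i in range(1, len(speaking)):
        if speaking[i] != speaking[i - 1]:
            # The change happens at the boundary between clip i-1 and clip i.
            kind = "speech_start" if speaking[i] else "speech_end"
            points.append((i * clip_seconds, kind))
    return points


# Hypothetical predictions (True = speaking) for consecutive 0.5-second clips.
labels = [False, False, True, True, True, False, True]
for timestamp, kind in find_edit_points(labels, 0.5):
    print(f"{timestamp:.1f}s {kind}")
```

In practice a single misclassified clip would create two spurious edit points, so a real pipeline would likely smooth the label sequence (for example with a majority vote over neighboring clips) before looking for transitions.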

