[Paper] Context-Dependent Video Data Augmentation for Human Instance Segmentation

전현진; 이종훈; 김인철

doi:10.3745/ktsde.2023.12.5.217

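Abstract (translated from Korean)

Video instance segmentation is a difficult task because it requires not only segmenting the objects of interest in each frame of a video but also tracking those objects accurately across the entire frame sequence. Human instance segmentation in drama videos, in particular, requires accurate tracking of multiple main characters who interact across varied locations and times of day. It also suffers from a form of class imbalance, because main characters appear far more often than supporting characters or extras. This paper introduces MHIS, a human instance segmentation dataset built from videos of the drama Miseang (미생), and proposes CDVA, a new video data augmentation method designed to address the severe data imbalance between character classes. Unlike existing video data augmentation methods, CDVA takes the spatio-temporal context of the videos into account when deciding where in the background clip the target person should be inserted, and thereby generates more realistic augmented videos. The proposed method can therefore improve the performance of deep neural network models for video instance segmentation. Quantitative and qualitative experiments on the MHIS dataset demonstrate the usefulness and effectiveness of the proposed method.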

Abstract (English)

Video instance segmentation is a highly complex visual task because it requires not only object instance segmentation in each image frame of a video, but also accurate tracking of instances throughout the frame sequence. In particular, human instance segmentation in drama videos has a unique characteristic: it requires accurate tracking of several main characters interacting in various places and at various times. It is also subject to a class imbalance problem, because main characters appear far more frequently than supporting or auxiliary characters in drama videos. In this paper, we introduce a new human instance dataset called MHIS, which is built upon videos of the drama Miseang, and then propose a novel video data augmentation method, CDVA, to overcome the data imbalance between character classes. Unlike previous video data augmentation methods, the proposed CDVA generates more realistic augmented videos by deciding the optimal location within the background clip at which to insert a target human instance, taking into account the rich spatio-temporal context embedded in the videos. The proposed augmentation method can therefore improve the performance of a deep neural network model for video instance segmentation. Through both quantitative and qualitative experiments on the MHIS dataset, we demonstrate the usefulness and effectiveness of the proposed video data augmentation method.
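
The abstract only outlines how CDVA works, so the following is a minimal sketch of the general idea of context-aware copy-paste augmentation for video clips, not the authors' implementation. It assumes that the clip is a NumPy array of frames, that existing instances are given as per-frame boolean masks, and that a reasonable stand-in for "deciding the location" is to pick the window with the most pixels left free in every frame. The function names `free_region_mask`, `best_paste_offset`, and `paste_instance` are hypothetical and do not come from the paper.

```python
import numpy as np


def free_region_mask(occupied_masks):
    """Return an (H, W) bool map that is True where no instance appears in any frame.

    occupied_masks: (T, H, W) bool array, True where an existing instance is.
    """
    return ~occupied_masks.any(axis=0)


def best_paste_offset(free, inst_h, inst_w):
    """Return the top-left (y, x) of the inst_h x inst_w window with the most free pixels.

    Assumption: a simple free-pixel count stands in for the paper's location
    decision, which the abstract does not specify. Ties resolve to the first
    window in row-major order.
    """
    # Integral image with a zero row and column prepended.
    integral = np.pad(free.astype(np.int64), ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    window_sums = (
        integral[inst_h:, inst_w:]
        - integral[:-inst_h, inst_w:]
        - integral[inst_h:, :-inst_w]
        + integral[:-inst_h, :-inst_w]
    )
    y, x = np.unravel_index(np.argmax(window_sums), window_sums.shape)
    return int(y), int(x)


def paste_instance(background, inst_frames, inst_masks, top_left):
    """Paste a masked instance crop into every frame of a background clip.

    background:  (T, H, W, 3) uint8 clip
    inst_frames: (T, h, w, 3) uint8 crop of the target instance
    inst_masks:  (T, h, w) bool mask of the instance inside the crop
    Returns the augmented clip and the (T, H, W) mask of the pasted instance.
    """
    y, x = top_left
    h, w = inst_masks.shape[1:]
    out = background.copy()
    region = out[:, y:y + h, x:x + w]  # view into out, so writes land in out
    region[inst_masks] = inst_frames[inst_masks]
    new_masks = np.zeros(background.shape[:3], dtype=bool)
    new_masks[:, y:y + h, x:x + w] = inst_masks
    return out, new_masks
```

Under these assumptions, a rare character's crop could be pasted into clips where that character is absent, with the returned mask serving as the new segmentation annotation. Using one offset for all frames keeps the pasted instance temporally consistent, which is the property that distinguishes video augmentation from frame-by-frame image copy-paste.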

Tables/Figures (12)

Fig. 1. Previous Video Data Augmentation Methods: (a) VideoMix, (b) ObjectMix, (c) B-Aug
Fig. 2. Context-Dependent Video Data Augmentation
Fig. 3. Finding the Maximal Free Region
Fig. 4. Deciding the Target Instance Location
Fig. 5. Mask-based Augmented Video Clip Generation
Fig. 6. Miseang Human Instance Segmentation Dataset
Fig. 7. The Overall Architecture of SeqFormer Model
Table 1. Comparing Benchmark Datasets for Video Instance Segmentation
Table 2. Experimental Results with Different Video Augmentation Methods
Table 3. Experimental Results with Different Video Instance Segmentation Models
Fig. 8. Example Videos Generated by Different Augmentation Methods: (a) VideoMix, (b) ObjectMix, (c) B-Aug, (d) CDVA (Ours)
Fig. 9. Human Instance Segmentation Results with Different Augmentation Methods: (a) W/O, (b) VideoMix, (c) ObjectMix, (d) B-Aug, (e) CDVA (Ours)

References (14)

L. Yang, Y. Fan, and N. Xu, "Video instance segmentation," Proceedings of IEEE/CVF International Conference on Computer Vision, 2019.
S. Yun, S. J. Oh, B. Heo, D. Han, and J. Kim, "VideoMix: Rethinking data augmentation for video classification," arXiv preprint arXiv:2012.03457, 2020.
J. Kimata, T. Nitta, and T. Tamaki, "ObjectMix: Data augmentation by copy-pasting objects in videos for action recognition," arXiv preprint arXiv:2204.00239, 2022.
H. Kim, D. Kim, J. Kim, and S. Im, "Data augmentation scheme for semi-supervised video object segmentation," Journal of Broadcast Engineering, Vol.27, No.1, 2022.
H. J. Chun and I. Kim, "Human instance segmentation using video data augmentation," Proceedings of the Annual Conference of Korea Information Processing Society Conference (KIPS) 2022, Vol.29, No.2, pp.532-534, 2022.
J. Wu, Y. Jiang, S. Bai, W. Zhang, and X. Bai, "SeqFormer: Sequential transformer for video instance segmentation," European Conference on Computer Vision, Springer, Cham, 2022.
A. Athar, S. Mahadevan, A. Osep, L. Leal-Taixe, and B. Leibe, "STEm-Seg: Spatio-temporal embeddings for instance segmentation in videos," European Conference on Computer Vision, Springer, Cham, 2020.
Y. Wang, Z. Xu, X. Wang, C. Shen, B. Cheng, H. Shen, and H. Xia, "End-to-end video instance segmentation with transformers," Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
S. Hwang, M. Heo, S. W. Oh, and S. J. Kim, "Video instance segmentation using inter-frame communication transformers," Advances in Neural Information Processing Systems, Vol.34, 2021.
N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," European Conference on Computer Vision, Springer, Cham, 2020.
H. S. Fang, J. Sun, R. Wang, M. Gou, Y. L. Li, and C. Lu, "InstaBoost: Boosting instance segmentation via probability map guided copy-pasting," Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
N. Dvornik, J. Mairal, and C. Schmid, "Modeling visual context is key to augmenting object detection datasets," Proceedings of the European Conference on Computer Vision, 2018.
G. Ghiasi et al., "Simple copy-paste is a strong data augmentation method for instance segmentation," Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021.
J. Qi et al., "Occluded video instance segmentation," arXiv preprint arXiv:2102.01558, 2021.

