[Paper] Recent Trends of Weakly-supervised Deep Learning for Monocular 3D Reconstruction

Seungryong Kim

doi:10.5909/jbe.2021.26.1.70

Recent Trends of Weakly-supervised Deep Learning for Monocular 3D Reconstruction

Journal of Broadcast Engineering (방송공학회논문지), v.26 no.1, 2021, pp.70-78

Seungryong Kim (Department of Computer Science, Korea University)

Abstract (translated from Korean)

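Recovering 3D depth information from a single 2D image is clearly a technology of great practical value across academia and industry. However, because a 2D image is the projection of arbitrary 3D information, it carries an inherent depth ambiguity, and resolving this ambiguity is highly challenging. Thanks to recent advances in artificial intelligence, this limitation is being overcome by algorithms that learn the correspondence between 2D images and 3D depth information. Training such models requires large-scale training data that captures this correspondence, but collecting and processing such data is labor-intensive, so it can be built only to a limited extent. Recent work has therefore centered on weakly-supervised AI techniques that predict 3D depth information using large collections of 2D images and metadata. This article summarizes these trends, dividing them into scene 3D reconstruction and object 3D reconstruction, and discusses the limitations of current techniques and directions for future work.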

Abstract

Estimating 3D information from a single image is an essential problem in numerous applications. Since a 2D image might originate from an infinite number of different 3D scenes, 3D reconstruction from a single image is notoriously challenging. This challenge has been addressed by recent deep convolutional neural networks (CNNs), which model the mapping between 2D images and 3D information. However, training such deep CNNs demands massive training data, which is difficult or even impossible to obtain. Recent work thus aims at deep learning techniques that can be trained in a weakly-supervised manner, using metadata rather than ground-truth depth data. In this article, we introduce recent developments in weakly-supervised deep learning techniques, categorized into scene 3D reconstruction and object 3D reconstruction, and discuss their limitations and future directions.

Tables/Figures (6)

Fig. 1. Examples of scene-level monocular 3D reconstruction from a single 2D image: top: single color image; middle: estimated depth image; bottom: rendering based on the estimated depth (from [3])
Fig. 2. Weakly-supervised learning using the difference between an image synthesized from the estimated 3D depth and the corresponding real image of the stereo pair (the left image in the Korean caption, the right image in the English caption) (from [11])
Fig. 3. Examples of weakly-supervised learning with stereo images (Monodepth2-S), monocular video (Monodepth2-M), and their fusion (Monodepth2-MS) (from [5])
Fig. 4. Quantitative evaluation of recent monocular scene 3D reconstruction methods on the KITTI dataset. † indicates feature extractors pre-trained on ImageNet. S: stereo, V: video, P: proxy, A: additional information, GT: ground-truth 3D information, F: Freiburg, CS: Cityscapes, E2E: end-to-end (from [2])
Fig. 5. Examples of weakly-supervised monocular object 3D reconstruction (from [16])
Fig. 6. Examples of weakly-supervised monocular human 3D reconstruction (from [7])
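
The stereo-based weak supervision summarized in the Fig. 2 caption can be illustrated with a small NumPy sketch. This is only a toy under stated assumptions, not the method of [11]: the function names `warp_right_to_left` and `photometric_loss` are hypothetical, the stereo pair is assumed to be rectified and grayscale, and the left view is assumed to be synthesized by sampling the right image at x - d. Published methods typically use differentiable sampling inside a deep learning framework and combine this term with additional losses.

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Synthesize the left view by sampling the right image at x - d(x, y).

    right:     (H, W) grayscale right image
    disparity: (H, W) non-negative disparity for the left view, in pixels
    Assumes a rectified pair, so matching pixels lie on the same row.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity  # source x-coordinates
    xs = np.clip(xs, 0.0, w - 1.0)                            # clamp at the image border
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two nearest columns of the same row.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def photometric_loss(left, right, disparity):
    """Mean absolute difference between the real and the synthesized left image."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))

# Toy data: the left image is the right image shifted by a constant disparity.
rng = np.random.default_rng(0)
right = rng.random((4, 16))
true_d = 3
left = np.empty_like(right)
left[:, true_d:] = right[:, :-true_d]
left[:, :true_d] = right[:, :1]  # border columns replicate the clamped edge

good = photometric_loss(left, right, np.full(right.shape, float(true_d)))
bad = photometric_loss(left, right, np.zeros(right.shape))
print(good, bad)
```

In this toy setup the loss is lower for the correct disparity than for a wrong one, which is the signal that lets a network learn depth without ground-truth depth labels.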

References (17)

[1] D. Scharstein and R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," IJCV, Vol. 47, pp. 7-42, April 2002.
[2] M. Poggi, F. Tosi, K. Batsos, P. Mordohai, and S. Mattoccia, "On the Synergies between Machine Learning and Stereo: a Survey," arXiv:2004.08566, 2020.
[3] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, and V. Koltun, "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer," TPAMI, 2020.
[4] C. Godard, O. Mac Aodha, and G. J. Brostow, "Unsupervised Monocular Depth Estimation with Left-Right Consistency," CVPR, 2017.
[5] C. Godard, O. Mac Aodha, M. Firman, and G. J. Brostow, "Digging into Self-Supervised Monocular Depth Prediction," ICCV, 2019.
[6] A. Kanazawa, S. Tulsiani, A. A. Efros, and J. Malik, "Learning Category-Specific Mesh Reconstruction from Image Collections," ECCV, 2018.
[7] A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik, "End-to-end Recovery of Human Shape and Pose," CVPR, 2018.
[8] A. Saxena, M. Sun, and A. Y. Ng, "Make3D: Learning 3D Scene Structure from a Single Still Image," TPAMI, Vol. 31, No. 5, pp. 824-840, May 2009.
[9] D. Eigen, C. Puhrsch, and R. Fergus, "Depth Map Prediction from a Single Image Using a Multi-Scale Deep Network," NeurIPS, 2014.
[10] J. Xie, R. Girshick, and A. Farhadi, "Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks," ECCV, 2016.
[11] R. Garg, V. Kumar, G. Carneiro, and I. Reid, "Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue," ECCV, 2016.
[12] T. Zhou, M. Brown, N. Snavely, and D. G. Lowe, "Unsupervised Learning of Depth and Ego-Motion from Video," CVPR, 2017.
[13] A. Ranjan, V. Jampani, L. Balles, K. Kim, D. Sun, J. Wulff, and M. J. Black, "Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation," CVPR, 2019.
[14] S. Zhu, G. Brazil, and X. Liu, "The Edge of Depth: Explicit Constraints between Segmentation and Depth," CVPR, 2020.
[15] N. Kulkarni, A. Gupta, and S. Tulsiani, "Canonical Surface Mapping via Geometric Cycle Consistency," ICCV, 2019.
[16] S. Goel, A. Kanazawa, and J. Malik, "Shape and Viewpoint without Keypoints," ECCV, 2020.
[17] N. Kolotouros, G. Pavlakos, M. J. Black, and K. Daniilidis, "Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop," ICCV, 2019.
