Unsupervised Monocular Depth Estimation Using Self-Attention for Autonomous Driving

S. J. Hwang (황승준); S. J. Park (박성준); J. H. Baek (백중환)

doi:10.12673/jant.2023.27.2.182

Journal of Advanced Navigation Technology (한국항행학회논문지), vol. 27, no. 2, 2023, pp. 182-189

S. J. Hwang, S. J. Park, and J. H. Baek are all with the School of Electronics and Information Engineering, Korea Aerospace University (한국항공대학교 항공전자정보공학부).

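Abstract (translated from Korean)

Depth estimation is a core technology for generating 3D maps for the autonomous navigation of vehicles, robots, and drones. Existing sensor-based depth estimation methods are accurate but expensive and low in resolution. Camera-based depth estimation methods, in contrast, offer high resolution at low cost but are less accurate. In this study, we propose self-attention-based unsupervised monocular depth estimation to improve the depth estimation performance of unmanned aerial vehicle (UAV) cameras. Self-attention operations are applied to the network to improve global feature extraction. In addition, a network that learns the camera parameters is added so that the method can also be used on image data from uncalibrated cameras. To generate spatial data, the estimated depth and camera pose are converted into a point cloud using the camera parameters, and the point cloud is mapped into a 3D map using an Octree-structured occupancy grid. The proposed network is evaluated using synthetic images and depth sequences from the Mid-Air dataset. The proposed network shows a 7.69% lower error than previous work.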
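The conversion from estimated depth and camera pose to a point cloud can be illustrated with a standard pinhole back-projection. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names `depth_to_point_cloud` and `occupied_voxels`, the 3x3 intrinsic matrix `K`, and the 4x4 camera-to-world pose are all hypothetical, and a flat set of voxel indices stands in for the Octree occupancy grid described in the abstract.

```python
import numpy as np

def depth_to_point_cloud(depth, K, cam_to_world=None):
    """Back-project an (H, W) depth map into an (H*W, 3) point cloud.

    K is an assumed 3x3 pinhole intrinsic matrix; cam_to_world is an
    optional assumed 4x4 camera-to-world pose.
    """
    h, w = depth.shape
    # u indexes columns, v indexes rows; both arrays have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    pixels = pixels.reshape(-1, 3).astype(np.float64)
    # Each row becomes K^{-1} [u, v, 1]^T, a ray with unit z component.
    rays = pixels @ np.linalg.inv(K).T
    # Scale every ray by its depth to get camera-frame 3D points.
    points = rays * depth.reshape(-1, 1)
    if cam_to_world is not None:
        rotation, translation = cam_to_world[:3, :3], cam_to_world[:3, 3]
        points = points @ rotation.T + translation
    return points

def occupied_voxels(points, voxel_size):
    """Return the set of integer voxel indices containing at least one point.

    This flat set is a simplified stand-in for an Octree occupancy grid.
    """
    idx = np.floor(points / voxel_size).astype(np.int64)
    return {tuple(row) for row in idx.tolist()}
```

In a full pipeline, the points from each frame would be accumulated into one map; a probabilistic Octree library would additionally track free space and per-voxel occupancy probabilities, which this sketch omits.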

Abstract

Depth estimation is a key technology in 3D map generation for the autonomous navigation of vehicles, robots, and drones. Existing sensor-based methods are accurate but expensive and low in resolution, while camera-based methods are more affordable and offer higher resolution but lower accuracy. In this study, we propose self-attention-based unsupervised monocular depth estimation for UAV camera systems. A self-attention operation is applied to the network to improve global feature extraction. In addition, we reduce the weight size of the self-attention operation to lower the computational cost. The estimated depth and camera pose are transformed into a point cloud, and the point cloud is mapped into a 3D map using an Octree-structured occupancy grid. The proposed network is evaluated using synthesized images and depth sequences from the Mid-Air dataset. Our network demonstrates a 7.69% reduction in error compared to prior studies.
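To illustrate why self-attention helps with global feature extraction, the sketch below applies standard scaled dot-product self-attention to a feature map, so that every spatial position can weight information from every other position. This is an assumption-laden illustration rather than the network in the paper: the abstract does not specify the attention design, and the function name `self_attention` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical. The weight-size reduction mentioned in the abstract is not modeled here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention over an (H, W, C) feature map.

    w_q, w_k: (C, D) projections; w_v: (C, D_v) projection (all assumed).
    Returns the attended (H, W, D_v) map and the (H*W, H*W) weights.
    """
    h, w, c = features.shape
    tokens = features.reshape(h * w, c)  # one token per spatial position
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Similarity between every pair of positions, scaled by sqrt(D).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)   # each row sums to 1
    attended = weights @ v               # global mixture of value vectors
    return attended.reshape(h, w, -1), weights
```

Because the attention matrix has one row and one column per pixel, its cost grows quadratically with the number of positions, which is one reason such layers are usually applied to low-resolution feature maps rather than to the full input image.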

Tables/Figures (6)

Fig. 1. Self-Attention-based unsupervised monocular camera depth estimation network
Table 1. Quantitative comparison with existing studies
Table 2. Ablation study
Table 3. Performance change according to the Self-Attention layer
Fig. 2. Qualitative comparison of depth estimation results on test images
Fig. 3. 3D map generation and path estimation results

