[Paper] Multi-Object Goal Visual Navigation Based on Multimodal Context Fusion

최정현; 김인철

doi:10.3745/ktsde.2023.12.9.407

Abstract (translated from Korean)

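MultiOn (Multi-Object Goal Visual Navigation) is a highly challenging visual navigation task in which an agent must find multiple goal objects, placed at arbitrary locations in an unknown indoor environment, in a predetermined order. Existing models for the MultiOn task select actions using only a single context map, such as a visual appearance map or a goal map, and therefore cannot exploit an integrated view of diverse multimodal context information. To overcome this limitation, this paper proposes MCFMO (Multimodal Context Fusion for MultiOn tasks), a new deep neural network-based agent model for the MultiOn task. For action selection, the proposed model uses a multimodal context map that contains, in addition to the visual appearance features of the input images, semantic features of environmental objects and goal object features. The proposed model also uses a point-wise convolutional neural network module to effectively fuse these three heterogeneous context features. In addition, to encourage efficient learning of the navigation policy, the proposed model employs an auxiliary task learning module that predicts whether a goal object has been observed, as well as its direction and distance. Various quantitative and qualitative experiments using the Habitat-Matterport3D simulation environment and scene dataset confirm the superiority of the proposed model.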
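The abstract does not give the exact configuration of the point-wise fusion module. Because a point-wise (1x1) convolution mixes channels independently at every map cell, a minimal sketch, assuming the three context maps share one spatial grid and are concatenated along the channel axis before a single 1x1 convolution with ReLU, might look like this. The function names, the channel-first list layout, and the ReLU are hypothetical choices for illustration; a real model would use a deep-learning framework and learned weights.

```python
from typing import List

# A feature map is stored channel-first: fmap[channel][row][col].
FeatureMap = List[List[List[float]]]


def pointwise_conv(x: FeatureMap, weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution followed by ReLU.

    weight[o][c] mixes input channel c into output channel o. The same weights
    are applied independently at every (row, col) cell, so the spatial layout
    of the map is preserved and only the channels are recombined.
    """
    c_in = len(x)
    if len(weight) != len(bias) or any(len(row) != c_in for row in weight):
        raise ValueError("weight must be [C_out][C_in] and bias must be [C_out]")
    height, width = len(x[0]), len(x[0][0])
    out: FeatureMap = []
    for w_row, b in zip(weight, bias):
        plane = [[0.0] * width for _ in range(height)]
        for i in range(height):
            for j in range(width):
                s = b + sum(w_row[c] * x[c][i][j] for c in range(c_in))
                plane[i][j] = max(0.0, s)  # ReLU
        out.append(plane)
    return out


def fuse_context_maps(appearance: FeatureMap, semantic: FeatureMap, goal: FeatureMap,
                      weight: List[List[float]], bias: List[float]) -> FeatureMap:
    """Hypothetical fusion step: concatenate the appearance, semantic, and goal
    maps along the channel axis, then mix them with one point-wise convolution."""
    stacked = appearance + semantic + goal  # channel concatenation
    return pointwise_conv(stacked, weight, bias)
```

With hand-set weights such as `[[1.0, 1.0, 1.0, 1.0]]`, each output cell is simply the sum of the four input channels at that cell, which makes the per-cell channel mixing easy to verify. The 1x1 kernel keeps the fused map aligned cell-for-cell with its inputs.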

Abstract


Multi-Object Goal Visual Navigation (MultiOn) is a visual navigation task in which an agent must visit multiple object goals in an unknown indoor environment in a given order. Existing models for the MultiOn task are limited in that they cannot exploit an integrated view of multimodal context, because they use only a unimodal context map. To overcome this limitation, this paper proposes a novel deep neural network-based agent model for the MultiOn task. The proposed model, MCFMO, uses a multimodal context map containing visual appearance features, semantic features of environmental objects, and goal object features. Moreover, it effectively fuses these three heterogeneous features into a global multimodal context map using a point-wise convolutional neural network module. Lastly, it adopts an auxiliary task learning module that predicts the goal's observation status, direction, and distance, which guides the agent to learn the navigation policy efficiently. Through various quantitative and qualitative experiments using the Habitat-Matterport3D simulation environment and scene dataset, we demonstrate the superiority of the proposed model.
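The abstract states that the auxiliary module predicts whether the goal has been observed, the goal direction, and the goal distance, but not how these targets are encoded. The sketch below shows one way ground-truth labels for those three predictions could be derived from simulator state. It assumes 2D top-down coordinates, a heading in radians, and discretized direction and distance bins. The function name `auxiliary_targets`, the 12 direction sectors, and the distance edges are all hypothetical.

```python
import math


def auxiliary_targets(agent_xy, agent_heading, goal_xy, goal_observed,
                      num_dir_bins=12, dist_edges=(1.0, 2.0, 4.0, 8.0)):
    """Hypothetical labels for the three auxiliary predictions.

    agent_heading is in radians, measured like math.atan2 (counter-clockwise
    from the +x axis). Direction is the goal's bearing relative to the agent's
    heading, discretized into num_dir_bins equal sectors. Distance is
    discretized by counting how many of dist_edges it reaches.
    """
    dx = goal_xy[0] - agent_xy[0]
    dy = goal_xy[1] - agent_xy[1]
    distance = math.hypot(dx, dy)
    # Relative bearing wrapped into [0, 2*pi).
    relative = (math.atan2(dy, dx) - agent_heading) % (2 * math.pi)
    sector = 2 * math.pi / num_dir_bins
    # Floating-point wrap-around can produce exactly 2*pi, so fold that bin back to 0.
    direction_bin = int(relative // sector) % num_dir_bins
    distance_bin = sum(distance >= edge for edge in dist_edges)
    return {
        "observed": int(bool(goal_observed)),
        "direction_bin": direction_bin,
        "distance_bin": distance_bin,
    }
```

For example, an agent at the origin facing +x with the goal at `(3.0, 0.0)` gets direction bin 0 and distance bin 2. In training, each label would feed its own prediction head, and the auxiliary losses would be added to the navigation policy loss. The paper compares different auxiliary task losses in its Table 3; the specific losses are not reproduced here.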


Tables/Figures (11)

Fig. 1. An Example of Multi-Object Goal Visual Navigation
Fig. 2. Architecture of the Proposed MCFMO Model
Fig. 3. Local Context Embedding
Fig. 4. Global Context Embedding
Table 1. Comparison with Different Spatial Context Features
Table 2. Comparison with Different Fusion Methods
Table 3. Comparison with Different Auxiliary Task Losses
Table 4. Comparison with Other Multi-Object Goal Visual Navigation (MultiOn) Models
Fig. 5. Qualitative Evaluation of the Proposed Model: Case 1
Fig. 6. Qualitative Evaluation of the Proposed Model: Case 2
Fig. 7. Qualitative Evaluation of the Proposed Model: Case 3

