[Paper] C-COMA: A Continual Reinforcement Learning Model for Dynamic Multiagent Environments

정규열; 김인철

doi:10.3745/ktsde.2021.10.4.143

Abstract

In many real-world applications, it is important to learn behavior policies that let multiple agents cooperate closely toward a common goal. In such multi-agent reinforcement learning (MARL) settings, most existing studies have adopted centralized training with decentralized execution (CTDE) as the de facto standard framework. However, this approach copes poorly with dynamic environments, in which changes never experienced during training can keep arising at execution time. To handle such dynamic environments, this paper proposes C-COMA, a new multi-agent reinforcement learning scheme. C-COMA is a continual learning model: rather than separating the agents' training time from their execution time, it assumes deployment conditions from the start and continually learns the agents' cooperative behavior policies. We implement a dynamic mini-game based on StarCraft II, a representative real-time strategy game, and run a range of experiments in this environment to demonstrate the effectiveness and superiority of the proposed C-COMA model.
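The abstract does not spell out C-COMA's update rule. The name suggests it builds on COMA (counterfactual multi-agent policy gradients, Foerster et al., 2018), so the following is only a minimal sketch of the counterfactual advantage used in that baseline, not the authors' implementation. The function name and argument layout are assumptions made for illustration. For one agent, the advantage compares the critic's value of the action actually taken against the policy-weighted average over that agent's alternative actions, with the other agents' actions held fixed.

```python
from typing import Sequence


def counterfactual_advantage(
    q_values: Sequence[float],
    policy: Sequence[float],
    chosen_action: int,
) -> float:
    """COMA-style counterfactual advantage for a single agent (illustrative).

    q_values[k] is the centralized critic's estimate of the joint-action
    value when this agent takes action k and all other agents' actions are
    held fixed. policy[k] is this agent's probability of choosing action k.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    if not 0 <= chosen_action < len(q_values):
        raise ValueError("chosen_action is out of range")

    # Counterfactual baseline: expected value under the agent's own policy,
    # marginalizing out only this agent's action.
    baseline = sum(p * q for p, q in zip(policy, q_values))

    # Advantage of the action actually taken relative to that baseline.
    return q_values[chosen_action] - baseline


if __name__ == "__main__":
    q = [1.0, 2.0, 4.0]          # hypothetical critic outputs
    pi = [0.5, 0.25, 0.25]       # hypothetical policy probabilities
    for action in range(len(q)):
        print(action, counterfactual_advantage(q, pi, action))
```

In a continual setting like the one the abstract describes, a quantity of this kind would be recomputed from fresh experience as the agents keep acting, rather than only during a separate training phase; how C-COMA actually does this is not stated in the text above.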

Tables/Figures (7)

Fig. 1. Dynamic Changes in a StarCraft II Mini Game
Fig. 2. Architecture of C-COMA
Fig. 3. Comparison with Non-continual Learning Models: Win Rate
Fig. 4. Comparison with Continual Learning Models: Win Rate
Fig. 5. Comparison with Continual Learning Models: Mean Return
Fig. 6. Joint Action When Enemies are Out of Sight
Fig. 7. Joint Action Learned by C-COMA When All Enemies are Distant

