[Paper] Trends in Multi-Agent Reinforcement Learning Technology (멀티 에이전트 강화학습 기술 동향)

유병현; 데브라니 데비; 김현우; 송화전; 박경문; 이성원

doi:10.22648/etri.2020.j.350614

A Survey on Recent Advances in Multi-Agent Reinforcement Learning

전자통신동향분석 (Electronics and Telecommunications Trends), vol. 35, no. 6, 2020, pp. 137-149

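유병현 (Complex Intelligence Research Laboratory), 데브라니 데비 (Information Strategy Department), 김현우 (Complex Intelligence Research Laboratory), 송화전 (Complex Intelligence Research Laboratory), 박경문 (Complex Intelligence Research Laboratory), 이성원 (Information Strategy Department)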

Abstract

Several multi-agent reinforcement learning (MARL) algorithms have achieved impressive results in recent years, demonstrating their potential for solving complex problems in real-time strategy online games, robotics, and autonomous vehicles. However, these algorithms face many challenges when dealing with massive problem spaces in sparse-reward environments. Based on the centralized training and decentralized execution (CTDE) architecture, the MARL algorithms discussed in the literature address these challenges by formulating novel approaches to inter-agent modeling, credit assignment, multi-agent communication, and the exploration-exploitation dilemma. The objective of this paper is to deliver a comprehensive survey of existing MARL algorithms organized by problem statement rather than by technology. We also discuss several experimental frameworks to provide insight into the use of these algorithms and to motivate promising directions for future research.

Tables/Figures (4)

Figure 1. Message Pruning system architecture [20]
Figure 2. SchedNet system architecture [21]
Figure 3. Example multi-agent scenarios in the multi-agent particle environment [7]: (a) Cooperative communication, (b) Cooperative navigation, (c) Predator-prey, (d) Physical deception
Figure 4. Screenshot of the StarCraft Multi-Agent Challenge [31]

References (31)

[1] V. Mnih et al., "Playing Atari with deep reinforcement learning," arXiv preprint, CoRR, 2013, arXiv: 1312.5602.
[2] J. Schulman et al., "Trust region policy optimization," in Proc. Int. Conf. Mach. Learn. (Lille, France), Feb. 2015, pp. 1889-1897.
[3] J. Schulman et al., "Proximal policy optimization algorithms," arXiv preprint, CoRR, 2017, arXiv: 1707.06347.
[4] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," in Proc. Int. Conf. Learn. Representations, 2016.
[5] K. Zhang, Z. Yang, and T. Basar, "Multi-agent reinforcement learning: A selective overview of theories and algorithms," arXiv preprint, CoRR, 2019, arXiv: 1911.10635v1.
[6] O. Jadid and D. Hajinezhad, "A review of cooperative multi-agent deep reinforcement learning," arXiv preprint, CoRR, 2019, arXiv: 1908.03963v3.
[7] R. Lowe et al., "Multi-agent actor-critic for mixed cooperative-competitive environments," in Adv. Neural Inform. Process. Syst. 2017, pp. 6379-6390.
[8] Y. Yang et al., "Mean field multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn. (Stockholm, Sweden), 2018.
[9] S. Iqbal and F. Sha, "Actor-attention-critic for multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn. (Long Beach, CA, USA), 2019, pp. 2961-2970.
[10] T. Haarnoja et al., "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proc. Int. Conf. Mach. Learn. (Stockholm, Sweden), 2018, pp. 1861-1870.
[11] H. Ryu, H. Shin, and J. Park, "Multi-agent actor-critic with hierarchical graph attention network," in Proc. AAAI Conf. Artif. Intell. (New York, USA), 2020, pp. 7236-7243.
[12] J. Foerster et al., "Counterfactual multi-agent policy gradients," in Proc. AAAI Conf. Artif. Intell. 2020.
[13] P. Sunehag et al., "Value-decomposition networks for cooperative multi-agent learning based on team reward," in Proc. Int. Conf. Auto. Agent. Multi. Syst. 2018, pp. 2085-2087.
[14] T. Rashid et al., "QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn. 2018.
[15] K. Son et al., "QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn. 2019.
[16] Y. Du et al., "LIIR: Learning individual intrinsic reward in multi-agent reinforcement learning," in Adv. Neural Inform. Process. Syst. 2019, pp. 4403-4414.
[17] C. V. Goldman and S. Zilberstein, "Decentralized control of cooperative systems: Categorization and complexity analysis," J. Artif. Intell. Res. vol. 22, 2004, pp. 143-174.
[18] E. Pesce and G. Montana, "Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication," Mach. Learn. vol. 109, 2020, doi: 10.1007/s10994-019-05864-5.
[19] S. Q. Zhang, Q. Zhang, and J. Lin, "Efficient communication in multi-agent reinforcement learning via variance based control," in Adv. Neural Inform. Process. Syst. 2019, pp. 3235-3244.
[20] H. Mao et al., "Learning agent communication under limited bandwidth by message pruning," arXiv preprint, CoRR, Dec. 2019, Accessed: Sep. 21, 2020. [Online]. Available: http://arxiv.org/abs/1912.05304.
[21] D. Kim et al., "Learning to schedule communication in multi-agent reinforcement learning," arXiv preprint, CoRR, Feb. 2019, Accessed: Sep. 10, 2020. [Online]. Available: http://arxiv.org/abs/1902.01554.
[22] J. Foerster et al., "Learning to communicate with deep multi-agent reinforcement learning," in Adv. Neural Inform. Process. Syst. 2016, pp. 2137-2145.
[23] N. Jaques et al., "Social influence as intrinsic motivation for multi-agent deep reinforcement learning," in Proc. Int. Conf. Mach. Learn. 2019, pp. 3040-3049.
[24] K. Cao et al., "Emergent communication through negotiation," arXiv preprint, CoRR, Apr. 2018, Accessed: Sep. 09, 2020. [Online]. Available: http://arxiv.org/abs/1804.03980.
[25] T. Eccles et al., "Biases for emergent communication in multi-agent reinforcement learning," in Adv. Neural Inform. Process. Syst. 2019, pp. 13111-13121.
[26] S. Gupta, R. Hazra, and A. Dukkipati, "Networked multi-agent reinforcement learning with emergent communication," in Proc. Int. Conf. Auton. Agents Multiagent Syst. (Auckland, New Zealand), May 2020.
[27] T. Wang et al., "Influence-based multi-agent exploration," in Proc. Int. Conf. Learn. Representations, 2020.
[28] G. Chen, "A new framework for multi-agent reinforcement learning-centralized training and exploration with decentralized execution via policy distillation," in Proc. Int. Conf. Auton. Agents Multiagent Syst. 2019.
[29] A. Mahajan et al., "MAVEN: Multi-agent variational exploration," in Adv. Neural Inform. Process. Syst. 2019, pp. 7613-7624.
[30] G. Brockman et al., "OpenAI Gym," arXiv preprint, CoRR, arXiv: 1606.01540.
[31] M. Samvelyan et al., "The StarCraft multi-agent challenge," arXiv preprint, CoRR, 2019, arXiv: 1902.04043.
