[Paper] Comparison of Learning Performance by Reinforcement Learning Agent Visibility Information Difference

김찬섭; 장시환; 양성일; 강신진

doi:10.7583/jkgs.2021.21.5.17

Journal of Korea Game Society, v.21 no.5, 2021, pp.17-28

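김찬섭 (School of Games, Hongik University), 장시환 (Content Research Division, Electronics and Telecommunications Research Institute), 양성일 (Content Research Division, Electronics and Telecommunications Research Institute), 강신진 (School of Games, Hongik University)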

Abstract

Reinforcement learning, in which an artificial intelligence improves itself to find the best solution to a problem, is a valuable technology in many fields. Games are particularly well suited to it because they can provide a virtual environment for problem solving. A reinforcement learning agent uses observation variables, which carry information about the given environment, to understand its own situation and surroundings and to solve the problem the environment poses. In this experiment, we built a simplified version of the instant dungeon environment of a role-playing game (RPG) and gave the agent a variety of settings for the observation variables related to its field of view. The results show how much each setting affects learning speed, and they can serve as a reference for reinforcement learning research on role-playing games.
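The abstract and the figure titles indicate that the field-of-view observation is varied along three axes: range of view, number of rays, and distance of view. The paper's Unity implementation is not reproduced on this page, so the following is only a minimal sketch in plain Python of how such a ray-based observation vector could be built. The function names, the circular obstacles, and the normalization (1.0 meaning "nothing seen") are all assumptions made for illustration, not the authors' method.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical obstacle representation: (center_x, center_y, radius).
Circle = Tuple[float, float, float]


def ray_angles(heading_deg: float, view_angle_deg: float, num_rays: int) -> List[float]:
    """Spread num_rays evenly over the range of view, centered on the heading."""
    if num_rays == 1:
        return [heading_deg]
    start = heading_deg - view_angle_deg / 2.0
    if view_angle_deg >= 360.0:
        # Full circle: avoid placing the first and last ray on the same angle.
        step = 360.0 / num_rays
    else:
        step = view_angle_deg / (num_rays - 1)
    return [start + i * step for i in range(num_rays)]


def ray_hit_distance(origin: Tuple[float, float], angle_deg: float,
                     max_dist: float, circle: Circle) -> Optional[float]:
    """Distance along the ray to the circle, or None if it is not hit within max_dist."""
    ox, oy = origin
    cx, cy, r = circle
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    # Solve |origin + t * direction - center|^2 = r^2 for t (direction is unit length).
    fx, fy = ox - cx, oy - cy
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - r * r
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if 0.0 <= t <= max_dist:
            return t
    return None


def observe(origin: Tuple[float, float], heading_deg: float, view_angle_deg: float,
            num_rays: int, view_distance: float, objects: List[Circle]) -> List[float]:
    """One value per ray: nearest hit distance / view_distance, or 1.0 if nothing is hit."""
    obs = []
    for angle in ray_angles(heading_deg, view_angle_deg, num_rays):
        hits = []
        for obj in objects:
            d = ray_hit_distance(origin, angle, view_distance, obj)
            if d is not None:
                hits.append(d)
        obs.append(min(hits) / view_distance if hits else 1.0)
    return obs


if __name__ == "__main__":
    target = [(5.0, 0.0, 1.0)]
    # Same scene, two different view distances.
    print(observe((0.0, 0.0), 0.0, 90.0, 3, 10.0, target))
    print(observe((0.0, 0.0), 0.0, 90.0, 3, 3.0, target))
```

In this sketch, widening the range of view or adding rays changes the length and angular coverage of the observation vector, while shortening the distance of view makes far objects indistinguishable from empty space. These are the kinds of differences whose effect on learning speed the experiment measures.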

Tables/Figures (23)

[Fig. 1] Reinforcement Learning Diagram
[Fig. 2] Interaction Flow between Stable Baselines3 and Unity
[Fig. 3] Reward Value Comparison Graph for A2C(a) and PPO(b)
[Fig. 4] Episode Length Graph Comparison for A2C(a) and PPO(b)
[Fig. 5] Training Environment Screen
[Table 1] Hyperparameters in Reinforcement Learning
[Fig. 6] Visualization of Control Group for Range of View (Top Left (a), Top Right (b), Bottom Left (c), Bottom Right (d))
[Table 2] Type of Observation Variable
[Table 3] Control Group for Range of View
[Fig. 7] Visualization of Control Group for Number of Ray (Top Left (a), Top Right (e), Bottom Left (f), Bottom Right (g))
[Fig. 8] Visualization of Control Group for Distance of View (Top Left (a), Top Right (h), Bottom Left (i), Bottom Right (j))
[Table 4] Control Group for Ray
[Table 5] Control Group for Distance of View
[Fig. 9] Visualization of Eight Distributed Training Environment
[Fig. 10] Process Screen for Agent Training
[Table 6] Design of Reward Functions
[Fig. 11] Reward Value Graph for Range of View
[Fig. 12] Episode Length Graph for Range of View
[Fig. 13] Reward Value Graph for Number of Ray
[Fig. 14] Episode Length Graph for Number of Ray
[Fig. 15] Reward Value Graph for Distance of View
[Fig. 16] Episode Length Graph for Distance of View
[Table 7] Experimental Results for Visibility Information Difference

