[Paper] Comparison of Reinforcement Learning Activation Functions to Improve the Performance of the Racing Game Learning Agent

Lee, Dongcheul

doi:10.3745/jips.02.0141

Journal of Information Processing Systems, v.16 no.5, 2020, pp.1074-1082

Lee, Dongcheul (Dept. of Multimedia Engineering, Hannam University)

Abstract

Recently, research has been actively conducted to create artificial intelligence agents that learn games through reinforcement learning. Several factors determine performance when an agent learns a game, and the choice of activation function is an important one. This paper compares and evaluates which activation function gives the best results when an agent learns a game through reinforcement learning in a 2D racing game environment. We built the agent using a reinforcement learning algorithm and a neural network, and evaluated the activation functions by switching them in the network. We measured the reward, the output of the advantage function, and the output of the loss function during training and testing. The performance evaluation identified the best activation function for the agent to learn the game. The difference between the best and the worst was 35.4%.

Tables/Figures (7)

Fig. 1. The plot of activation functions used in the RL agent.
Table 1. Activation functions used in the RL agent
Fig. 2. Illustration of the neural network used by the RL agent to learn how to play a 2D racing game.
Table 2. Hyperparameters in the agent
Fig. 3. Performance metrics for each activation function along with timesteps during training: (a) the reward, (b) the output of the advantage function A^π(s_t, a_t), and (c) the output of the loss function L^π(s_t).
Fig. 4. The violin plot of performance metrics for each activation function during training: (a) the reward, (b) the output of the advantage function A^π(s_t, a_t), and (c) the output of the loss function L^π(s_t).
Table 3. Mean and maximum reward for each activation function during the testing

AI summary of the main text

* These sentences were identified automatically by AI and may include unsuitable ones; please use them with care.

Proposed method

However, no activation function has been shown to give the best result in every circumstance. In this paper, we build an RL agent that learns a 2D racing game using the ACER algorithm. The network is composed of a CNN and an LSTM.
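To make the comparison concrete, the following is a minimal sketch of scalar versions of six common activation functions. The set (ReLU, Leaky ReLU, ELU, SELU, Swish, CReLU) is an assumption inferred from the titles in the reference list, not taken from Table 1 of the paper, and the default slope and alpha values are common conventions rather than the paper's hyperparameters.

```python
import math

# Standard SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    """max(0, x)."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs (slope is an assumed default)."""
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    """Identity for positive inputs, alpha * (exp(x) - 1) otherwise (alpha is an assumed default)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x):
    """ELU with fixed alpha, multiplied by a fixed scale."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def swish(x):
    """x * sigmoid(x); not hardened against overflow for very large negative x."""
    return x / (1.0 + math.exp(-x))


def crelu(x):
    """Concatenated ReLU: returns (relu(x), relu(-x)), doubling the output width."""
    return (max(0.0, x), max(0.0, -x))


# Hypothetical registry so a network builder could switch activations by name.
ACTIVATIONS = {
    "relu": relu,
    "leaky_relu": leaky_relu,
    "elu": elu,
    "selu": selu,
    "swish": swish,
    "crelu": crelu,
}
```

A lookup table like ACTIVATIONS is one simple way to switch the function used in a network between training runs; note that CReLU returns two values per input, so the layer that follows it would need twice the input width.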

Target data

After 1×10^7 timesteps of training, 100 episodes of the game were played during testing.
The game consists of maneuvering a race car in a long-distance endurance race. The object of the game is to pass 200 cars each day. The driver must avoid other cars; otherwise, the driver's car stops.
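Table 3 reports the mean and maximum reward over the test episodes. As a rough sketch of that summary step, the helper below reduces a list of per-episode rewards to those two numbers. The function name and the sample rewards are hypothetical placeholders for illustration, not values from the paper.

```python
from statistics import mean


def summarize_rewards(episode_rewards):
    """Return the mean and maximum of a non-empty list of per-episode rewards."""
    if not episode_rewards:
        raise ValueError("episode_rewards must not be empty")
    return {"mean": mean(episode_rewards), "max": max(episode_rewards)}


# Placeholder rewards for three episodes; a real test run would supply 100.
example = summarize_rewards([10.0, 20.0, 30.0])
```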

References (12)

A. Jeerige, D. Bein, and A. Verma, "Comparison of deep reinforcement learning approaches for intelligent game playing," in Proceedings of 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, 2019, pp. 366-371.
M. N. Moghadasi, A. T. Haghighat, and S. S. Ghidary, "Evaluating Markov decision process as a model for decision making under uncertainty environment," in Proceedings of 2007 International Conference on Machine Learning and Cybernetics, Hong Kong, China, 2007, pp. 2446-2450.
D. Lee and B. Park, "Comparison of deep learning activation functions for performance improvement of a 2D shooting game learning agent," The Journal of the Institute of Internet, Broadcasting and Communication, vol. 19, no. 2, pp. 135-141, 2019.

R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, "Convolutional neural networks: an overview and application in radiology," Insights into Imaging, vol. 9, no. 4, pp. 611-629, 2018.

D. W. Lu, "Agent inspired trading using recurrent reinforcement learning and LSTM neural networks," 2017 [Online]. Available: https://arxiv.org/abs/1707.07338.
G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI gym," 2016 [Online]. Available: https://arxiv.org/abs/1606.01540.
L. Lu, Y. Shin, Y. Su, and G. E. Karniadakis, "Dying ReLU and initialization: theory and numerical examples," 2019 [Online]. Available: https://arxiv.org/abs/1903.06733.
X. Zhang, Y. Zou, and W. Shi, "Dilated convolution neural network with LeakyReLU for environmental sound classification," in Proceedings of 2017 22nd International Conference on Digital Signal Processing (DSP), London, UK, 2017, pp. 1-5.
A. Shah, E. Kadam, H. Shah, S. Shinde, and S. Shingade, "Deep residual networks with exponential linear unit," in Proceedings of the 3rd International Symposium on Computer Vision and the Internet, Jaipur, India, 2016, pp. 59-65.
Z. Huang, T. Ng, L. Liu, H. Mason, X. Zhuang, and D. Liu, "SNDCNN: self-normalizing deep CNNs with scaled exponential linear units for speech recognition," 2019 [Online]. Available: https://arxiv.org/abs/1910.01992.
G. C. Tripathi, M. Rawat, and K. Rawat, "Swish activation based deep neural network predistorter for RF-PA," in Proceedings of 2019 IEEE Region 10 Conference (TENCON), Kochi, India, 2019, pp. 1239-1242.
Z. Wang and X. Xu, "Efficient deep convolutional neural networks using CReLU for ATR with limited SAR images," The Journal of Engineering, vol. 2019, no. 21, pp. 7615-7618, 2019.
