[Paper] Q-Learning Policy Design to Speed Up Agent Training

용성중; 박효경; 유연휘; 문일영

doi:10.14702/jpee.2022.219

JPEE: Journal of Practical Engineering Education (실천공학교육논문지), vol. 14, no. 1, 2022, pp. 219-224

Affiliation (all authors): Department of Computer Engineering, Korea University of Technology and Education

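Abstract (Korean, translated)

Q-Learning, widely used as a basic reinforcement learning algorithm, trains an agent to maximize reward through greedy action selection, which picks the action with the largest estimated value (Q-value) among those available in the current state. In this paper, we study a policy that speeds up agent training with Q-Learning in the 8×8 Frozen Lake grid environment. We also compare the training results of the standard Q-Learning algorithm with those of an algorithm that gives the agent's actions a 'direction' attribute. The analysis shows that the proposed Q-Learning policy substantially improves both accuracy and training speed over the conventional algorithm.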

Abstract (English)

Q-Learning is a technique widely used as a basic algorithm for reinforcement learning. It trains the agent to maximize reward through greedy action selection, which picks the action with the largest estimated value (Q-value) among those available in the current state. In this paper, we studied a policy that can speed up agent training using Q-Learning in the 8×8 Frozen Lake grid environment. In addition, we compared the training results of the standard Q-Learning algorithm with those of an algorithm that assigns a 'direction' attribute to the agent's movement. The analysis showed that the proposed Q-Learning policy can significantly increase both accuracy and training speed compared to the standard algorithm.
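The baseline the abstract describes is tabular Q-Learning with greedy action selection. The following is a minimal Python sketch of that baseline only (not the proposed 'direction' policy), applying the update Q(s,a) ← Q(s,a) + α·[r + γ·max_a' Q(s',a') − Q(s,a)] with ε-greedy exploration. Assumptions: the grid is a hand-coded map modeled on Gym's 8×8 Frozen Lake layout rather than loaded from OpenAI Gym, moves are deterministic (Gym's Frozen Lake is slippery by default), and the hyperparameters and the helper names `step`, `greedy`, `train`, and `greedy_path_reaches_goal` are hypothetical.

```python
import random

# Hypothetical 8x8 map modeled on Gym's FrozenLake 8x8 layout.
# S = start, F = frozen (safe), H = hole (terminal, reward 0), G = goal (terminal, reward 1).
MAP = [
    "SFFFFFFF",
    "FFFFFFFF",
    "FFFHFFFF",
    "FFFFFHFF",
    "FFFHFFFF",
    "FHHFFFHF",
    "FHFFHFHF",
    "FFFHFFFG",
]
N = len(MAP)
# Actions as (row delta, column delta): 0 = left, 1 = down, 2 = right, 3 = up.
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action):
    """Deterministic (non-slippery) move; returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    dr, dc = ACTIONS[action]
    row = min(max(row + dr, 0), N - 1)  # moving into a wall leaves the agent in place
    col = min(max(col + dc, 0), N - 1)
    tile = MAP[row][col]
    reward = 1.0 if tile == "G" else 0.0
    return row * N + col, reward, tile in "GH"


def greedy(q_row, rng):
    """Greedy action: largest Q-value in this state, ties broken at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=20000, alpha=0.1, gamma=0.99, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    for _ in range(episodes):
        s = 0  # start tile "S"
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = greedy(q[s], rng)  # exploit
            s2, r, done = step(s, a)
            # Target r + gamma * max_a' Q(s', a'); terminal tiles do not bootstrap
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q


def greedy_path_reaches_goal(q, max_steps=100):
    """Follow the learned greedy policy from the start; True if it reaches G."""
    s = 0
    for _ in range(max_steps):
        s, r, done = step(s, max(range(len(ACTIONS)), key=lambda a: q[s][a]))
        if done:
            return r == 1.0
    return False
```

Usage: `q = train()` followed by `greedy_path_reaches_goal(q)`. Ties are broken at random during training because, with an all-zero Q-table, a first-index argmax would always choose action 0 (left) and the agent would mostly bump into the wall at the start tile.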

Figures (6)

Fig. 1. Reinforcement Learning Framework.
Fig. 2. Frozen Lake Simulation Environment.
Fig. 3. Q-Learning Obstacle Reward Simulation Result.
Fig. 4. Q-Learning Obstacle Reward Simulation Learning Success Rate.
Fig. 5. Strengthened-Punishment Q-Learning Simulation Result.
Fig. 6. Strengthened-Punishment Q-Learning Simulation Learning Success Rate.
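The figure captions name an obstacle-reward (장애물 보상) variant and a strengthened-punishment (처벌강화) variant, but the actual reward values are not given on this page. A minimal sketch, assuming the variants amount to a negative reward for falling into a hole: under Frozen Lake's default rewards (1 at the goal, 0 everywhere else), one Q-Learning update cannot distinguish "step into a hole" from an untried safe move, whereas a penalty drives the hole action's Q-value below zero so greedy selection avoids it. The names `reward`, `q_update`, and `hole_penalty` are hypothetical, and this is not presented as the authors' method.

```python
# Hypothetical reward for entering a tile; hole_penalty is an assumed
# "punishment" knob, not a value taken from the paper.
def reward(tile, hole_penalty=0.0):
    if tile == "G":
        return 1.0
    if tile == "H":
        return 0.0 - hole_penalty  # written this way so a zero penalty gives 0.0, not -0.0
    return 0.0


def q_update(q_sa, r, next_max, done, alpha=0.1, gamma=0.99):
    """One Q-learning update; terminal transitions do not bootstrap."""
    target = r if done else r + gamma * next_max
    return q_sa + alpha * (target - q_sa)


# A state with two untried options: step into a hole, or onto a frozen
# tile whose Q-values are still all zero.
for penalty in (0.0, 1.0):
    q_hole = q_update(0.0, reward("H", penalty), next_max=0.0, done=True)
    q_safe = q_update(0.0, reward("F", penalty), next_max=0.0, done=False)
    print(penalty, q_hole, q_safe)
# 0.0 0.0 0.0   (tie: greedy selection cannot prefer the safe move)
# 1.0 -0.1 0.0  (penalty: the hole action now ranks below the safe move)
```

A larger `hole_penalty` widens the gap after a single visit, which is one plausible reading of why a strengthened punishment could speed up learning; the paper's own reward design may differ.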
