[Paper] Research on Optimal Deployment of Sonobuoy for Autonomous Aerial Vehicles Using Virtual Environment and DDPG Algorithm

김종인; 한민석

doi:10.17661/jkiiect.2022.15.2.152

Abstract (translated from Korean)

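This paper presents a method that enables an unmanned aerial vehicle to drop sonobuoys, an essential element of anti-submarine warfare, in an optimal deployment. To this end, an environment simulating an acoustic detection performance distribution map was built with the Unity game engine, and Unity ML-Agents was used so that this custom environment and a reinforcement learning algorithm written externally in Python could communicate with each other during training. In particular, reinforcement learning was introduced to prevent wrong actions from accumulating and affecting learning, and to let the vehicle fly to the target points in the shortest time while the sonobuoys secure the maximum detection area; the optimal deployment of the sonobuoys was achieved by applying the Deep Deterministic Policy Gradient (DDPG) algorithm. After training, the agent flew over the sea area and passed only through the points needed for optimal deployment among the 70 target candidates, and the secured detection areas show that it flew along the points at the shortest distance with no overlapping areas. This means that an autonomous aerial vehicle that deploys sonobuoys in the shortest time and with the maximum detection area, the requirements for optimal deployment, has been implemented.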

Abstract

In this paper, we present a method that enables an unmanned aerial vehicle to drop sonobuoys, an essential element of anti-submarine warfare, in an optimal deployment. To this end, an environment simulating the distribution of acoustic detection performance was built with the Unity game engine, and Unity ML-Agents was used so that this custom environment and a reinforcement learning algorithm written externally in Python could communicate with each other during training. In particular, reinforcement learning was introduced to prevent wrong actions from accumulating and affecting learning, and to secure the maximum detection area for the sonobuoys while the vehicle flies to the target points in the shortest time. The optimal placement of the sonobuoys was achieved by applying the Deep Deterministic Policy Gradient (DDPG) algorithm. As a result of training, the agent flew over the sea area and passed only through the points needed for optimal placement among the 70 target candidates. This means that an autonomous aerial vehicle that deploys sonobuoys in the shortest time and with the maximum detection area, the requirements for optimal placement, has been implemented.

Tables/Figures (16)

Fig. 1. Reinforcement-Learning Model
Fig. 2. Communication using Unity ML-Agents
Table 1. State of Drone Flight and Action Vector
Fig. 3. Example of a detection performance distribution plot
Fig. 4. Acoustic detection performance distribution chart made with Unity engine
Fig. 5. Securing the detection area
Fig. 6. When detection areas overlap
Fig. 7. Deduction of maximum detection area and shortest distance target point
Fig. 8. Structure of DDPG Algorithm
Fig. 9. Structure of Actor Network
Fig. 10. Structure of Critic Network
Table 2. Hyper Parameter of DDPG Algorithm
Fig. 11. Reach the first target
Fig. 12. Reach multiple targets
Fig. 13. Reach 5 targets and end the episode
Fig. 14. Reward value change

AI Summary of the Main Text

Proposed Method

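In this paper, an environment simulating an acoustic detection performance distribution map was built with the Unity game engine, and reinforcement learning was carried out by communicating, through Unity ML-Agents, with a DDPG algorithm written externally in Python (TensorFlow). The agent to be trained was set to be a drone, and a reward was designed for approaching the targets so that training would yield results that increase the reward value.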
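The summary does not give the actual reward function, so the following is only a minimal sketch of one common way to reward approaching a target. The function name `step_reward`, the constants `reach_radius`, `reach_bonus`, and `step_penalty`, and the use of 3D positions are all assumptions made for illustration and are not taken from the paper.

```python
import math


def step_reward(prev_pos, new_pos, target, reach_radius=1.0,
                reach_bonus=10.0, step_penalty=0.01):
    """Return (reward, reached) for one step of a hypothetical drone agent.

    The reward is the distance gained toward the target during this step,
    minus a small per-step penalty that favors short flight times.
    All constants are illustrative, not values from the paper.
    """
    d_prev = math.dist(prev_pos, target)  # distance before the step
    d_new = math.dist(new_pos, target)    # distance after the step
    if d_new <= reach_radius:
        return reach_bonus, True          # target reached
    return (d_prev - d_new) - step_penalty, False
```

With this shaping, a step that moves the drone closer to the target earns a positive reward, a step that moves it away earns a negative one, and the per-step penalty discourages detours.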
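In this paper, the actor network, the critic network, the training procedure, and model saving were all implemented with TensorFlow in Python, and this Python code communicates with the environment and agent built in Unity while training takes place.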
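Therefore, the acoustic detection performance distribution map makes it possible to infer at which points sonobuoys should be placed to secure the maximum detection area, and along which route they can be deployed in the shortest time. In this study, an acoustic detection performance distribution map was generated in simulation, and the agent was designed so that it could perceive it.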
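To make the two placement criteria concrete (no overlapping detection areas, shortest travel distance), the sketch below uses a simple greedy heuristic. This is not the paper's method, which learns the behavior with DDPG; it only illustrates what a good answer looks like. Modeling each candidate as a circle `(x, y, radius)` and the names `overlaps`, `greedy_plan`, and `covered_area` are assumptions.

```python
import math


def overlaps(a, b):
    """True if two detection circles (x, y, radius) share any area."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) < a[2] + b[2]


def greedy_plan(start, candidates, n_buoys):
    """Pick up to n_buoys candidates, always taking the nearest one whose
    detection circle does not overlap a circle that was already chosen."""
    position, chosen = start, []
    remaining = list(candidates)
    while remaining and len(chosen) < n_buoys:
        valid = [c for c in remaining
                 if not any(overlaps(c, p) for p in chosen)]
        if not valid:
            break  # every remaining candidate would overlap
        nxt = min(valid, key=lambda c: math.hypot(c[0] - position[0],
                                                  c[1] - position[1]))
        chosen.append(nxt)
        remaining.remove(nxt)
        position = (nxt[0], nxt[1])  # the vehicle is now at the drop point
    return chosen


def covered_area(circles):
    """Sum of circle areas; equals the covered area only when no circles overlap."""
    return sum(math.pi * r * r for _, _, r in circles)
```

For example, with candidates at x = 1, 2, 3, and 10 on a line, each with radius 1, and a start at the origin, the heuristic skips the candidate at x = 2 because its circle would overlap the one already dropped at x = 1.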

Theory/Model

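DDPG has three characteristics. First, it uses experience replay to prevent the gradient from becoming biased because the trajectory data used for training are temporally correlated. Instead of using the agent's experience for training immediately, this approach stores it in a replay buffer, as in Fig. 8, and then randomly draws N samples from the buffer.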
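The store-then-sample idea can be sketched as a small replay buffer. This is a minimal illustration using only the Python standard library, not the authors' TensorFlow code; the class name `ReplayBuffer`, the transition layout `(state, action, reward, next_state, done)`, and the `capacity` and `seed` parameters are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # A bounded deque silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Store one transition instead of training on it immediately."""
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, n):
        """Draw n transitions uniformly at random, without replacement."""
        batch = self.rng.sample(list(self.buffer), n)
        # Regroup into per-field tuples: states, actions, rewards, ...
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)
```

Because the sampled transitions come from different points in time, consecutive and strongly correlated steps of one trajectory rarely end up in the same mini-batch.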

