[Paper] Luxo character control using deep reinforcement learning

Jeongmin Lee (이정민); Yoonsang Lee (이윤상)

doi:10.15701/kcgs.2020.26.4.1

Luxo character control using deep reinforcement learning

Journal of the Korea Computer Graphics Society, vol. 26, no. 4, 2020, pp. 1-8

Jeongmin Lee (Department of Computer Software, Hanyang University), Yoonsang Lee (Department of Computer Software, Hanyang University)

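Abstract (translated from the Korean)

If we can build a physics-based controller that makes a character perform user-desired motions in simulation, we can generate character animation that responds naturally to changes in the surrounding environment and to interactions with other characters. Recently, many studies have used deep reinforcement learning to make physics-based controllers synthesize more stable and more diverse motions. In this paper, we present a deep reinforcement learning model that makes Luxo, the one-legged mascot character of Pixar Animation Studios, hop to a given destination. To learn the hopping motion efficiently, we created a reference motion by generating the angle values of each of Luxo's joints through linear interpolation; the character learns a control policy that imitates this motion while keeping its balance and reaching the target position. In experiments comparing against a policy trained to control Luxo without the reference motion, the proposed method learned a policy that moves Luxo by jumping to a user-specified location more efficiently.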

Abstract

Motion synthesis using physics-based controllers can generate character animation that interacts naturally with the given environment and with other characters. Recently, various methods using deep neural networks have improved the quality of motions generated by physics-based controllers. In this paper, we present a control policy, learned by deep reinforcement learning (DRL), that enables Luxo, the mascot character of Pixar Animation Studios, to hop towards a random goal location while imitating a reference motion and maintaining its balance. Instead of directly training our DRL network to make Luxo reach a goal location, we use a reference motion designed to preserve the jumping style of the Luxo animation. The reference motion is generated by linearly interpolating predetermined poses, each defined by the joint angles of the Luxo character. In our experiments, this method learned a policy that makes Luxo jump to a user-specified location more efficiently than a policy trained without any reference motion.
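
As an illustration of how a reference motion can be built by linearly interpolating predetermined poses, the following is a minimal sketch and not the authors' implementation. The number of joints, the keyframe times, and the angle values are all hypothetical, as are the names `KEYFRAMES`, `reference_pose`, and `sample_reference_motion`.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical keyframes: (time in seconds, joint angles in radians).
# The joint count and all values are illustrative, not taken from the paper.
Keyframe = Tuple[float, Sequence[float]]

KEYFRAMES: List[Keyframe] = [
    (0.0, [0.0, 0.0, 0.0]),    # rest pose
    (0.2, [-0.6, 1.2, -0.6]),  # crouch before take-off
    (0.4, [0.3, -0.4, 0.2]),   # extended pose in the air
    (0.6, [0.0, 0.0, 0.0]),    # back to rest after landing
]


def reference_pose(keyframes: List[Keyframe], t: float) -> List[float]:
    """Linearly interpolate joint angles at time t.

    Times outside the keyframe range are clamped to the first or last pose.
    """
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return list(keyframes[0][1])
    if t >= times[-1]:
        return list(keyframes[-1][1])
    i = bisect_right(times, t)  # index of the first keyframe later than t
    t0, p0 = keyframes[i - 1]
    t1, p1 = keyframes[i]
    w = (t - t0) / (t1 - t0)    # blend weight in [0, 1)
    return [(1.0 - w) * a + w * b for a, b in zip(p0, p1)]


def sample_reference_motion(keyframes: List[Keyframe], dt: float) -> List[List[float]]:
    """Sample the interpolated motion at a fixed time step dt."""
    duration = keyframes[-1][0]
    n = int(round(duration / dt))
    return [reference_pose(keyframes, i * dt) for i in range(n + 1)]
```

Calling `reference_pose(KEYFRAMES, 0.1)` blends the rest pose and the crouch pose with equal weight, and `sample_reference_motion(KEYFRAMES, 0.1)` produces one pose per control step that an imitation term in the reward could compare against the simulated pose.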


