
[International Paper] Empirical Performance Analysis of Collective Communication for Distributed Deep Learning in a Many-Core CPU Environment

Applied Sciences, vol. 10, no. 19, 2020, Article 6717

Woo, Junghoon; Choi, Hyeonseong; Lee, Jaehwan (School of Electronics and Information Engineering, Korea Aerospace University, 76 Hanggongdaehak-ro, Deogyang-gu, Goyang-si, Gyeonggi-do 10540, Korea)

Abstract

To accommodate large training datasets and complex training models, "distributed" deep learning training is employed more and more frequently. However, communication bottlenecks between distributed systems lead to poor performance of distributed deep learning training. In this stud...
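The abstract's subject, collective communication for synchronizing gradients across workers, can be illustrated with a small, purely illustrative sketch (not taken from the paper) of the ring all-reduce pattern popularized by libraries such as Horovod. Everything here is a single-process simulation: the "network" is plain array copying, and all names are hypothetical.

```python
import numpy as np

def ring_allreduce(grads):
    """Simulate ring all-reduce: every worker ends up with the element-wise
    sum of all workers' gradients, exchanging only one chunk per step."""
    n = len(grads)
    # Each worker splits its gradient into n chunks of (nearly) equal size.
    chunks = [np.array_split(np.asarray(g, dtype=float), n) for g in grads]

    # Phase 1, reduce-scatter: after n-1 steps, worker i holds the fully
    # summed chunk (i+1) mod n. Sends are buffered first so that all
    # transfers within one step happen "simultaneously".
    for step in range(n - 1):
        sends = [((i + 1) % n, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for dst, c, data in sends:
            chunks[dst][c] += data

    # Phase 2, all-gather: circulate the completed chunks around the ring
    # until every worker holds every fully summed chunk.
    for step in range(n - 1):
        sends = [((i + 1) % n, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for dst, c, data in sends:
            chunks[dst][c] = data

    return [np.concatenate(c) for c in chunks]

# Four workers with different gradients; all should end with the same sum.
workers = [np.full(10, w, dtype=float) for w in range(4)]
results = ring_allreduce(workers)
```

Ring all-reduce is attractive precisely when communication is the bottleneck: each worker transmits roughly 2(n-1)/n times the gradient size in total, independent of the number of workers, instead of sending the whole gradient to a central node.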

Open Access (OA) type: GOLD (article published in an open-access journal)
