Applied Sciences, v.10, no.19, 2020, pp.6717.
Woo, Junghoon; Choi, Hyeonseong; Lee, Jaehwan (School of Electronics and Information Engineering, Korea Aerospace University, 76 Hanggongdaehak-ro, Deogyang-gu, Goyang-si, Gyeonggi-do 10540, Korea)
To accommodate large volumes of training data and complex training models, "distributed" deep learning training has come to be employed more and more frequently. However, communication bottlenecks between distributed systems lead to poor performance of distributed deep learning training. In this stud...
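The abstract is cut off above, but the reference list that follows (mpi4py, pickle, the buffer protocol, Horovod) suggests that MPI-based gradient exchange is the communication under study. As a minimal, hypothetical sketch of that pattern, and not the paper's own code, the example below averages a per-worker gradient with mpi4py's buffer-based Allreduce, which hands the NumPy array's memory directly to the MPI library instead of serializing the object with pickle (as the lowercase allreduce would). The script name, array size, and values are invented for illustration.

# allreduce_demo.py -- hypothetical sketch; run with: mpiexec -n 4 python allreduce_demo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Stand-in for a gradient computed locally on this worker during training.
grad = np.full(4, float(rank), dtype=np.float64)

# Uppercase Allreduce uses the buffer protocol: the array's memory is
# passed straight to the MPI library, avoiding pickle serialization overhead.
avg = np.empty_like(grad)
comm.Allreduce(grad, avg, op=MPI.SUM)
avg /= size

if rank == 0:
    print("averaged gradient:", avg)  # with 4 ranks: [1.5 1.5 1.5 1.5]

With four ranks the per-element sum is 0 + 1 + 2 + 3 = 6, so the averaged gradient is 1.5 everywhere; in a real data-parallel run each rank would contribute its locally computed gradients instead of a constant.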
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press.
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., and Isard, M. (2016, November 2-4). TensorFlow: A system for large-scale machine learning. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA.
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., and Lerer, A. (2017, December 4-9). Automatic differentiation in PyTorch. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.
Li, Y., Park, J., Alian, M., Yuan, Y., Qu, Z., Pan, P., Wang, R., Schwing, A., Esmaeilzadeh, H., and Kim, N.S. (2018, October 20-24). A network-centric hardware/algorithm co-design to accelerate distributed training of deep neural networks. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2018), Fukuoka, Japan. doi:10.1109/MICRO.2018.00023
Oliphant, T.E. (2007). Python for scientific computing. Comput. Sci. Eng., 9, 10. doi:10.1109/MCSE.2007.58
Beazley, D. (2010, January 25-26). Understanding the Python GIL. Proceedings of the 2010 PyCon Python Conference, Atlanta, GA, USA.
(2020, May 20). Global Interpreter Lock. Available online: https://wiki.python.org/moin/GlobalInterpreterLock/.
(2020, May 20). MPI Forum. Available online: https://www.mpi-forum.org/.
Gropp, W. (1996). A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput., 22, 789. doi:10.1016/0167-8191(96)00024-5
(2020, May 20). MPI Tutorial. Available online: https://mpitutorial.com/.
(2020, May 20). Memkind Library. Available online: https://github.com/memkind/memkind/.
(2020, May 20). Cython. Available online: https://cython.org/.
Behnel, S. (2011). Cython: The best of both worlds. Comput. Sci. Eng., 13, 31. doi:10.1109/MCSE.2010.118
(2020, May 20). Docker. Available online: https://www.docker.com/.
Ahn, S. (2018). Soft memory box: A virtual shared memory framework for fast deep neural network training in distributed high performance computing. IEEE Access, 6, 26493. doi:10.1109/ACCESS.2018.2834146
Peng, I.B. (2018). Characterizing the performance benefit of hybrid memory system for HPC applications. Parallel Comput., 76, 57. doi:10.1016/j.parco.2018.04.007
Cho (2018). Exploring the Performance Impact of Emerging Many-Core Architectures on MPI Communication. J. Comput. Sci. Eng., 12, 170. doi:10.5626/JCSE.2018.12.4.170
Li, Z., Kihl, M., Lu, Q., and Andersson, J.A. (2017, March 27-29). Performance overhead comparison between hypervisor and container based virtualization. Proceedings of the 31st IEEE International Conference on Advanced Information Networking and Applications (AINA 2017), Taipei, Taiwan. doi:10.1109/AINA.2017.79
Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Ranzato, M., Senior, A., Tucker, P., and Yang, K. (2012, December 3-8). Large scale distributed deep networks. Proceedings of the 26th Conference on Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA.
Huang, Y., Cheng, Y., Bapna, A., Firat, O., Chen, D., Chen, M., Lee, H., Ngiam, J., Le, Q.V., and Wu, Y. (2019, December 8-14). GPipe: Efficient training of giant neural networks using pipeline parallelism. Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada.
Kim, Y. (2019). Comprehensive techniques of multi-GPU memory optimization for deep learning acceleration. Clust. Comput., 23, 2193. doi:10.1007/s10586-019-02974-6
Kim, Y., Choi, H., Lee, J., Kim, J.S., Jei, H., and Roh, H. (2019, June 16-20). Efficient Large-Scale Deep Learning Framework for Heterogeneous Multi-GPU Cluster. Proceedings of the 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W), Umeå, Sweden. doi:10.1109/FAS-W.2019.00050
Kim, Y. (2020). Towards an optimized distributed deep learning framework for a heterogeneous multi-GPU cluster. Clust. Comput., 23, 2287. doi:10.1007/s10586-020-03144-9
Heigold, G., McDermott, E., Vanhoucke, V., Senior, A., and Bacchiani, M. (2014, May 4-9). Asynchronous stochastic optimization for sequence training of deep neural networks. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy. doi:10.1109/ICASSP.2014.6854672
Sergeev, A., and Del Balso, M. (2018). Horovod: Fast and easy distributed deep learning in TensorFlow. arXiv:1802.05799.
(2020, May 20). CPython. Available online: https://docs.python.org/3/.
Zhang, C., Yuan, X., and Srinivasan, A. (2010, April 19-23). Processor affinity and MPI performance on SMP-CMP clusters. Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), Atlanta, GA, USA.
Lee, C., Lee, J., Koo, D., Kim, C., Bang, J., Byun, E., and Eom, H. (2020, August 17-21). Empirical Analysis of the I/O Characteristics of a Highly Integrated Many-Core Processor. Proceedings of the 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), Washington, DC, USA. doi:10.1109/ACSOS-C51401.2020.00020
Sodani, A. (2016). Knights Landing: Second-generation Intel Xeon Phi product. IEEE Micro, 36, 34. doi:10.1109/MM.2016.25
Agelastos, A.M., Rajan, M., Wichmann, N., Baker, R., Domino, S.P., Draeger, E.W., Anderson, S., Balma, J., Behling, S., and Berry, M. (2017). Performance on Trinity Phase 2 (a Cray XC40 Utilizing Intel Xeon Phi Processors) with Acceptance Applications and Benchmarks, Sandia National Lab. (SNL-NM).
Vladimirov, A., and Asai, R. (2016). Clustering Modes in Knights Landing Processors: Developer’s Guide, Colfax International.
(2020, May 20). Numpy. Available online: https://numpy.org/.
Oliphant, T.E. (2020, September 23). A Guide to NumPy. Available online: https://web.mit.edu/dvp/Public/numpybook.pdf.
van der Walt, S. (2011). The NumPy array: A structure for efficient numerical computation. Comput. Sci. Eng., 13, 22. doi:10.1109/MCSE.2011.37
(2020, May 20). SciPy. Available online: https://scipy.org/.
(2020, May 20). MPI for Python. Available online: https://mpi4py.readthedocs.io/en/stable/.
(2020, May 20). Pickle-Python Object Serialization. Available online: https://docs.python.org/3/library/pickle.html/.
(2020, May 20). Buffer Protocol. Available online: https://docs.python.org/3/c-api/buffer.html/.