한국음향학회지 (The Journal of the Acoustical Society of Korea), v.39 no.6, 2020, pp. 505-514
Dongsuk Yook (Artificial Intelligence Laboratory, Department of Computer Science, Korea University), Hyowon Lee (AI Research Institute, KT Institute of Convergence Technology), Inchul Yoo (Artificial Intelligence Laboratory, Department of Computer Science, Korea University)
Since a large amount of training data is typically needed to train Deep Neural Networks (DNNs), a parallel training approach is required. The Stochastic Gradient Descent (SGD) algorithm is one of the most widely used methods for training DNNs. However, since SGD is an inherentl...
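The abstract's contrast between inherently sequential SGD and the parallel variants surveyed in the references below can be made concrete with a small sketch. The following NumPy example is a hypothetical illustration, not code from the paper: it trains a toy linear model with plain minibatch SGD, where each update depends on the previous one, and then with synchronous model averaging, where several workers run SGD on disjoint data shards and their parameters are periodically averaged (cf. refs. [22], [24]). All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (hypothetical, not the paper's code): a linear model trained
# with plain minibatch SGD, then with synchronous model averaging.
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 8))              # toy training data
w_true = rng.normal(size=8)
y = X @ w_true + 0.01 * rng.normal(size=1024)

def sgd(w, X, y, lr=0.01, batch=32, steps=100):
    """Plain (inherently sequential) minibatch SGD on squared error."""
    n = len(X)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w = w - lr * grad                   # each update depends on the last
    return w

def parallel_sgd(X, y, workers=4, rounds=10):
    """Synchronous model averaging: workers train on disjoint shards
    (simulated serially here), then parameters are averaged each round."""
    w = np.zeros(X.shape[1])
    shards = np.array_split(np.arange(len(X)), workers)
    for _ in range(rounds):
        local = [sgd(w.copy(), X[s], y[s], steps=10) for s in shards]
        w = np.mean(local, axis=0)          # one synchronization per round
    return w

print("sequential error:", np.linalg.norm(sgd(np.zeros(8), X, y) - w_true))
print("averaged   error:", np.linalg.norm(parallel_sgd(X, y) - w_true))
```

On real hardware each worker would hold its own parameter copy and the averaging step would be a single all-reduce-style synchronization; the references below analyze how the averaging frequency and the degree of asynchrony affect convergence.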
G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, 14, 1771-1800 (2002).
G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, 18, 1527-1554 (2006).
A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional neural networks," Proc. Advances in Neural Information Processing Systems, 1097-1105 (2012).
G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Processing Magazine, 29, 82-97 (2012).
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," Proc. Advances in Neural Information Processing Systems, 2672-2680 (2014).
D. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv:1312.6114 (2013).
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proc. IEEE Conference on Computer Vision and Pattern Recognition, 770-778 (2016).
D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Nature, 323, 533-536 (1986).
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, "Pytorch: An imperative style, high-performance deep learning library," Proc. Advances in Neural Information Processing Systems, 8026-8037 (2019).
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "Tensorflow: Large-scale machine learning on heterogeneous distributed systems," arXiv:1603.04467 (2016).
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely, "The Kaldi speech recognition toolkit," Proc. IEEE Automatic Speech Recognition and Understanding Workshop (2011).
S. Smith and Q. Le, "A Bayesian perspective on generalization and stochastic gradient descent," Proc. Int. Conf. on Learning Representations (2018).
T. Zhang, "Solving large scale linear prediction problems using stochastic gradient descent algorithms," Proc. Int. Conf. on Machine Learning (2004).
F. Seide, H. Fu, J. Droppo, G. Li, and D. Yu, "On parallelizability of stochastic gradient descent for speech DNNs," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 235-239 (2014).
M. Zinkevich, M. Weimer, A. Smola, and L. Li, "Parallelized stochastic gradient descent," Proc. Advances in Neural Information Processing Systems, 2595-2603 (2010).
L. Valiant, "A bridging model for parallel computation," Communications of the ACM, 33, 103-111 (1990).
H. Su, H. Chen, and H. Xu, "Experiments on parallel training of deep neural network using model averaging," arXiv:1507.01239 (2015).
J. Hermans, On scalable deep learning and parallelizing gradient descent, (Master's Thesis, Maastricht University, 2017).
S. Zhang, A. Choromanska, and Y. LeCun, "Deep learning with elastic averaging SGD," Proc. Advances in Neural Information Processing Systems, 685-693 (2015).
K. Chen and Q. Huo, "Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering," Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 5880-5884 (2016).
J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng, "Large scale distributed deep networks," Proc. Advances in Neural Information Processing Systems, 1223-1231 (2012).
J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz, "Revisiting distributed synchronous SGD," arXiv:1604.00981 (2016).
F. Niu, B. Recht, C. Re, and S. Wright, "Hogwild!: A lock-free approach to parallelizing stochastic gradient descent," Proc. Advances in Neural Information Processing Systems, 693-701 (2011).
S. Sallinen, N. Satish, M. Smelyanskiy, S. Sury, and C. Re, "High performance parallel stochastic gradient descent in shared memory," Proc. IEEE International Parallel and Distributed Processing Symposium, 873-882 (2016).
S. Zhao and W. Li, "Fast asynchronous parallel stochastic gradient descent: A lock-free approach with convergence guarantee," Proc. AAAI Conference on Artificial Intelligence, 2379-2385 (2016).
X. Lian, W. Zhang, C. Zhang, and J. Liu, "Asynchronous decentralized parallel stochastic gradient descent," Proc. Int. Conf. on Machine Learning, 3049-3058 (2018).
I. Mitliagkas, C. Zhang, S. Hadjis, and C. Re, "Asynchrony begets momentum, with an application to deep learning," Proc. Annual Allerton Conference on Communication, Control, and Computing, 997-1004 (2016).
S. Zheng, Q. Meng, T. Wang, W. Chen, N. Yu, Z. Ma, and T. Liu, "Asynchronous stochastic gradient descent with delay compensation," Proc. Int. Conf. on Machine Learning, 4120-4129 (2017).
O. Yadan, K. Adams, Y. Taigman, and M. Ranzato, "Multi-GPU training of ConvNets," arXiv:1312.5853 (2013).
A. Petrowski, G. Dreyfus, and C. Girault, "Performance analysis of a pipelined backpropagation parallel algorithm," IEEE Trans. on Neural Networks, 4, 970-981 (1993).
X. Chen, A. Eversole, G. Li, D. Yu, and F. Seide, "Pipelined back-propagation for context-dependent deep neural networks," Proc. Interspeech, 26-29 (2012).
Z. Huo, B. Gu, Q. Yang, and H. Huang, "Decoupled parallel backpropagation with convergence guarantee," Proc. Int. Conf. on Machine Learning, 2098-2106 (2018).
Z. Huo, B. Gu, and H. Huang, "Training neural networks using features replay," Proc. Advances in Neural Information Processing Systems, 6659-6668 (2018).
M. Jaderberg, W. Czarnecki, S. Osindero, O. Vinyals, A. Graves, D. Silver, and K. Kavukcuoglu, "Decoupled neural interfaces using synthetic gradients," Proc. Int. Conf. on Machine Learning, 1627-1635 (2017).
C. Chen, C. Yang, and H. Cheng, "Efficient and robust parallel DNN training through model parallelism on multi-GPU platform," arXiv:1809.02839 (2018).
Y. Huang, Y. Cheng, A. Bapna, O. Firat, M. Chen, D. Chen, H. Lee, J. Ngiam, Q. Le, Y. Wu, and Z. Chen, "GPipe: Efficient training of giant neural networks using pipeline parallelism," Proc. Advances in Neural Information Processing Systems, 103-112 (2019).
H. Lee, K. Lee, I. Yoo, and D. Yook, "Analysis of parallel training algorithms for deep neural networks," Proc. Annual Conference on Computational Science and Computational Intelligence, 1462-1463 (2018).
D. Narayanan, A. Harlap, A. Phanishayee, V. Seshadri, N. Devanur, G. Ganger, P. Gibbons, and M. Zaharia, "PipeDream: Generalized pipeline parallelism for DNN training," Proc. ACM Symposium on Operating Systems Principles, 1-15 (2019).
S. Watanabe, M. Mandel, J. Barker, E. Vincent, A. Arora, X. Chang, S. Khudanpur, V. Manohar, D. Povey, D. Raj, D. Snyder, A. Subramanian, J. Trmal, B. Ben Yair, C. Boeddeker, Z. Ni, Y. Fujita, S. Horiguchi, N. Kanda, and T. Yoshioka, "CHiME-6 Challenge: Tackling multispeaker speech recognition for unsegmented recordings," Proc. Int. Workshop on Speech Processing in Everyday Environments (2020).