[Paper] 통합메모리를 이용한 임베디드 환경에서의 딥러닝 프레임워크 성능 개선과 평가

MinHak Lee; Woochul Kang

doi:10.5626/ktcp.2017.23.7.417

Performance Enhancement and Evaluation of a Deep Learning Framework on Embedded Systems using Unified Memory

KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지), v.23 no.7, 2017, pp.417-423

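MinHak Lee (Department of Embedded Systems Engineering, Incheon National University), Woochul Kang (Department of Embedded Systems Engineering, Incheon National University)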

Abstract (translated from Korean)

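Recently, as embedded devices capable of running deep learning have become commercially available, a variety of research on applying deep learning in the embedded systems domain is under way. However, compared with high-performance PC environments, embedded systems carry relatively low-end CPU/GPU processors and memory, so applying deep learning to them is subject to many constraints. In this paper, we experimentally evaluate the performance of a range of recent deep learning networks on an embedded device in terms of time and power. We also present a method that improves real-time performance and lowers power consumption by reducing memory copies, exploiting the architectural characteristic of embedded systems in which the host CPU and the GPU device share memory. The proposed method was implemented by modifying Caffe, a representative open-source deep learning framework, and was evaluated on an NVIDIA Jetson TK1 equipped with an embedded GPU. The experiments showed clear performance improvements for most deep learning networks. In particular, for AlexNet, which has high memory usage, we observed about a 33% reduction in image recognition time and a 50% reduction in energy consumption.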

Abstract

Recently, many embedded devices that have the computing capability required for deep learning have become available; hence, many new applications using these devices are emerging. However, these embedded devices have an architecture different from that of PCs and high-performance servers. In this paper, we propose a method that improves the performance of a deep-learning framework by considering the architecture of an embedded device that shares memory between the CPU and the GPU. The proposed method is implemented in Caffe, an open-source deep-learning framework, and is evaluated on an NVIDIA Jetson TK1 embedded device. In the experiment, we investigate the image recognition performance of several state-of-the-art deep-learning networks, including AlexNet, VGGNet, and GoogLeNet. Our results show that the proposed method can achieve a significant performance gain. For instance, in AlexNet, we could reduce image recognition latency by about 33% and energy consumption by about 50%.
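
The abstract describes reducing memory copies on a device whose CPU and GPU share memory. The paper's actual changes to Caffe are not reproduced on this page, so the following is only a minimal CUDA sketch of the general idea, assuming the CUDA runtime's managed allocation (`cudaMallocManaged`) is used in place of separate host and device buffers. The kernel `scale` and the helpers `run_with_copies` and `run_managed` are hypothetical names introduced for illustration and do not come from Caffe or from the paper.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for a layer's GPU work: scales every element in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Conventional path: separate host and device buffers, two explicit copies.
bool run_with_copies(int n) {
    size_t bytes = n * sizeof(float);
    float *host = (float *)malloc(bytes);
    float *dev = nullptr;
    if (!host || cudaMalloc(&dev, bytes) != cudaSuccess) {
        free(host);
        return false;
    }
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // copy 1: CPU -> GPU
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);  // copy 2: GPU -> CPU

    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (host[i] == 2.0f);
    cudaFree(dev);
    free(host);
    return ok;
}

// Unified-memory path: one managed allocation used by both CPU and GPU,
// with no cudaMemcpy calls in the program.
bool run_managed(int n) {
    size_t bytes = n * sizeof(float);
    float *data = nullptr;
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes directly

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    // The CPU must wait for the kernel before touching managed memory again.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return false;
    }

    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (data[i] == 2.0f);  // CPU reads directly
    cudaFree(data);
    return ok;
}

int main() {
    const int n = 1 << 20;
    bool copies_ok = run_with_copies(n);
    bool managed_ok = run_managed(n);
    printf("explicit copies: %s\n", copies_ok ? "ok" : "failed");
    printf("managed memory:  %s\n", managed_ok ? "ok" : "failed");
    return (copies_ok && managed_ok) ? 0 : 1;
}
```

Both functions compute the same result. The second one simply has no host-to-device or device-to-host transfer in the program, which is the kind of saving the abstract attributes to shared-memory embedded devices. Whether a real framework sees a comparable gain depends on how its buffers are allocated and synchronized, which this sketch does not model.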


