[Thesis] Parallel Implementation of GPGPU-Based Deep Learning Algorithms and Design of a Neural Network Accelerator

최세진 (Master's thesis, Department of Electronic and Computer Engineering, Graduate School of Seokyeong University)

Abstract (translated from the Korean original)

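Interest in deep learning has recently grown, and related research is being actively pursued. According to Google Scholar, the number of academic publications returned for the keyword “deep learning” increases sharply every year. In industry as well, research and development on deep learning is active, led by the world's major IT companies. Deep learning showed higher accuracy than earlier algorithms, but because of its enormous computational load it was rarely used until the mid-2000s. This problem has recently become solvable as the performance of hardware systems has improved rapidly.
Research is now actively under way on accelerating deep learning algorithms on a variety of hardware systems, including GPGPUs (General Purpose Graphic Processing Units), FPGAs (Field Programmable Gate Arrays), and ASICs (Application Specific Integrated Circuits). Accordingly, this thesis proposes a parallel implementation of a GPGPU-based CNN (Convolutional Neural Network) algorithm and a design method for an FPGA-based ANN (Artificial Neural Network) accelerator.
The GPGPU-based parallelization of CNN training proposed in this thesis uses the CUDA platform provided by NVIDIA. A threading technique is applied to each layer of the CNN, and the resulting parallel processing improves training speed. The CNN with GPGPU-based parallelization reduced training time by about 72% compared with a CNN training program running on the CPU. The FPGA-based ANN accelerator is implemented in Verilog HDL and operates on signals from an FSM (Finite State Machine)-based control unit. Resource usage was reduced through fixed-point arithmetic and an activation function module based on approximation, among other measures. Compared with an existing ANN accelerator [25], the designed accelerator converged faster during training relative to the resources it used, and it trained about 41% faster than an ANN of the same structure trained on a GPGPU.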
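The abstract states only that a threading technique is applied to each CNN layer. As a hedged illustration of what such a per-layer mapping can look like, the Python sketch below emulates on the CPU a CUDA-style launch in which each thread computes one output element of a single-channel convolution layer. The one-thread-per-output mapping, the block size, and the names `conv2d_thread` and `launch_conv2d` are assumptions for illustration, not the thesis's actual CUDA kernels; a real implementation would be a `__global__` kernel launched from host code.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_thread(inp: Matrix, w: Matrix, bias: float, out: Matrix,
                  block_idx: int, block_dim: int, thread_idx: int) -> None:
    """Body of ONE emulated thread (hypothetical): computes a single output pixel."""
    out_h, out_w = len(out), len(out[0])
    # Flattened global index, mirroring blockIdx.x * blockDim.x + threadIdx.x in CUDA.
    gid = block_idx * block_dim + thread_idx
    if gid >= out_h * out_w:  # surplus threads in the last block exit early
        return
    oy, ox = divmod(gid, out_w)  # map the flat index back to (row, column)
    k = len(w)
    acc = bias
    # Valid cross-correlation (no kernel flip), as CNN layers usually compute it.
    for ky in range(k):
        for kx in range(k):
            acc += inp[oy + ky][ox + kx] * w[ky][kx]
    out[oy][ox] = acc  # each thread writes a distinct element, so there are no races


def launch_conv2d(inp: Matrix, w: Matrix, bias: float = 0.0,
                  block_dim: int = 4) -> Matrix:
    """Serial stand-in (hypothetical) for a <<<grid_dim, block_dim>>> kernel launch."""
    k = len(w)
    out_h, out_w = len(inp) - k + 1, len(inp[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    n = out_h * out_w
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    # On a GPU these two loops disappear: all (block, thread) pairs run concurrently.
    for b in range(grid_dim):
        for t in range(block_dim):
            conv2d_thread(inp, w, bias, out, b, block_dim, t)
    return out


if __name__ == "__main__":
    image = [[float(5 * y + x) for x in range(5)] for y in range(5)]
    box = [[1.0] * 3 for _ in range(3)]  # 3x3 box filter: each output is a window sum
    for row in launch_conv2d(image, box):
        print(row)
```

With the 5×5 ramp image and 3×3 box filter above, the script prints the rows `[54.0, 63.0, 72.0]`, `[99.0, 108.0, 117.0]` and `[144.0, 153.0, 162.0]`. Changing `block_dim` only regroups the same independent per-element work, so the result does not change; that independence is what makes a layer like this a natural fit for one GPU thread per output.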

Abstract

Recently, as deep learning has attracted more and more attention, related research has been actively pursued. According to Google Scholar search results, the number of academic publications returned for the keyword “deep learning” has increased dramatically every year. Furthermore, world-renowned IT companies have led active research and development on deep learning in industry. Although deep learning achieved higher accuracy than earlier algorithms, it was barely used in commercial products until the mid-2000s because of its vast computational cost. This made it hard to apply deep learning technology to everyday life, and accordingly, relevant research was not actively carried out. Recently, however, this problem has been resolved as hardware system performance has improved dramatically.
Today, research is being actively carried out on ways to accelerate deep learning algorithms on various hardware systems, including the GPGPU (General Purpose Graphic Processing Unit), the FPGA (Field Programmable Gate Array), and the ASIC (Application Specific Integrated Circuit). Accordingly, this study proposes methods for (i) the parallel implementation of a GPGPU-based CNN (Convolutional Neural Network) algorithm and (ii) the design of an FPGA-based ANN (Artificial Neural Network) accelerator.
The parallel implementation of GPGPU-based CNN training proposed in this study uses the CUDA platform provided by NVIDIA. Training speed was improved through parallel processing that applies a threading technique to each layer of the CNN. The CNN with GPGPU-based parallelization shortens training time by approximately 72% compared with a CNN training program running on the CPU. The FPGA-based ANN accelerator, implemented in Verilog HDL, operates on signals from a Finite State Machine (FSM)-based control unit. Resource usage was reduced through fixed-point arithmetic and an activation function module based on approximation. Compared with an existing ANN accelerator [25], the designed accelerator achieved faster training convergence relative to the amount of resources used. In addition, it trained approximately 41% faster than an ANN of the same structure trained on the GPGPU.
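The abstract says resource usage was reduced with fixed-point arithmetic and an approximation-based activation function, but it does not name the activation or the approximation used. As a hedged illustration only, the Python sketch below assumes a sigmoid activation, a fixed-point format with 8 fractional bits (`FRAC_BITS`), and the well-known PLAN piecewise-linear sigmoid approximation. These choices and all names (`plan_sigmoid_fixed`, `to_fixed`, `to_float`) are hypothetical and are not the thesis's Verilog module; Python integers stand in for hardware registers.

```python
import math

FRAC_BITS = 8                 # assumed fixed-point format: 8 fractional bits
ONE = 1 << FRAC_BITS          # 1.0 in fixed point (256)

# PLAN breakpoints and offsets, pre-converted to fixed point (all exact at 8 bits).
X_SAT = 5 * ONE               # |x| >= 5.0           -> y = 1
X_MID = (19 * ONE) // 8       # 2.375 <= |x| < 5.0   -> y = |x|/32 + 0.84375
Y_MID = (27 * ONE) // 32      # 0.84375
Y_LOW = (5 * ONE) // 8        # 0.625: 1 <= |x| < 2.375 -> y = |x|/8 + 0.625
HALF = ONE >> 1               # 0.5:   |x| < 1          -> y = |x|/4 + 0.5


def to_fixed(x: float) -> int:
    """Quantize a real value to the fixed-point grid (round to nearest).

    Note: no saturation to a fixed word width; a hardware register would need it.
    """
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a fixed-point value back to a real number."""
    return q / ONE


def plan_sigmoid_fixed(q: int) -> int:
    """PLAN piecewise-linear sigmoid on a fixed-point input.

    Every slope is a power of two (1/32, 1/8, 1/4), so each segment costs one
    right shift and one add; no multiplier or exponential unit is needed.
    """
    a = -q if q < 0 else q    # |x|
    if a >= X_SAT:
        y = ONE
    elif a >= X_MID:
        y = (a >> 5) + Y_MID
    elif a >= ONE:
        y = (a >> 3) + Y_LOW
    else:
        y = (a >> 2) + HALF
    # Sigmoid symmetry: sigma(-x) = 1 - sigma(x).
    return ONE - y if q < 0 else y


if __name__ == "__main__":
    for x in (-6.0, -1.0, 0.0, 1.0, 2.375, 4.0):
        approx = to_float(plan_sigmoid_fixed(to_fixed(x)))
        exact = 1.0 / (1.0 + math.exp(-x))
        print(f"x={x:+.3f}  plan={approx:.5f}  exact={exact:.5f}")
```

For example, an input of 1.0 maps to exactly 0.75 here, versus about 0.731 for the exact sigmoid. Because every slope is a power of two, each segment would need only a shift and an adder in hardware, which illustrates why approximations of this kind are cheap on an FPGA; the error they introduce is the price paid for that saving.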

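Thesis information

Author	최세진
Degree-granting institution	Graduate School, Seokyeong University
Degree type	Master's (Korea)
Department	Electronic and Computer Engineering
Year of publication	2018
Total pages	59 p.
Language	Korean (kor)
Full-text URL	http://www.riss.kr/link?id=T14793399&outLink=K
Source	Korea Education and Research Information Service (KERIS)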
