[Thesis] Energy-efficient neural network processing accelerator design for CNN-based super resolution in edge devices

이수민


이수민 (Graduate School, Yonsei University, Department of Electrical and Electronic Engineering, domestic Ph.D.)

Abstract (Korean)

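Since convolutional neural networks (CNNs) demonstrated remarkable performance in image processing within computer vision, they have been used in many advanced applications such as classification, object detection, and super resolution (SR). In particular, CNN-based SR delivers far better image restoration than conventional upscaling techniques and is drawing attention as a next-generation upscaling technology. Despite this potential, accelerating it on edge devices is difficult because of the network's inherent characteristics and reduced processing element utilization. This dissertation therefore proposes two optimization schemes for designing hardware-efficient CNN-based SR accelerators.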
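First, this dissertation proposes delay balance optimization (DBO). DBO focuses on balancing the timing skew of each fused layer by iteratively searching for the optimal process topology, taking both layer workload and memory bandwidth into account. It also proposes a circularly shifted layer pipeline (CSLP) architecture. By using a fine-tuned access sequence for the cache memory, CSLP saves storage and reduces cache control complexity. To verify the optimization results, the CNN-based SR accelerator was implemented on an FPGA board and its performance was evaluated. The proposed accelerator improves process element utilization by 5% while using cache storage that is 73% smaller than that of the previous state-of-the-art accelerator. As a result, the proposed CNN-based SR accelerator upscales to UHD resolution at 60 frames per second (fps).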
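Second, a spatially independent layer fusion (SIF) dataflow is proposed. SIF focuses on eliminating wasted cycles in the layer pipeline architecture by spatially disconnecting the receptive fields within the fused layers. SIF not only removes layer dependency effectively but also enables fine-grained pipelining within the accelerator, improving processing speed. In addition, error compensated quantization (ECQ) is introduced to apply aggressive quantization to SR, a field where quantization has been difficult to apply. ECQ uses a scaling factor to reduce the randomly distributed error caused by quantization. Furthermore, a mask-based cache interleaving scheme built on two-port SRAM is introduced to address cache limitations. The proposed accelerator is implemented in 28nm CMOS technology, and various system performance metrics are measured and evaluated. As a result, the proposed accelerator achieves up to 4.3 times higher energy-area efficiency than state-of-the-art accelerators.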

Abstract

Since convolutional neural networks (CNNs) have shown remarkable performance in the image processing field of computer vision, they have been employed in many advanced applications such as classification, object detection, and super resolution (SR). In particular, CNN-based SR delivers superior upscaling performance compared to conventional techniques for enhancing image resolution. Despite its outstanding potential, implementing it in edge devices poses many challenges due to the distinctive characteristics of SR networks and the degraded utilization of process elements. Therefore, this dissertation proposes two optimization schemes for designing hardware-efficient CNN-based SR accelerators.
First, delay balance optimization (DBO) is proposed. DBO focuses on balancing the timing skew of each fused layer by iteratively searching for a process topology that considers both layer workload and memory bandwidth. DBO is conducted on the proposed circularly shifted layer pipeline (CSLP). By using a fine-tuned access sequence for cache memory, the CSLP saves cache storage and alleviates cache control complexity. The CNN-based SR accelerator is implemented on an FPGA board to verify the optimization results. The proposed accelerator improves the utilization of process elements by 5% compared to state-of-the-art works while using 73% less cache storage than prior work. As a result, the proposed CNN-based SR accelerator supports upscaling to ultra-HD (UHD) resolution at 60 frames per second (fps).
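To put the 60 fps UHD target in concrete terms, the short calculation below works out the output pixel rate and per-frame time budget it implies. It assumes the standard UHD frame size of 3840×2160; the abstract does not state the input resolution or the upscaling factor, so only the output side is computed. The variable names are illustrative.

```python
# Output-side throughput implied by UHD at 60 fps.
# Assumption: UHD means the standard 3840x2160 frame size.
UHD_WIDTH, UHD_HEIGHT = 3840, 2160
FPS = 60

pixels_per_frame = UHD_WIDTH * UHD_HEIGHT   # output pixels in one frame
pixels_per_second = pixels_per_frame * FPS  # output pixels the accelerator must sustain
frame_budget_ms = 1000 / FPS                # time available to produce one frame

print(pixels_per_frame)           # 8294400
print(pixels_per_second)          # 497664000
print(round(frame_budget_ms, 2))  # 16.67
```

Roughly half a billion output pixels per second must be produced, which is why process element utilization and cache storage are the quantities the first scheme optimizes.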
Second, the spatially independent layer fusion (SIF) dataflow is proposed. SIF focuses on removing stall cycles in the layer pipelining architecture by spatially disconnecting the region of influence within fused layers. It removes layer dependency and enables fine-grained pipelining. In addition, error compensated quantization (ECQ) is adopted to apply aggressive quantization. ECQ uses a scaling factor to reduce the random-directional distribution error caused by quantization. A mask-based cache interleaving scheme is also adopted in two-port SRAMs to alleviate cache limitations. The proposed accelerator is implemented in 28nm CMOS technology to measure the system performance. As a result, the proposed accelerator achieves up to 4.3 times higher energy-area efficiency than state-of-the-art accelerators.
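The abstract does not give the ECQ formulation. Purely as a generic illustration of how a scaling factor can reduce quantization error, and not as the dissertation's method, the sketch below applies symmetric uniform 4-bit quantization to synthetic Gaussian values and grid-searches the scale that minimizes mean squared error. All function names, the bit width, and the test data are assumptions made for this example.

```python
import random

def quantize(values, scale, bits=4):
    """Symmetric uniform quantization: round to integer levels, clip, rescale."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))  # clip to the representable integer range
        out.append(q * scale)
    return out

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def best_scale(values, bits=4, steps=100):
    """Grid-search a scale in (0, max|v|/qmax] that minimizes the MSE."""
    qmax = 2 ** (bits - 1) - 1
    full_range = max(abs(v) for v in values) / qmax
    candidates = [full_range * (k / steps) for k in range(1, steps + 1)]
    return min(candidates, key=lambda s: mse(values, quantize(values, s, bits)))

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]  # synthetic data, not from the thesis

naive = max(abs(w) for w in weights) / 7  # scale that covers the full range at 4 bits
tuned = best_scale(weights, bits=4)       # scale chosen to minimize the error

err_naive = mse(weights, quantize(weights, naive))
err_tuned = mse(weights, quantize(weights, tuned))
print(err_tuned <= err_naive)  # True: the full-range scale is one of the candidates
```

The tuned scale trades a little clipping of outliers for finer resolution near zero, which is the general reason a well-chosen scaling factor lowers the overall error under aggressive quantization.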


Thesis information

Author	이수민
Degree-granting institution	Graduate School, Yonsei University
Degree type	Domestic Ph.D.
Department	Department of Electrical and Electronic Engineering
Advisor	Seong-ook Jung
Year of publication	2024
Total pages	xi, 98 leaves
Keywords	convolution neural network (CNN); super resolution (SR); neural processing unit (NPU); dataflow; energy-efficient neural network accelerator
Language	eng
Full-text URL	http://www.riss.kr/link?id=T16910801&outLink=K
Source	Korea Education and Research Information Service (한국교육학술정보원)

