[Thesis] A Study on the SoC Implementation of a Module-Based CNN Accelerator Using the ARM Cortex-M0

전성호

전성호 (Master's thesis, Department of Artificial Intelligence, Graduate School, Dong-eui University)

Abstract (translated from Korean)

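Driven by recent advances in artificial intelligence (AI), the adoption of Internet of Things (IoT) devices, and edge computing, research on applying AI technology to terminal devices is being actively pursued. Among AI techniques, the convolutional neural network (CNN) delivers high performance in the vision domain. However, because CNN computation operates on multidimensional image data, it requires high-performance computing.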
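Edge AI has advantages in latency, cost, power consumption, and security, and can therefore be applied to wearable devices and end-node IoT devices. Implementing an edge AI system requires minimizing hardware size and power and improving processing speed, and dedicated CNN hardware accelerators are being actively developed for this purpose. SoC (System-on-Chip) based edge AI, which integrates a processor in addition to the AI computation logic, offers lower power, lower production cost, and better reliability than a multi-chip system, making it an essential optimized architecture.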
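In this thesis, a dedicated CNN hardware accelerator was implemented to process CNN operations on terminal devices, an ASIC (Application Specific Integrated Circuit) design was carried out, and SoC AI implementation using the ARM Cortex-M0 was studied. To this end, ResNet-18 was selected as the target CNN, a dataset was collected, and training was performed with PyTorch in a server environment; a C-based verification model was also implemented to verify the accelerator design. The dedicated CNN hardware accelerator was implemented on the ZC706, an SoC-type FPGA board from Xilinx. It was designed in units of the main CNN operation modules, and module control commands activate one operation module at a time to drive the layers sequentially, which makes the network reconfigurable. For the ASIC design, the module-based CNN accelerator was verified using an SPI (Serial Peripheral Interface) interface to transfer the module control commands, and it was fabricated as an ASIC through a Samsung 28-nm MPW (Multi-Project Wafer) run. Subsequently, SoC AI implementation was targeted in order to minimize hardware size and power and to improve reliability, and the design was implemented as SoC AI using the ARM Cortex-M0, the smallest and lowest-power ARM processor. For the SoC-type module-based CNN accelerator, an integrated environment was built by adding DMA (Direct Memory Access), UART, SPI, and the module control command system to the ARM core and the accelerator.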
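The module control scheme described above (one operation module enabled at a time, layers driven in sequence by a command list) can be illustrated with a small software model. This is only a sketch under stated assumptions: the module names, the `Command` fields, and the `Controller` class are hypothetical and are not taken from the thesis, which does not give its command format here.

```python
from dataclasses import dataclass
from enum import Enum


class Module(Enum):
    # Hypothetical module set; the abstract does not enumerate the real modules.
    CONV = "conv"
    POOL = "pool"
    ADD = "add"
    FC = "fc"


@dataclass(frozen=True)
class Command:
    """One module control command: which module to enable, for which layer."""
    module: Module
    layer: str


class Controller:
    """Toy model of a controller that keeps at most one module active."""

    def __init__(self):
        self.active = None   # module currently enabled, or None when idle
        self.trace = []      # (layer, module) pairs in execution order

    def start(self, cmd):
        if self.active is not None:
            raise RuntimeError("another module is still active")
        self.active = cmd.module
        self.trace.append((cmd.layer, cmd.module.value))

    def finish(self):
        self.active = None

    def run(self, program):
        # Layers are driven strictly in sequence, one command at a time.
        for cmd in program:
            self.start(cmd)
            self.finish()
        return list(self.trace)


# Two different layer sequences built from the same modules: changing the
# command list, not the hardware model, changes the network.
residual_block = [
    Command(Module.CONV, "conv_a"),
    Command(Module.CONV, "conv_b"),
    Command(Module.ADD, "skip_add"),
]
classifier_head = [
    Command(Module.POOL, "avg_pool"),
    Command(Module.FC, "fc"),
]
```

Calling `Controller().run(residual_block)` walks the three commands in order. The point of the sketch is that reconfiguring the network amounts to supplying a different command list, while the set of modules stays fixed.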
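The accuracy of the module-based CNN accelerator was objectively evaluated through the Korea Laboratory Accreditation Scheme (KOLAS), and 100 tests confirmed an inference accuracy of 96.2%. The ASIC produced through the MPW run achieves a processing speed of 60 GOPS at 50 MHz with a power consumption of 27.35 mW, corresponding to 2194 GOPS/W. For the SoC-type module-based CNN accelerator, operation of the ARM Cortex-M0 based subsystem was verified through Xilinx Vivado simulation, and adding DMA was confirmed to improve the data transfer speed by about 14 times compared with the module-based CNN accelerator.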

Abstract (English)

Recent advances in artificial intelligence (AI), the introduction of Internet of Things (IoT) devices, and edge computing have led to active research on applying AI technology to terminal devices. Among AI technologies, the CNN (Convolutional Neural Network) achieves high performance in the field of vision. However, since CNN computation is based on multidimensional image data, high-performance computing is required.
Edge AI has advantages in latency, cost, power consumption, and security, so it can be applied to wearable devices and end-node IoT devices. To implement an edge AI system, it is essential to minimize the size and power of the hardware and to improve the processing speed, and dedicated CNN hardware accelerators are being actively developed for this purpose. SoC (System-on-Chip) based edge AI, which includes a processor in addition to the AI computation logic, has advantages such as low power, low production cost, and reliability compared to multi-chip systems, making it an essential optimized architecture.
In this paper, we implemented a dedicated CNN hardware accelerator to process CNN operations on a terminal device, performed an ASIC (Application Specific Integrated Circuit) design, and studied SoC AI implementation using the ARM Cortex-M0. To this end, ResNet-18 was selected as the target CNN, a dataset was collected, and training was performed using PyTorch in a server environment; a C-based verification model was also implemented to verify the accelerator design. The dedicated CNN hardware accelerator was implemented on the ZC706, an SoC-type FPGA board from Xilinx. It was designed in units of the main CNN operation modules, and the modules are activated one at a time through module control commands so that the layers are driven sequentially, which enables network reconfiguration. When designing the ASIC, the module-based CNN accelerator was verified using an SPI (Serial Peripheral Interface) interface to transmit the module control commands, and it was manufactured as an ASIC through a Samsung 28-nm MPW (Multi-Project Wafer) run. Afterwards, the goal was to implement SoC AI to minimize hardware size and power and to improve reliability, and the design was implemented as SoC AI using the ARM Cortex-M0, which has the smallest size and power consumption among ARM processors. For the SoC-type module-based CNN accelerator, an integrated environment was built by adding DMA (Direct Memory Access), UART, SPI, and the module control command system to the ARM core and the accelerator.
The accuracy of the module-based CNN accelerator was objectively evaluated through the Korea Laboratory Accreditation Scheme (KOLAS), and an inference accuracy of 96.2% was confirmed as a result of 100 tests. The ASIC fabricated through the MPW run was confirmed to have a processing speed of 60 GOPS at a 50 MHz operating frequency, a power consumption of 27.35 mW, and an efficiency of 2194 GOPS/W. For the SoC-type module-based CNN accelerator, operation of the subsystem based on the ARM Cortex-M0 was verified through Xilinx Vivado simulation, and it was confirmed that adding DMA improved the data transmission speed by about 14 times compared to the module-based CNN accelerator.
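The reported efficiency figure can be checked directly from the throughput and power stated in the abstract. The short calculation below is a sketch for that purpose; the function names are illustrative, and the operations-per-cycle value is a derived quantity that the abstract itself does not state. How an "operation" is counted (for example, whether a multiply-accumulate counts as two) is also not specified in the source.

```python
def gops_per_watt(throughput_gops, power_mw):
    """Energy efficiency in GOPS/W from throughput (GOPS) and power (mW)."""
    return throughput_gops / (power_mw / 1000.0)


def ops_per_cycle(throughput_gops, clock_mhz):
    """Average operations completed per clock cycle (derived, not reported)."""
    return (throughput_gops * 1e9) / (clock_mhz * 1e6)


# Figures taken from the abstract: 60 GOPS at 50 MHz, 27.35 mW.
efficiency = gops_per_watt(60.0, 27.35)
per_cycle = ops_per_cycle(60.0, 50.0)

print(round(efficiency))  # 2194, matching the reported 2194 GOPS/W
print(per_cycle)          # 1200.0 operations per cycle on average
```

The first result reproduces the 2194 GOPS/W in the abstract. The second shows that sustaining 60 GOPS at 50 MHz implies roughly 1200 operations per clock cycle, which gives a sense of the degree of parallelism such a design needs.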

Thesis information

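Author	전성호
Degree-granting institution	Graduate School, Dong-eui University
Degree	Master's (domestic)
Department	Department of Artificial Intelligence
Advisor	옥승호
Year of publication	2022
Total pages	87
Keywords	SoC AI design, On-Device AI, CNN accelerator, re-configurable CNN architecture, ARM Cortex-M0, Verilog-HDL, MPW, ASIC
Language	kor
Full-text URL	http://www.riss.kr/link?id=T16668086&outLink=K
Source	Korea Education and Research Information Service (KERIS)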
