[Paper] Microcode based Controller for Compact CNN Accelerators Aimed at Mobile Devices

나용석; 손현욱; 김형원

doi:10.6109/jkiice.2022.26.3.355

Microcode based Controller for Compact CNN Accelerators Aimed at Mobile Devices

Journal of the Korea Institute of Information and Communication Engineering (한국정보통신학회논문지), vol. 26, no. 3, pp. 355-366, 2022

나용석 (Department of Electronics Engineering, Chungbuk National University), 손현욱 (Department of Electronics Engineering, Chungbuk National University), 김형원 (Department of Electronics Engineering, Chungbuk National University)

Abstract (Korean version)

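This paper proposes a microcode-based neural network accelerator controller for AI accelerators that uses a programmable architecture to offer reconfigurability together with the advantages of low power and ultra-small size. So that the target accelerator can support a variety of neural network models, a microcode compiler converts a neural network model into microcode, which controls the accelerator's memory accesses and all of its compute units. The controller was designed for a 200 MHz system clock and implemented to run the YOLOv2-Tiny CNN model. It achieved a detection speed of 137.9 ms/image when implemented for inference on the VOC 2012 dataset for object detection, and 99.5 ms/image when implemented for inference on a mask detection dataset for detecting whether a mask is worn. When the accelerator equipped with the proposed controller is implemented as a silicon chip, the gate count is 618,388, which gives a chip area about 65.5% smaller than when a RISC-V (U5-MC2) is used as the CPU core.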

Abstract

This paper proposes a microcode-based controller for neural network accelerators. By using a programmable architecture, the controller makes the accelerator reconfigurable while retaining the advantages of low power and ultra-small chip size. To let the target accelerator support various neural network models, a microcode compiler converts a neural network model into microcode, which is loaded onto the accelerator to control its datapath operators and memory access. Although the proposed controller and accelerator can run various CNN models, in this paper we tested them with the YOLOv2-Tiny CNN model. With a 200 MHz system clock, the controller and accelerator achieved an inference time of 137.9 ms/image on the VOC 2012 dataset for object detection and 99.5 ms/image on a mask detection dataset for detecting mask wearing. When an accelerator equipped with the proposed controller is implemented as a silicon chip, the gate count is 618,388, which corresponds to a 65.5% reduction in chip area compared with an accelerator employing a CPU-based controller (RISC-V).
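The latency figures in the abstract can be restated as throughput and as clock cycles per image. The short Python sketch below only does that unit conversion from the numbers quoted above (200 MHz, 137.9 ms/image, 99.5 ms/image). The function and variable names are illustrative and do not come from the paper, and the cycle count assumes the system clock runs continuously for the whole inference, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract; all names below are illustrative only.
CLOCK_HZ = 200e6  # 200 MHz system clock
LATENCY_MS = {
    "VOC 2012 (object detection)": 137.9,
    "mask detection": 99.5,
}


def images_per_second(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / latency_ms


def cycles_per_image(latency_ms: float, clock_hz: float = CLOCK_HZ) -> float:
    """Clock cycles elapsed during one inference.

    Assumption: the clock runs at clock_hz for the full latency.
    """
    return latency_ms / 1000.0 * clock_hz


if __name__ == "__main__":
    for name, ms in LATENCY_MS.items():
        fps = images_per_second(ms)
        cycles = cycles_per_image(ms)
        print(f"{name}: {fps:.2f} images/s, {cycles / 1e6:.2f} M cycles/image")
```

Under these assumptions the two cases work out to about 7.25 and 10.05 images per second, or about 27.58 million and 19.90 million clock cycles per image.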

Tables/Figures (21)

Fig. 1 Microcode based Controller Architecture
Table 1 Microcode's Field Format
Table 2 Opcode and Functions
Fig. 2 State Diagram in Microcode based Controller
Fig. 3 Microcode Compiler Architecture
Fig. 4 Example of executing the object detection's reference on the FPGA test platform with microcode generated by compiling the YOLOv2-Tiny CNN model for VOC Dataset
Fig. 5 Example of executing the object detection's reference on the FPGA test platform with microcode generated by compiling the YOLOv2-Tiny CNN model for mask detection Dataset
Table 3 Comparison of FPGA based implementation
Fig. 6 Chip Layout
Table 5 Gate Count
Table 4 Comparison of ASIC based implementation
Fig. 7 Accelerator Size and Gate Count
Fig. 8 Examples of Microcode from 0 to 18
Fig. 9 Examples of Microcode from 19 to 37
Fig. 10 Examples of Microcode from 38 to 56
Fig. 11 Examples of Microcode from 57 to 75
Fig. 12 Examples of Microcode from 76 to 94
Fig. 13 Examples of Microcode from 95 to 113
Fig. 14 Examples of Microcode from 114 to 132
Fig. 15 Examples of Microcode from 133 to 151
Fig. 16 Examples of Microcode from 152 to 171
