[Paper] HNPU: An Adaptive DNN Training Processor Utilizing Stochastic Dynamic Fixed-Point and Active Bit-Precision Searching

Han, Donghyeon; Im, Dongseok; Park, Gwangtae; Kim, Youngwoo; Song, Seokchan; Lee, Juhyoung; Yoo, Hoi-Jun

doi:10.1109/jssc.2021.3066400

IEEE Journal of Solid-State Circuits, vol. 56, no. 9, 2021, pp. 2858-2869

Han, Donghyeon (Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Korea) , Im, Dongseok (Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Korea) , Park, Gwangtae (Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Korea) , Kim, Youngwoo (Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Korea) , Song, Seokchan (Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Korea) , Lee, Juhyoung (Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Korea) , Yoo, Hoi-Jun (Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Kor)

Abstract

This article presents HNPU, an energy-efficient deep neural network (DNN) training processor built through algorithm-hardware co-design. The HNPU supports a stochastic dynamic fixed-point representation and a layer-wise adaptive precision searching unit for low-bit-precision training. It additionally exploits slice-level reconfigurability and sparsity to maximize its efficiency in both DNN inference and training. An adaptive-bandwidth reconfigurable accumulation network enables reconfigurable DNN allocation and maintains high core utilization across various bit-precision conditions. Fabricated in a 28-nm process, the HNPU achieved at least $5.9\times$ higher energy efficiency and $2.5\times$ higher area efficiency in actual DNN training compared with the previous state-of-the-art on-chip learning processors.
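
The abstract names a stochastic dynamic fixed-point representation but does not describe the format itself. As a rough illustration of the general idea only, the sketch below quantizes a list of values onto a fixed-point grid whose power-of-two step is chosen from the data (the "dynamic" part) and rounds each value up or down at random with probability proportional to its distance from the grid points (the "stochastic" part). The function name `quantize_sdfxp`, the per-tensor shared exponent, and the `bits` parameter are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def quantize_sdfxp(values, bits, rng):
    """Quantize floats to a signed `bits`-bit fixed-point grid with a shared
    power-of-two step, using stochastic rounding.

    Hypothetical helper for illustration; not the HNPU hardware algorithm.
    Returns (quantized_values, shared_exponent).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0] * len(values), 0

    qmax = 2 ** (bits - 1) - 1  # largest positive integer code
    # Dynamic part: pick the smallest power-of-two step that still lets the
    # largest magnitude fit into the integer range.
    exponent = math.ceil(math.log2(max_abs / qmax))
    step = 2.0 ** exponent

    quantized = []
    for v in values:
        scaled = v / step
        low = math.floor(scaled)
        # Stochastic part: round up with probability equal to the fractional
        # part, so the rounding error is zero on average.
        code = low + (1 if rng.random() < scaled - low else 0)
        code = max(-qmax - 1, min(qmax, code))  # guard against overflow
        quantized.append(code * step)
    return quantized, exponent


if __name__ == "__main__":
    rng = random.Random(0)
    q, e = quantize_sdfxp([0.3, -1.2, 0.05, 0.9], bits=8, rng=rng)
    print("shared exponent:", e)
    print("quantized values:", q)
```

Because each value lands on one of its two neighbouring grid points, the error of a single quantization is always below one step, and averaging many quantizations of the same value converges toward the original value. That unbiasedness is the usual motivation for stochastic rounding in low-precision training.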
