Kim, Soyeon; Kang, Sanghoon; Han, Donghyeon; Kim, Sangjin; Kim, Sangyeob; Yoo, Hoi-Jun
(Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Korea)
Generative adversarial networks (GANs) consist of multiple deep neural networks that cooperate and compete with one another. Because of their complex architectures and large feature maps, training GANs requires a huge amount of computation. Moreover, the instance normalization (IN) layers in GANs dramatically increase external memory access (EMA). Nevertheless, retraining GANs with user-specific data is critical on mobile devices, because a pre-trained model outputs distorted images under user-specific conditions. This article proposes a GAN training accelerator that enables energy-efficient, domain-specific optimization of a GAN with a user's local data. Selective layer retraining (SELRET) picks out the layers that are effective in enhancing the quality of the retrained model; without degrading image quality, SELRET reduces the required computation by 69%. In addition, reordering layers for instance normalization (ROLIN) is proposed to reduce the EMA of intermediate data. By splitting and reordering the IN layers, the proposed architecture reduces overall EMA by 38.7% in the forward propagation (FP) stage and by 32.2% in the error propagation (EP) stage. The processor, fabricated in a 65-nm CMOS process, achieves 0.38-TFLOPS/W energy efficiency. The chip can retrain a face-modification GAN on a custom dataset of 256 × 256 images for 100 epochs in under 30 s while consuming only 274 mW. Compared with a previous FPGA implementation, this work improves retraining performance by 2× and energy efficiency by 39×. As a result, the proposed accelerator enables domain-specific optimization of GANs on a mobile platform.
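For context, instance normalization in its standard formulation normalizes each channel of each sample independently over its spatial dimensions:

$$ y_{n,c,h,w} = \gamma_c \,\frac{x_{n,c,h,w} - \mu_{n,c}}{\sqrt{\sigma_{n,c}^{2} + \epsilon}} + \beta_c, \qquad \mu_{n,c} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{n,c,h,w}, \qquad \sigma_{n,c}^{2} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(x_{n,c,h,w} - \mu_{n,c}\bigr)^{2}. $$

Because the statistics depend on the entire H × W feature map, a naive implementation must write each intermediate feature map out to external memory after the preceding layer and read it back to apply the normalization, which is why IN layers dominate EMA during GAN training.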
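The article does not include code, but the idea behind SELRET can be illustrated with a short PyTorch-style sketch: freeze the whole network, then unfreeze only the layers judged most effective for the user's data. Ranking layers by normalized gradient magnitude, as below, is a hypothetical stand-in criterion for illustration only, not the selection metric used by the chip.

```python
import torch

def selret_select_layers(model, user_batch, loss_fn, keep_ratio=0.3):
    """Sketch of selective layer retraining (SELRET): retrain only the
    layers that look most effective for the user's data; freeze the rest.
    The gradient-magnitude ranking is a hypothetical criterion, not the
    metric from the paper."""
    # Probe pass: one forward/backward on the user's data to collect
    # per-parameter gradients.
    model.zero_grad()
    loss = loss_fn(model(user_batch))
    loss.backward()

    # Score each parameter tensor by its relative gradient magnitude.
    scores = {
        name: (p.grad.norm() / (p.norm() + 1e-8)).item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }

    # Keep only the top-scoring fraction trainable; freeze everything else.
    n_keep = max(1, int(len(scores) * keep_ratio))
    keep = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])
    for name, p in model.named_parameters():
        p.requires_grad_(name in keep)
    return keep
```

A `keep_ratio` of roughly 0.3 loosely mirrors the 69% computation reduction quoted in the abstract, under the illustrative assumption that compute is spread evenly across layers.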
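Similarly, the EMA saving of ROLIN comes from splitting each IN layer into a statistics pass and an apply pass, so that each half can be fused with the neighboring layer's dataflow instead of forcing a full feature-map round trip to DRAM. A minimal NumPy sketch of the split follows; the actual fusion and layer reordering happen in the accelerator's hardware dataflow and are not modeled here.

```python
import numpy as np

def in_statistics(x):
    """Pass 1: per-(instance, channel) mean and variance, which hardware
    can accumulate on the fly while the preceding convolution streams
    out its results."""
    mu = x.mean(axis=(2, 3), keepdims=True)   # (N, C, 1, 1)
    var = x.var(axis=(2, 3), keepdims=True)   # (N, C, 1, 1)
    return mu, var

def in_apply(x_tile, mu, var, gamma, beta, eps=1e-5):
    """Pass 2: affine normalization applied tile by tile as the next
    layer consumes the data, so the normalized map never needs to be
    written back whole. gamma/beta are shaped (1, C, 1, 1)."""
    return gamma * (x_tile - mu) / np.sqrt(var + eps) + beta

# Example: normalize a (2, 64, 32, 32) activation in row tiles.
x = np.random.randn(2, 64, 32, 32).astype(np.float32)
gamma = np.ones((1, 64, 1, 1), dtype=np.float32)
beta = np.zeros((1, 64, 1, 1), dtype=np.float32)
mu, var = in_statistics(x)
y = np.concatenate([in_apply(x[:, :, r:r + 8], mu, var, gamma, beta)
                    for r in range(0, 32, 8)], axis=2)
```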