Kim, Soyeon; Kang, Sanghoon; Han, Donghyeon; Kim, Sangjin; Kim, Sangyeob; Yoo, Hoi-Jun
(Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, Daejeon, South Korea)
Generative adversarial networks (GANs) consist of multiple deep neural networks that cooperate and compete with one another. Because of their complex architectures and large feature maps, training GANs requires a huge amount of computation. Moreover, the instance normalization (IN) layers in GANs dramatically increase external memory access (EMA). Nevertheless, retraining GANs with user-specific data is critical on mobile devices, because a pre-trained model outputs distorted images under user-specific conditions. This article proposes a GAN training accelerator that enables energy-efficient, domain-specific optimization of a GAN with a user's local data. Selective layer retraining (SELRET) picks out the layers that are effective in enhancing the quality of the retrained model; without degrading image quality, SELRET reduces the required computation by 69%. In addition, reordering layers for instance normalization (ROLIN) is proposed to reduce the EMA of intermediate data. By splitting and reordering the IN layers, the proposed architecture reduces overall EMA by 38.7% in the forward propagation (FP) stage and by 32.2% in the error propagation (EP) stage. The processor, fabricated in a 65-nm CMOS process, achieves 0.38-TFLOPS/W energy efficiency. The chip can retrain a face-modification GAN on a custom dataset of 256 × 256 images for 100 epochs in under 30 s while consuming only 274 mW. Compared with a previous FPGA implementation, this work improves retraining performance by 2× and energy efficiency by 39×. As a result, the proposed accelerator enables domain-specific optimization of GANs on a mobile platform.
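For context, instance normalization in its standard formulation normalizes each channel of each sample independently over its spatial dimensions:

$$ y_{n,c,h,w} = \gamma_c \,\frac{x_{n,c,h,w} - \mu_{n,c}}{\sqrt{\sigma_{n,c}^{2} + \epsilon}} + \beta_c, \qquad \mu_{n,c} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{n,c,h,w}, \qquad \sigma_{n,c}^{2} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(x_{n,c,h,w} - \mu_{n,c}\bigr)^{2}. $$

Because the statistics depend on the entire H × W feature map, a naive implementation must write each intermediate feature map out to external memory after the preceding layer and read it back to apply the normalization, which is why IN layers dominate EMA during GAN training.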
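The article does not include code, but the idea behind SELRET can be illustrated with a short PyTorch-style sketch: freeze the whole network, then unfreeze only the layers judged most effective for the user's data. Ranking layers by normalized gradient magnitude, as below, is a hypothetical stand-in criterion for illustration only, not the selection metric used by the chip.

```python
import torch

def selret_select_layers(model, user_batch, loss_fn, keep_ratio=0.3):
    """Sketch of selective layer retraining (SELRET): retrain only the
    layers that look most effective for the user's data; freeze the rest.
    The gradient-magnitude ranking is a hypothetical criterion, not the
    metric from the paper."""
    # Probe pass: one forward/backward on the user's data to collect
    # per-parameter gradients.
    model.zero_grad()
    loss = loss_fn(model(user_batch))
    loss.backward()

    # Score each parameter tensor by its relative gradient magnitude.
    scores = {
        name: (p.grad.norm() / (p.norm() + 1e-8)).item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }

    # Keep only the top-scoring fraction trainable; freeze everything else.
    n_keep = max(1, int(len(scores) * keep_ratio))
    keep = set(sorted(scores, key=scores.get, reverse=True)[:n_keep])
    for name, p in model.named_parameters():
        p.requires_grad_(name in keep)
    return keep
```

A `keep_ratio` of roughly 0.3 loosely mirrors the 69% computation reduction quoted in the abstract, under the illustrative assumption that compute is spread evenly across layers.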
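Similarly, the EMA saving of ROLIN comes from splitting each IN layer into a statistics pass and an apply pass, so that each half can be fused with the neighboring layer's dataflow instead of forcing a full feature-map round trip to DRAM. A minimal NumPy sketch of the split follows; the actual fusion and layer reordering happen in the accelerator's hardware dataflow and are not modeled here.

```python
import numpy as np

def in_statistics(x):
    """Pass 1: per-(instance, channel) mean and variance, which hardware
    can accumulate on the fly while the preceding convolution streams
    out its results."""
    mu = x.mean(axis=(2, 3), keepdims=True)   # (N, C, 1, 1)
    var = x.var(axis=(2, 3), keepdims=True)   # (N, C, 1, 1)
    return mu, var

def in_apply(x_tile, mu, var, gamma, beta, eps=1e-5):
    """Pass 2: affine normalization applied tile by tile as the next
    layer consumes the data, so the normalized map never needs to be
    written back whole. gamma/beta are shaped (1, C, 1, 1)."""
    return gamma * (x_tile - mu) / np.sqrt(var + eps) + beta

# Example: normalize a (2, 64, 32, 32) activation in row tiles.
x = np.random.randn(2, 64, 32, 32).astype(np.float32)
gamma = np.ones((1, 64, 1, 1), dtype=np.float32)
beta = np.zeros((1, 64, 1, 1), dtype=np.float32)
mu, var = in_statistics(x)
y = np.concatenate([in_apply(x[:, :, r:r + 8], mu, var, gamma, beta)
                    for r in range(0, 32, 8)], axis=2)
```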