[Paper] Analysis of Deep Learning Quantization Technology for Micro-sized IoT Devices

김영민; 한경현; 황성운

doi:10.20465/kiots.2023.9.1.009

Analysis of Deep Learning Quantization Technology for Micro-sized IoT Devices

Journal of Internet of Things and Convergence (사물인터넷융복합논문지), v.9 no.1, 2023, pp.9-17

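김영민 (Department of IT Convergence Engineering, Gachon University), 한경현 (Department of Electronic and Computer Engineering, Hongik University), 황성운 (Department of Computer Engineering, Gachon University)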

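Abstract (Korean original, translated)

Deep learning involves a large amount of computation, which makes it difficult to implement on micro-sized IoT devices or mobile devices. Recently, lightweight deep learning techniques that reduce a model's computation have been introduced so that deep learning can also run on such devices. Quantization is a lightweight technique that represents continuously distributed parameter values as discrete values with a fixed number of bits, reducing the model's memory use and size so that it can be used efficiently. However, the discrete representation introduced by quantization lowers the model's accuracy. This paper introduces various quantization techniques that can improve accuracy. We first selected APoT and EWGS from among existing quantization techniques and compared and analyzed their results through experiments in the same environment. The selected techniques were trained and tested on ResNet models with the CIFAR-10 or CIFAR-100 dataset. By analyzing the experimental results, we identified problems with the existing quantization techniques and suggested directions for future research.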

Abstract

Deep learning requires a large amount of computation, which makes it difficult to implement on micro-sized IoT devices or mobile devices. Recently, lightweight deep learning technologies have been introduced that reduce the amount of computation in the model so that deep learning can be implemented even on small devices. Quantization is a lightweight technique that reduces the memory use and size of the model by expressing continuously distributed parameter values as discrete values with a fixed number of bits. However, this discrete representation reduces the accuracy of the model. In this paper, we introduce various quantization techniques that improve accuracy. We selected APoT and EWGS from existing quantization techniques and comparatively analyzed their results through experiments. The selected techniques were trained and tested on the ResNet model with the CIFAR-10 or CIFAR-100 dataset. By analyzing the experimental results, we identified problems with these techniques and presented directions for future research.
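The abstract's description of quantization (continuous parameter values mapped to discrete values with a fixed number of bits) can be illustrated with a minimal sketch. The min-max uniform scheme, the function names, and the sample weights below are assumptions chosen for illustration; they are not the APoT or EWGS methods the paper evaluates.

```python
def quantize_uniform(weights, bits):
    """Map floats to 2**bits evenly spaced levels over [min, max], then map back."""
    levels = 2 ** bits - 1                      # highest integer code
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Integer codes in [0, levels]; these are what a device would store.
    codes = [round((w - lo) / scale) for w in weights]
    # Dequantized values show what the model effectively computes with.
    dequantized = [lo + c * scale for c in codes]
    return codes, dequantized


def max_abs_error(original, approximated):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(x - y) for x, y in zip(original, approximated))


# Hypothetical weights, for illustration only.
weights = [-0.82, -0.31, -0.05, 0.0, 0.07, 0.26, 0.54, 0.91]
for bits in (8, 4, 2):
    codes, deq = quantize_uniform(weights, bits)
    print(bits, codes, max_abs_error(weights, deq))
```

Fewer bits mean fewer distinct codes and a larger worst-case rounding error, which is the accuracy loss the abstract refers to.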

Tables/Figures (11)

[Fig. 1] Gradient scaling scheme for EWGS. g_f is the gradient to the existing weight value and is equal to the final g≠_w when the quantization gradient g_q is scaled. In particular, in the case of w_f-w_q, it can be seen that g_q is scaled in the +(plus) direction, and vice versa, it is scaled in the -(minus) direction.
[Fig. 2] Comparison of PoT and APoT. In PoT, the quantization values are concentrated around the mean area (see the red circle), and the small quantization intervals there can cause rigid resolution problems. In APoT, by contrast, the quantization values are distributed uniformly across the mean and the tail.
[Fig. 3] DSQ brings the tanh function closer to the rounding function at each epoch; the figure shows which curve approximates the rounding function most closely.
[Fig. 4] Expression of the rounding approximation function at each temperature (β*) when γ=2. The further the distance from the quantization value, the higher the temperature.
[Fig. 5] Structure of BRECQ quantization
[Fig. 6] First block quantization process
[Fig. 7] Architecture of APoT+EWGS
[Fig. 8] Training loss and validation accuracy for APoT+EWGS. The validation accuracy of APoT+EWGS decreases significantly as the training loss explodes after a few epochs.
[Table] CIFAR10 on ResNet20
[Table] CIFAR100 on ResNet32
[Table] Repeat training on ResNet-20
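The contrast drawn in the Fig. 2 caption between PoT and APoT can be sketched by listing the quantization levels of each scheme. The construction below is an assumption based on the additive powers-of-two paper by Li et al. in the reference list (unsigned 4-bit levels, base bit-width k=2); the function names and the normalisation to a largest level of 1 are illustrative, not the authors' code.

```python
from itertools import product


def pot_levels(bits):
    """Unsigned powers-of-two levels: 0 plus 2**0, 2**-1, ..., 2**-(2**bits - 2)."""
    return sorted([0.0] + [2.0 ** -i for i in range(2 ** bits - 1)])


def apot_levels(bits, k=2):
    """Unsigned additive powers-of-two levels: sums of n = bits // k PoT terms."""
    n = bits // k
    term_sets = []
    for i in range(n):
        # Term i is either 0 or 2**-(i + j*n) for j = 0 .. 2**k - 2.
        term_sets.append([0.0] + [2.0 ** -(i + j * n) for j in range(2 ** k - 1)])
    sums = sorted({sum(combo) for combo in product(*term_sets)})
    return [s / sums[-1] for s in sums]  # normalise so the largest level is 1


def largest_gap(levels):
    """Widest interval between neighbouring levels in a sorted list."""
    return max(b - a for a, b in zip(levels, levels[1:]))


pot = pot_levels(4)
apot = apot_levels(4)
print(len(pot), largest_gap(pot))
print(len(apot), largest_gap(apot))
```

Both schemes yield the same number of levels for a given bit-width, but the PoT levels crowd toward zero and leave a wide gap near the top of the range, while the APoT sums spread the levels more evenly.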

References (25)

Howard, Andrew G., et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861, 2017.
Blalock, Davis, et al. "What is the state of neural network pruning?" Proceedings of Machine Learning and Systems 2, pp.129-146, 2020.
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531, 2015.
Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. "Binarized neural networks." Advances in Neural Information Processing Systems 29, 2016.
Raghuraman Krishnamoorthi. "Quantizing deep convolutional networks for efficient inference: A whitepaper." arXiv preprint arXiv:1806.08342, 2018.
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. "Quantization and training of neural networks for efficient integer-arithmetic-only inference." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.2704-2713, 2018.
Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, and Paulius Micikevicius. "Integer quantization for deep learning inference: Principles and empirical evaluation." arXiv preprint arXiv:2004.09602, 2020.
Song Han, Huizi Mao, and William J Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149, 2015.
Zhaohui Yang, Yunhe Wang, Kai Han, Chunjing Xu, Chao Xu, Dacheng Tao, and Chang Xu. "Searching for low-bit weights in quantized neural networks." Advances in Neural Information Processing Systems 33, pp.4091-4102, 2020.
Kohei Yamamoto. "Learnable companding quantization for accurate low-bit neural networks." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.5029-5038, 2021.
Yunchao Gong, Liu Liu, Ming Yang, and Lubomir Bourdev. "Compressing deep convolutional networks using vector quantization." arXiv preprint arXiv:1412.6115, 2014.
Yang, Jiwei, et al. "Quantization networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.7308-7316, 2019.
Gong, Ruihao, et al. "Differentiable soft quantization: Bridging full-precision and low-bit neural networks." Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.4852-4861, 2019.
Kim, Dohyung, Junghyup Lee, and Bumsub Ham. "Distance-aware quantization." Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.5271-5280, 2021.
Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. "Incremental network quantization: Towards lossless CNNs with low-precision weights." arXiv preprint arXiv:1702.03044, 2017.
Yuhang Li, Xin Dong, and Wei Wang. "Additive powers-of-two quantization: An efficient non-uniform discretization for neural networks." In International Conference on Learning Representations, 2020.
Lee, Junghyup, Dohyung Kim, and Bumsub Ham. "Network quantization with element-wise gradient scaling." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.6448-6457, 2021.
Yoshua Bengio, Nicholas Leonard, and Aaron Courville. "Estimating or propagating gradients through stochastic neurons for conditional computation." arXiv preprint arXiv:1308.3432, 2013.
Avron, Haim, and Sivan Toledo. "Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix." Journal of the ACM (JACM), Vol.58, No.2, pp.1-34, 2011.
Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, and Daniel Soudry. "Improving post training neural quantization: Layer-wise calibration and integer programming." arXiv preprint arXiv:2006.10518, 2020.
Markus Nagel, Mart van Baalen, Tijmen Blankevoort, and Max Welling. "Data-free quantization through weight equalization and bias correction." In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.1325-1334, 2019.
Li, Yuhang, et al. "BRECQ: Pushing the limit of post-training quantization by block reconstruction." arXiv preprint arXiv:2102.05426, 2021.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.770-778, 2016.
Alex Krizhevsky, Geoffrey Hinton, et al. "Learning multiple layers of features from tiny images." 2009.
Lee, Junghyup, et al. "SFNet: Learning object-aware semantic correspondence." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp.2278-2287, 2019.
