[Thesis] A Study on the Model Optimization for Improving Inference Performance of Real-time Object Detection Models in Edge AI

안성열


안성열 (Chonnam National University, Interdisciplinary Program of Digital Future Convergence Service, Master's thesis)

Abstract (from the Korean original)

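The accelerating adoption of AIoT (AI+IoT) and DX (Digital Transformation), core technologies of the Fourth Industrial Revolution, generates large volumes of data, and processing that data through cloud computing raises problems such as network bandwidth load, increased latency, and security vulnerabilities. Edge computing has emerged as a new paradigm to address these problems. Because it processes data locally, edge computing can compensate for the weaknesses of centralized cloud computing and is efficient in terms of cost and power consumption. In addition, acceleration solutions for Edge AI devices, which combine edge computing with AI, provide an environment in which AI training and inference can be performed. However, running deep learning models that require complex computation, such as real-time object detection models in computer vision, on Edge AI is limited by insufficient computing resources.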
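Accordingly, this study investigated model optimization to improve the inference performance of real-time object detection models on Edge AI. Post-training quantization was used as the model compression method: the model was converted across the TensorFlow, TF-Lite, and TensorRT frameworks and quantized to the FP16 data format with them. Furthermore, to use the memory and computing resources of Edge AI more efficiently, quantization to the TensorRT INT8 data format, which uses fewer bits than FP16, was performed. To evaluate the optimized models, a set of evaluation metrics consisting of GPU utilization, FPS, Latency, Accuracy, and Size was defined, and the results were measured and discussed.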
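In the performance evaluation, the TensorFlow model, the TF-Lite model, and the TF-Lite FP16 model were judged unsuitable as optimization approaches for real-time object detection because of their high latency and low FPS. The TensorRT FP16 model improved on every evaluation metric: GPU utilization improved by a factor of about 1.58 to 59.08%, average FPS by a factor of about 2.04 to 14.13, and latency by a factor of about 1.86 to 67.61 ms. Accuracy also rose for every class, and the model size was compressed by a factor of about 1.9. The TensorRT INT8 model showed the largest improvement on three metrics: GPU utilization improved by a factor of about 1.98 to 47.26%, FPS by a factor of about 2.59 to 17.87, and latency by a factor of about 2.33 to 53.95 ms. However, during object detection, inference failed for some classes and accuracy dropped sharply. After examining the models whose performance degraded, the TF-Lite and TF-Lite FP16 models are judged to be usable on embedded devices that support TF-Lite's GPU Delegate. TensorRT INT8 quantization showed acceptable accuracy in some cases, and the results suggest that the TensorRT INT8 model could serve as an optimization approach if a model with high accuracy and confidence is used, or if data distortion is suppressed by measuring the distribution of input data and values at specific layers.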
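The abstract reports optimized values and improvement factors but not the baseline measurements. The following is a minimal sketch that backs out the baseline implied by each reported pair. It assumes that, for metrics where lower is better (GPU utilization and latency), "improved by a factor of N" means baseline divided by optimized value; that reading, and all names in the code, are assumptions rather than definitions taken from the thesis.

```python
# Optimized values and improvement factors as reported in the abstract:
# metric -> (optimized value, improvement factor).
REPORTED = {
    "TensorRT FP16": {
        "gpu_util_pct": (59.08, 1.58),
        "fps": (14.13, 2.04),
        "latency_ms": (67.61, 1.86),
    },
    "TensorRT INT8": {
        "gpu_util_pct": (47.26, 1.98),
        "fps": (17.87, 2.59),
        "latency_ms": (53.95, 2.33),
    },
}

# Assumption: for these metrics a smaller number is the improvement.
LOWER_IS_BETTER = {"gpu_util_pct", "latency_ms"}


def implied_baseline(metric, value, factor):
    """Return the baseline implied by an optimized value and its factor."""
    if metric in LOWER_IS_BETTER:
        return value * factor  # baseline was `factor` times larger
    return value / factor      # baseline was `factor` times smaller


def implied_baselines(model):
    """Implied baseline for every metric reported for `model`."""
    return {
        metric: implied_baseline(metric, value, factor)
        for metric, (value, factor) in REPORTED[model].items()
    }


fp16_base = implied_baselines("TensorRT FP16")
int8_base = implied_baselines("TensorRT INT8")

for metric in fp16_base:
    print(f"{metric}: FP16 implies {fp16_base[metric]:.2f}, "
          f"INT8 implies {int8_base[metric]:.2f}")
```

Under this reading, both models imply nearly the same baseline (roughly 93% GPU utilization, about 6.9 FPS, and about 126 ms latency), which suggests the factors were computed against a single unoptimized model and that the GPU utilization figures are reductions.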
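The study confirmed that FP16 quantization with the TensorRT engine is the most efficient model optimization approach for improving the inference performance of real-time object detection models on Edge AI. TensorRT INT8 quantization, which uses fewer bits than FP16 to make more efficient use of memory and computing resources, is expected to become a viable optimization approach for Edge AI inference once it is improved.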

Abstract

The Fourth Industrial Revolution is marked by the proliferation of AIoT (AI+IoT) and DX (Digital Transformation), two of its core technologies, which have accelerated the generation of very large volumes of data. Processing this data via cloud computing presents challenges such as increased network bandwidth load, higher latency, and potential security vulnerabilities. Edge computing has emerged as a viable paradigm that processes data locally. This approach not only mitigates the issues inherent in centralized cloud computing but also proves efficient in terms of cost and energy consumption. The convergence of edge computing and artificial intelligence, termed Edge AI, introduces acceleration solutions that support AI training and inference. Nevertheless, executing computationally demanding workloads, such as deep learning models for real-time object detection, remains constrained by limited computing resources.

This study investigates optimization solutions for these constraints, specifically aiming to improve the inference performance of real-time object detection models on Edge AI. Post-training quantization was used as the model compression technique, and quantization to the FP16 data format was performed using the TensorFlow, TF-Lite, and TensorRT frameworks. To use Edge AI's memory and computing resources more efficiently, quantization to a lower bit width than FP16 was carried out with the TensorRT INT8 data format. The optimized models were evaluated against a set of metrics comprising GPU usage, Frames Per Second (FPS), Latency, Accuracy, and Size.
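As a rough illustration of what FP16 post-training quantization does to stored weights, the sketch below uses only Python's standard `struct` module, not the TF-Lite or TensorRT converters used in the study. The weight values and helper names are hypothetical. It packs the same weights as 32-bit and 16-bit IEEE 754 floats, then measures the storage ratio and the round-trip error.

```python
import struct


def to_fp32_bytes(weights):
    """Pack floats as little-endian IEEE 754 single precision (4 bytes each)."""
    return struct.pack(f"<{len(weights)}f", *weights)


def to_fp16_bytes(weights):
    """Pack floats as little-endian IEEE 754 half precision (2 bytes each)."""
    return struct.pack(f"<{len(weights)}e", *weights)


def from_fp16_bytes(blob):
    """Unpack half-precision bytes back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


# Hypothetical trained weights, all within the FP16 normal range.
weights = [0.1, -0.25, 3.14159, 1e-3, -42.5]

fp32_blob = to_fp32_bytes(weights)
fp16_blob = to_fp16_bytes(weights)
restored = from_fp16_bytes(fp16_blob)

# Storage ratio of the raw weight payload.
size_ratio = len(fp32_blob) / len(fp16_blob)

# Largest relative error introduced by rounding to half precision.
max_rel_err = max(abs(r - w) / abs(w) for r, w in zip(restored, weights))

print(f"size ratio: {size_ratio:.1f}x, max relative error: {max_rel_err:.2e}")
```

The raw weight payload shrinks by exactly a factor of two, while each value keeps about three significant decimal digits. A real model file also carries non-weight data such as graph structure and metadata, so its overall compression ratio need not be exactly 2.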

The assessment results indicated that the TensorFlow model, the TF-Lite model, and the TF-Lite FP16 model were unsuitable as optimization options for real-time object detection models because of their high latency and low FPS. In contrast, the TensorRT FP16 model improved across all metrics: GPU usage fell by a factor of approximately 1.58, FPS rose by a factor of approximately 2.04, and latency fell by a factor of approximately 1.86. In addition, accuracy improved for all classes, and the model size was reduced by a factor of approximately 1.9. The TensorRT INT8 model showed the largest improvement in three evaluation metrics: GPU usage fell by a factor of approximately 1.98, FPS rose by a factor of approximately 2.59, and latency fell by a factor of approximately 2.33. However, it failed to perform inference for some classes during object detection, and its accuracy dropped substantially. Upon further analysis, the TF-Lite model and the TF-Lite FP16 model were judged to be usable on embedded devices that support TF-Lite's GPU Delegate. The TensorRT INT8 quantization method showed acceptable accuracy in some cases. The findings suggest that with a highly accurate and reliable model, or when data distortion is suppressed by measuring the distribution of input data and values at specific layers, the TensorRT INT8 model could serve as an effective optimization option.

In conclusion, the most effective model optimization approach for improving the inference performance of real-time object detection models on Edge AI is FP16 quantization with the TensorRT engine. With further improvements, the TensorRT INT8 quantization method, which uses a lower bit width than FP16 to make better use of memory and computing resources, could serve as a useful strategy for improving Edge AI inference performance.
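The abstract attributes the INT8 accuracy loss to data distortion and proposes measuring value distributions at specific layers. The sketch below is a generic illustration of why the choice of quantization range matters, not the TensorRT calibrator and not the procedure used in the thesis. It applies symmetric INT8 quantization to hypothetical activations containing one outlier, once with a max-based range and once with a range taken from the 99th percentile of observed magnitudes.

```python
def quantize_int8(values, clip):
    """Symmetric INT8 quantization: map [-clip, clip] onto [-127, 127]."""
    scale = clip / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map INT8 codes back to approximate real values."""
    return [c * scale for c in codes]


def mean_abs_error(original, restored):
    """Average absolute difference between two equal-length sequences."""
    return sum(abs(o - r) for o, r in zip(original, restored)) / len(original)


# Hypothetical layer activations: 101 small values plus a single outlier.
bulk = [0.01 * i for i in range(-50, 51)]
activations = bulk + [100.0]

# Range choice 1: cover the largest observed magnitude (outlier included).
clip_max = max(abs(v) for v in activations)

# Range choice 2: cover the 99th percentile of observed magnitudes.
magnitudes = sorted(abs(v) for v in activations)
clip_pct = magnitudes[int(0.99 * (len(magnitudes) - 1))]

codes_max, scale_max = quantize_int8(activations, clip_max)
codes_pct, scale_pct = quantize_int8(activations, clip_pct)

# Compare how well each choice preserves the bulk (everything but the outlier).
bulk_err_max = mean_abs_error(bulk, dequantize(codes_max[:-1], scale_max))
bulk_err_pct = mean_abs_error(bulk, dequantize(codes_pct[:-1], scale_pct))
levels_max = len(set(codes_max[:-1]))
levels_pct = len(set(codes_pct[:-1]))

print(f"max range:        {levels_max} levels, bulk error {bulk_err_max:.4f}")
print(f"percentile range: {levels_pct} levels, bulk error {bulk_err_pct:.4f}")
```

With the max-based range, the single outlier stretches the scale so far that the bulk of the values collapses onto a handful of INT8 codes. The percentile-based range clips the outlier but keeps the bulk well resolved. Which trade-off is acceptable depends on the layer, which is why the range is usually chosen from measured distributions.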

Thesis Information

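Author	안성열
Degree-granting institution	Chonnam National University
Degree	Master's (domestic)
Department	Interdisciplinary Program of Digital Future Convergence Service
Advisor	이상준
Year of publication	2023
Total pages	64
Keywords	Edge Computing, Edge AI, Computer Vision, Object Detection, Quantization, Model Optimization
Language	kor
Original URL	http://www.riss.kr/link?id=T16834371&outLink=K
Source	Korea Education and Research Information Service (KERIS)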
