[Paper] Adversarial Attack Detection and Classification Model Based on the Predicted Class of Data

고은나래; 문종섭

doi:10.13089/jkiisc.2021.31.6.1227

Adversarial Example Detection and Classification Model Based on the Class Predicted by Deep Learning Model

情報保護學會論文誌 = Journal of the Korea Institute of Information Security and Cryptology, v.31 no.6, 2021, pp. 1227-1236

고은나래 (Korea University), 문종섭 (Korea University)

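Abstract (translated from Korean)

Adversarial attacks, one type of attack on deep learning classification models, add perturbations that humans cannot distinguish to the input data so that the model misclassifies it, and many adversarial attack algorithms exist. Accordingly, much research has addressed detecting adversarial data, but very little has addressed classifying which adversarial attack algorithm generated a given adversarial input. If adversarial attacks could be classified, the differences between attacks could be analyzed to build more robust deep learning classification models. This paper proposes a model that extracts features from hidden-layer outputs based on the class predicted by the target deep learning model and feeds those features into a random forest classifier to detect and classify adversarial attacks. In experiments, the proposed model achieved 3.02% higher accuracy on clean data and 0.80% higher accuracy on adversarial data than a state-of-the-art adversarial attack detection and classification model, and it classifies a new attack that previous studies did not classify.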

Abstract (English)

Adversarial attacks, one type of attack on deep learning classification models, add imperceptible perturbations to input data to make the model misclassify it. Various adversarial attack algorithms exist. Accordingly, many studies have addressed detecting adversarial attacks, but few have addressed classifying which adversarial attack algorithm generated an adversarial input. If adversarial attacks can be classified, more robust deep learning classification models can be built by analyzing the differences between attacks. In this paper, we propose a model that detects and classifies adversarial attacks using a random forest classifier whose input features are extracted from the target deep learning model. Features are extracted from the output values of a hidden layer based on the class predicted by the target model. In experiments, the proposed model achieved 3.02% higher accuracy on clean data and 0.80% higher accuracy on adversarial data than pre-existing studies, and it classifies a new adversarial attack that pre-existing studies did not classify.


Tables/Figures (10)

Fig. 1. Example of an adversarial attack
Fig. 2. Overview of the proposed model
Table 1. Pseudocode for feature extraction
Table 2. Experimental environment
Table 3. White-box attack dataset for adversarial attack detection and classification
Table 4. Dataset for adversarial attack detection and classification
Fig. 3. Normalized confusion matrix for the detection result
Fig. 4. Normalized confusion matrix for the classification result
Table 5. Evaluation of the proposed model for detection
Table 6. Extracted features for the experiments

AI Summary of the Main Text

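Problem Definition

However, almost no research has been done on distinguishing among adversarial attacks. This paper therefore aims to detect and classify adversarial data by processing the data according to the class assigned by the classification model. The significance of this paper is as follows.
This paper presents a method for detecting and classifying adversarial attacks that generates the data used for classification according to the class predicted by the deep learning model under attack. Experiments showed classification performance comparable to prior work while using far fewer features, and the method successfully classified a new attack.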

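Proposed Method

1. Adversarial data generated by a variety of adversarial attacks were detected and classified.
The proposed adversarial attack detection and classification model consists of three stages, feature selection, feature extraction, and adversarial attack detection, as shown in Fig. 2.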
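The text says only that features are taken from hidden-layer outputs according to the class the target model predicts; the actual procedure is the pseudocode in Table 1, which is not reproduced here. As a hedged illustration of what class-conditioned feature extraction can look like, the sketch below assumes two hypothetical features per input: the L2 distance between its hidden-layer activation and the mean clean activation of its predicted class, and the softmax confidence of that class. The function names `fit_class_means` and `extract_features` are ours, not the paper's, and the activations are random stand-ins for outputs already read from a target model.

```python
import numpy as np


def fit_class_means(hidden_train, labels_train, num_classes):
    """Mean hidden-layer activation of each class, computed on clean training data."""
    return np.stack(
        [hidden_train[labels_train == c].mean(axis=0) for c in range(num_classes)]
    )


def extract_features(hidden, logits, class_means):
    """Hypothetical features keyed to the class predicted by the target model.

    hidden: (n, d) hidden-layer outputs; logits: (n, k) model outputs.
    Returns an (n, 3) array: [predicted class, distance to class mean, confidence].
    """
    pred = logits.argmax(axis=1)  # class predicted by the target model
    # L2 distance between each activation and the clean mean of its predicted class
    dist = np.linalg.norm(hidden - class_means[pred], axis=1)
    # Numerically stable softmax, then the probability of the predicted class
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    conf = probs[np.arange(len(pred)), pred]
    return np.column_stack([pred, dist, conf])


# Usage with random stand-in activations (10 classes, 16-dim hidden layer)
rng = np.random.default_rng(0)
hidden_train = rng.normal(size=(200, 16))
labels_train = np.arange(200) % 10  # every class appears 20 times
class_means = fit_class_means(hidden_train, labels_train, num_classes=10)

hidden_test = rng.normal(size=(5, 16))
logits_test = rng.normal(size=(5, 10))
features = extract_features(hidden_test, logits_test, class_means)
```

Keying the reference statistics to the predicted class is the point of the design: an adversarial input pushed into class c should look unlike clean inputs the model also assigns to c.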

Target Data

The experiments used the CIFAR-10 dataset, with 60,000 clean images and 10,000 adversarial images per attack. Seven attack types were used, as listed in Table 3.
The 130,000 samples in total (60,000 clean plus 70,000 generated adversarial samples) were split 70%/30%: 70% for training the model and 30% for evaluating its performance (Table 4).
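The dataset sizes add up as 60,000 + 7 × 10,000 = 130,000 samples, so a 70/30 split leaves 91,000 training and 39,000 test samples. The sketch below reproduces that split on index arrays with scikit-learn's `train_test_split`; the label encoding (0 for clean, 1 to 7 for the attacks) and the use of stratification are assumptions, since the text does not say how the split was drawn.

```python
import numpy as np
from sklearn.model_selection import train_test_split

N_CLEAN = 60_000       # clean CIFAR-10 images
N_PER_ATTACK = 10_000  # adversarial images generated per attack
N_ATTACKS = 7          # attack types listed in Table 3

n_total = N_CLEAN + N_ATTACKS * N_PER_ATTACK  # 130,000

# Hypothetical label encoding: 0 = clean, 1..7 = attack id
labels = np.concatenate(
    [np.zeros(N_CLEAN, dtype=int)]
    + [np.full(N_PER_ATTACK, k) for k in range(1, N_ATTACKS + 1)]
)

# 70% train / 30% test over sample indices; stratification is an assumption
train_idx, test_idx = train_test_split(
    np.arange(n_total), test_size=0.3, stratify=labels, random_state=0
)
print(len(train_idx), len(test_idx))  # 91000 39000
```

With stratification each attack keeps the same 70/30 proportion (7,000 train and 3,000 test images per attack), which keeps per-attack test accuracy comparable.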

Data Processing

Experimental results are reported separately for adversarial data detection performance and adversarial data classification performance.
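Because one model both detects and classifies, one plausible way to report the two results is to score the multiclass predictions directly for classification and to collapse every attack label into a single "adversarial" label for detection. The sketch below does this and also builds row-normalized confusion matrices of the kind listed as Fig. 3 and Fig. 4; the helper name `detection_and_classification_scores` and the 0-means-clean label encoding are hypothetical, and the paper may compute detection differently.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def detection_and_classification_scores(y_true, y_pred, num_labels):
    """Score one multiclass prediction as both detection and classification.

    Labels use a hypothetical encoding: 0 = clean, 1..num_labels-1 = attack id.
    """
    # Detection: collapse every attack id into a single "adversarial" label (1).
    det_true = (y_true != 0).astype(int)
    det_pred = (y_pred != 0).astype(int)
    det_acc = accuracy_score(det_true, det_pred)
    det_cm = confusion_matrix(det_true, det_pred, labels=[0, 1], normalize="true")

    # Classification: keep the full clean/attack label set.
    all_labels = list(range(num_labels))
    cls_acc = accuracy_score(y_true, y_pred)
    cls_cm = confusion_matrix(y_true, y_pred, labels=all_labels, normalize="true")
    return det_acc, det_cm, cls_acc, cls_cm


# Toy example: one clean sample flagged as attack 1, one attack-2 sample labeled attack 3
y_true = np.array([0, 0, 1, 2, 2, 3])
y_pred = np.array([0, 1, 1, 2, 3, 3])
det_acc, det_cm, cls_acc, cls_cm = detection_and_classification_scores(y_true, y_pred, 4)
print(round(det_acc, 3), round(cls_acc, 3))  # 0.833 0.667
```

The toy example shows why the two numbers differ: confusing attack 2 with attack 3 costs classification accuracy but not detection accuracy, since both are still flagged as adversarial.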

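Theory/Model

Third, the classifier training stage builds the adversarial data detection and classification model, which takes the extracted features as input. A Random Forest model is used as the classifier.
The random forest classifier used to detect and classify adversarial data was built with the scikit-learn library, with the number of decision trees set to 100.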
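A minimal sketch of the classifier setting described above, using scikit-learn's `RandomForestClassifier` with `n_estimators=100`. The features here are synthetic stand-ins (three numbers per sample, labels 0 to 7 for clean data and seven hypothetical attack classes) rather than the paper's extracted features, and leaving every argument other than the tree count at its default is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

NUM_LABELS = 8  # hypothetical: 0 = clean, 1..7 = the seven attacks of Table 3

rng = np.random.default_rng(0)


def make_split(n):
    """Synthetic stand-in for extracted features; each label is shifted so it is learnable."""
    y = np.arange(n) % NUM_LABELS
    X = rng.normal(size=(n, 3)) + 3.0 * y[:, None]
    return X, y


X_train, y_train = make_split(800)
X_test, y_test = make_split(200)

# 100 decision trees, matching the reported setting; other arguments are defaults.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Since scikit-learn 0.22, 100 is also the library's default `n_estimators`, so this setting coincides with the default.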

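Performance/Effectiveness

Experiments showed classification performance comparable to prior work while using far fewer features, and the model successfully classified a new attack. Accuracy on clean data also improved by 3.02% over the state-of-the-art detection and classification model.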

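Future Work

Future research could use data generated by other adversarial attacks to build classification models covering additional attack types, and could extract features beyond those proposed here to build a model that classifies adversarial attacks more precisely.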

