[Paper] Comparative Study of Anomaly Detection Accuracy of Intrusion Detection Systems Based on Various Data Preprocessing Techniques

박경선; 김강석

doi:10.3745/ktsde.2021.10.11.449

Comparative Study of Anomaly Detection Accuracy of Intrusion Detection Systems Based on Various Data Preprocessing Techniques

KIPS Transactions on Software and Data Engineering, v.10 no.11, 2021, pp.449-456

박경선 (Department of Knowledge Information Engineering, Ajou University), 김강석 (Department of Cyber Security, Ajou University)

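Abstract (translated from the Korean)

An intrusion detection system (IDS) is a technology for detecting anomalous behavior that violates security; it detects abnormal operations and prevents attacks on systems. Conventional intrusion detection systems were designed by analyzing traffic patterns statistically. However, because modern systems generate diverse traffic as a result of rapidly advancing technology, the limitations of the conventional approach have become clear. To overcome these limitations, research on intrusion detection methods that apply various machine learning techniques is being actively pursued. This paper presents a comparative study of data preprocessing techniques that can improve anomaly detection accuracy, using NGIDS-DS (Next Generation IDS Dataset), in which traffic from various network environments was generated on simulation equipment. Padding and sliding window were used for data preprocessing, and an oversampling technique based on an AAE (Adversarial Auto-Encoder), among others, was applied to address the imbalance between the proportions of normal and anomalous data. In addition, an improvement in detection accuracy was confirmed by using Skip-gram, one of the Word2Vec techniques, which can extract feature vectors from the preprocessed sequence data. PCA-SVM and GRU were used as the models for the comparative experiments, and the results showed better performance when sliding window, Skip-gram, AAE, and GRU were applied.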
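The abstract names Skip-gram as the Word2Vec variant used to embed the preprocessed sequences. As a rough illustration only, the sketch below shows how Skip-gram training pairs (center item, context item) could be enumerated from one sequence of event IDs. The function name `skipgram_pairs`, the window radius, and the sample IDs are assumptions made for this example; the abstract does not state the paper's actual settings, and the embedding training itself is not shown.

```python
from typing import List, Sequence, Tuple


def skipgram_pairs(seq: Sequence[int], window: int = 2) -> List[Tuple[int, int]]:
    """Return (center, context) pairs for every position in `seq`.

    `window` is the number of neighbors considered on each side of the
    center item (an assumed hyperparameter, not taken from the paper).
    """
    pairs: List[Tuple[int, int]] = []
    for i, center in enumerate(seq):
        lo = max(0, i - window)                 # clamp at the sequence start
        hi = min(len(seq), i + window + 1)      # clamp at the sequence end
        for j in range(lo, hi):
            if j != i:                          # skip the center itself
                pairs.append((center, seq[j]))
    return pairs


# Hypothetical event IDs standing in for entries of a host log.
pairs = skipgram_pairs([3, 7, 2], window=1)
print(pairs)
```

A Skip-gram model would then be trained to predict each context item from its center item, and the learned input weights serve as the feature vectors.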

Abstract

An intrusion detection system (IDS) detects abnormal behavior that violates security, identifying abnormal operations and preventing attacks on the system. Existing intrusion detection systems were designed using statistical analysis or anomaly detection techniques applied to traffic patterns, but rapidly advancing technology means that modern systems generate a much wider variety of traffic, so the existing methods have clear limitations. To overcome these limitations, intrusion detection methods that apply various machine learning techniques are being actively studied. This paper compares data preprocessing techniques that can improve anomaly detection accuracy, using NGIDS-DS (Next Generation IDS Dataset), which was generated on simulation equipment for traffic in various network environments. Padding and sliding window were used for data preprocessing, and an oversampling technique based on an Adversarial Auto-Encoder (AAE) was applied to address the imbalance between the proportions of normal and abnormal data. In addition, an improvement in detection accuracy was confirmed by using Skip-gram, one of the Word2Vec techniques, which can extract feature vectors from the preprocessed sequence data. PCA-SVM and GRU were used as the models for the comparative experiments, and the results showed better performance when sliding window, Skip-gram, AAE, and GRU were applied.
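To make the two preprocessing options concrete, here is a minimal sketch of padding and sliding-window segmentation applied to a sequence of event IDs. It is an illustration under stated assumptions, not the authors' implementation: the padding ID `0`, the right-padding and truncation policy, the window size, the step, and the sample data are all hypothetical, since the abstract does not specify them.

```python
from typing import List, Sequence

PAD_ID = 0  # assumed padding token; not specified in the abstract


def pad_sequence(seq: Sequence[int], length: int, pad_id: int = PAD_ID) -> List[int]:
    """Right-pad (or truncate) `seq` so the result has exactly `length` items."""
    clipped = list(seq[:length])
    return clipped + [pad_id] * (length - len(clipped))


def sliding_windows(seq: Sequence[int], size: int, step: int = 1) -> List[List[int]]:
    """Split `seq` into fixed-size windows that advance by `step` items."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # Only full windows are kept; a sequence shorter than `size` yields none.
    return [list(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


# Hypothetical event IDs standing in for entries of a host log.
events = [3, 7, 7, 2, 9]
padded = pad_sequence(events, 8)
windows = sliding_windows(events, size=3)
print(padded)
print(windows)
```

Padding turns each variable-length sequence into one fixed-length sample, whereas the sliding window turns one long sequence into many short overlapping samples, which also changes how many training examples a model sees.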

Tables/Figures (19)

Fig. 1. Systematic Representation for the Proposed Anomaly Detection Methodology
Fig. 2. Part of Host Log from NGIDS-DS
Fig. 3. NGIDS-DS Sorted Chronologically
Table 1. Skip-gram Applied Results
Fig. 4. Test for Determining Sliding Window Size
Fig. 5. Sliding Window
Fig. 6. Example of Determining Traffic Type
Fig. 7. Percentage of Data in the NGIDS-DS
Fig. 8. Data Augmentation With AAE
Table 2. Model Results by Window Size of Sliding Window
Fig. 9. Structure of GRU Classification Model
Table 3. Experimental Environment
Table 4. PCA-SVM Results with Padding
Fig. 10. GRU Classification Model Results with Padding
Fig. 11. GRU Classification Model Results with Sliding Window
Table 5. GRU Model Results with Padding
Table 6. PCA-SVM Results with Sliding Window
Table 7. GRU Model Results with Sliding Window
Fig. 12. GRU Classification Model Results

