[Paper] A Data Sampling Technique for Secure Dataset Using Weight VAE Oversampling (W-VAE)

강한바다; 이재우

doi:10.6109/jkiice.2022.26.12.1872


Journal of the Korea Institute of Information and Communication Engineering, vol. 26, no. 12, pp. 1872-1879, 2022

강한바다 (Department of Convergence Security, Chung-Ang University), 이재우 (Department of Industrial Security, Chung-Ang University)

Abstract

With recent advances in artificial intelligence, research on using AI to detect hacking attacks is being actively pursued. However, security data is a typical example of imbalanced data, and this is recognized as a major obstacle to building the training data on which AI model development depends. This paper proposes W-VAE oversampling, a technique that uses a variational autoencoder (VAE), a deep learning generative model, to generate the samples used for oversampling, and that sets the number of samples to oversample for each class through a weight calculation based on K-NN. A total of five oversampling techniques, including ROS, SMOTE, and ADASYN, were applied to NSL-KDD, a public network security dataset. Using the F1-score as the evaluation metric, the proposed technique was shown to be the most effective of the oversampling techniques compared.
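The abstract does not give the weight formula, so the following is only a minimal sketch of how a K-NN based weight could set per-class oversampling counts. It assumes an ADASYN-style score (the fraction of a sample's k nearest neighbors that belong to another class, averaged per class) and assumes the total deficit relative to the majority class is split in proportion to that score. The function names `knn_class_weights` and `allocate_oversampling`, the toy data, and the class labels are all hypothetical, and the VAE generation step itself is not shown.

```python
import math
from collections import Counter


def knn_class_weights(X, y, k=5):
    """Per-class mean fraction of k nearest neighbours from a different class.

    This ADASYN-style score is an assumption, not the paper's formula.
    """
    scores = {}
    for label in set(y):
        ratios = []
        for i, xi in enumerate(X):
            if y[i] != label:
                continue
            # Brute-force distances from sample i to every other sample.
            dists = sorted(
                ((math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i),
                key=lambda pair: pair[0],
            )
            neighbours = dists[:k]
            other = sum(1 for _, lbl in neighbours if lbl != label)
            ratios.append(other / len(neighbours))
        scores[label] = sum(ratios) / len(ratios)
    return scores


def allocate_oversampling(y, scores):
    """Split the total deficit to the majority class across minority classes
    in proportion to their K-NN scores (assumed allocation rule)."""
    counts = Counter(y)
    majority = max(counts.values())
    deficit = {c: majority - n for c, n in counts.items() if n < majority}
    budget = sum(deficit.values())
    total = sum(scores[c] for c in deficit)
    if total == 0:
        # No class overlaps its neighbours: fall back to plain balancing.
        return dict(deficit)
    return {c: round(budget * scores[c] / total) for c in deficit}


# Toy two-feature data: 6 "normal", 2 "dos", 1 "r2l" (hypothetical labels).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.2),
     (5, 5), (5, 6), (0.4, 0.6)]
y = ["normal"] * 6 + ["dos"] * 2 + ["r2l"]

scores = knn_class_weights(X, y, k=3)
plan = allocate_oversampling(y, scores)
print(plan)
```

In a full pipeline, each count in `plan` would be the number of synthetic samples to draw from a generative model trained on that class, here a VAE. Classes whose samples sit among other classes receive a larger share of the budget than a uniform balancing rule would give them.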
