[Paper] Behavior and Script Similarity-Based Cryptojacking Detection Framework Using Machine Learning

임은지; 이은영; 이일구

doi:10.13089/jkiisc.2021.31.6.1105


Journal of the Korea Institute of Information Security and Cryptology (情報保護學會論文誌), v.31 no.6, 2021, pp. 1105-1114

임은지 (Yonsei University), 이은영 (Sungshin Women's University), 이일구 (Sungshin Women's University)

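Abstract (translated from the Korean)

With the recent surge in the popularity of cryptocurrency, the threat of cryptojacking, malware that mines cryptocurrency, is growing. Web-based cryptojacking in particular can mine cryptocurrency with a victim's PC resources as soon as the victim visits a website, and the attacker only has to add a mining script to the page, so the attack is easy to mount and causes performance degradation and hardware failure. Because victims rarely notice that they are being attacked, research is needed on detecting and blocking cryptojacking efficiently. This study proposes and evaluates a framework that detects cryptojacking effectively by using its representative infection symptoms and its scripts as indicators. In the proposed framework, the behavior-based dynamic analysis uses a K-Nearest Neighbors (KNN) model trained on computer performance indicators, and the script similarity-based static analysis uses a K-means model trained on the word frequencies of malicious scripts. In the experiments, the KNN model reached 99.6% accuracy, and the K-means model produced a silhouette coefficient of 0.61 for the normal cluster.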

Abstract (English)

Due to the recent surge in the popularity of cryptocurrency, the threat of cryptojacking, malicious code that mines cryptocurrencies, is increasing. Web-based cryptojacking in particular is easy to carry out: an attacker only needs to add a mining script to a website, and the victim's PC resources are then used to mine cryptocurrency whenever the victim accesses the site. The attack degrades performance and causes malfunctions, and it can also cause hardware failure through the overheating and aging that mining induces. Because it is difficult for victims to recognize the damage, research is needed to detect and block cryptojacking efficiently. In this work, we take the representative symptoms of cryptojacking as indicators and propose a new detection architecture. For behavior-based dynamic analysis, we use a K-Nearest Neighbors (KNN) model trained on computer performance indicators. For script similarity-based static analysis, we use a K-means model trained on the word frequencies of malicious scripts. The KNN model achieved 99.6% accuracy, and the K-means model had a silhouette coefficient of 0.61 for the normal cluster.

Tables/Figures (6)

Fig. 1. Machine Learning-based Cryptojacking Detection Framework
Table 1. API code for mining
Fig. 2. Correlation Heatmap Representing Associations Between Columns
Fig. 3. Categorized Data Using Artificial Intelligence
Fig. 4. Clustered Data Using K-means Algorithm
Fig. 5. KNN Cryptojacking Classifier Accuracy

AI Summary of the Full Text

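Problem Definition

This paper proposes a machine learning-based cryptojacking detection framework. Damage from cryptojacking keeps increasing, and because of the nature of the attack, victims find it hard to realize that they have been attacked at all.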

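Proposed Method

Dynamic detection analyzes the behavioral characteristics that cryptojacking exhibits during an attack in order to detect malicious behavior.
For efficient dynamic detection of cryptojacking, this paper applies classification, a machine learning-based supervised learning method, to decide whether each data sample is infected. Supervised learning trains on labeled data, and this study adopts classification as its method.
The machine learning-based dynamic and static detection models of the framework were then implemented and validated.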

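Target Data

For dynamic analysis, the collected data are reviewed and the measurements that change clearly upon exposure to malware are extracted. In particular, this study selected core temperature and CPU utilization, both of which rise after a cryptojacking infection, as training data. Details are described later in the paper.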
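One way to make "measurements that change clearly upon infection" concrete is to rank each candidate metric by its correlation with the infection label. The sketch below is only an illustration under that assumption: the summary does not state the authors' selection criterion, and the metric names (`core_temp_c`, `cpu_util_pct`, `disk_free_gb`) and all values are hypothetical.

```python
import numpy as np


def rank_features_by_label_correlation(features, labels):
    """Return (name, |Pearson r|) pairs sorted from strongest to weakest."""
    y = np.asarray(labels, dtype=float)
    scores = []
    for name, values in features.items():
        # Pearson correlation between one metric and the 0/1 infection label
        r = np.corrcoef(np.asarray(values, dtype=float), y)[0, 1]
        scores.append((name, abs(float(r))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical measurements: label 0 = clean, 1 = infected
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "core_temp_c": [41, 43, 40, 44, 78, 81, 76, 83],
    "cpu_util_pct": [12, 18, 9, 22, 95, 97, 92, 99],
    "disk_free_gb": [120, 118, 121, 119, 120, 119, 121, 118],
}

ranking = rank_features_by_label_correlation(features, labels)
for name, score in ranking:
    print(f"{name}: |r| = {score:.2f}")
```

In this toy data the temperature and CPU columns track the label closely while the disk column does not, so only the first two would be kept as training features.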
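Using the previously collected JavaScript datasets, 890 benign scripts and 890 malicious scripts were generated. The dataset used here was collected from the Alexa top one million websites on July 12, 2018. It consists of a list of cryptojacking URLs, a list of scripts, HTML files, script files, and so on, and the mining scripts among these were used for training.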
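The abstract states that the static analysis trains a K-means model on script word frequencies. A minimal sketch of that idea, using scikit-learn's `CountVectorizer` and `KMeans`, might look as follows. The six script strings are hypothetical toy examples, not samples from the dataset, and the tokenization (scikit-learn's default word pattern) is an assumption; the paper's actual preprocessing is not described in this summary.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy scripts: three mining-style snippets, three ordinary UI snippets
scripts = [
    "miner = new CoinHive.Anonymous('key'); miner.setThrottle(0.3); miner.start();",
    "miner = new CoinHive.User('key', 'name'); miner.setThrottle(0.5); miner.start();",
    "miner = new CoinHive.Anonymous('key'); miner.start(); miner.getHashesPerSecond();",
    "menu = document.getElementById('id'); menu.addEventListener('click', toggleMenu);",
    "menu = document.getElementById('id'); menu.classList.toggle('open'); document.title = 'home';",
    "form = document.getElementById('id'); form.addEventListener('submit', validateForm); document.title = 'home';",
]

# Turn each script into a vector of word counts (one column per distinct token)
vectorizer = CountVectorizer()
word_counts = vectorizer.fit_transform(scripts)

# Group the scripts into two clusters by the similarity of their word counts
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(word_counts)

for script, cluster in zip(scripts, cluster_ids):
    print(cluster, script[:40])
```

Because the two groups of toy scripts share no vocabulary, the clustering separates them cleanly; real pages mix library code, minified code, and obfuscation, so the separation would be less clear-cut.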

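Data Processing

In addition, cross-validation with Stratified K-Fold was attempted. This method takes the label ratio of the original data into account so that the train and test sets keep the same ratio during training and validation.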
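The ratio-preserving behavior can be checked directly with scikit-learn's `StratifiedKFold`. The sketch below uses hypothetical, deliberately imbalanced labels (80 clean, 20 infected) and placeholder features; the number of folds and the class balance are assumptions for illustration, not values reported by the paper.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 80 clean (0) and 20 infected (1)
y = np.array([0] * 80 + [1] * 20)
X = np.zeros((len(y), 2))  # placeholder features; only the labels drive the split

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_ratios = []
for train_idx, test_idx in skf.split(X, y):
    # Fraction of infected samples in each side of the split
    train_ratio = float(y[train_idx].mean())
    test_ratio = float(y[test_idx].mean())
    fold_ratios.append((train_ratio, test_ratio))
    print(f"train infected ratio = {train_ratio:.2f}, test infected ratio = {test_ratio:.2f}")
```

Every fold keeps the 20% infected share of the full data in both its train and test portions, which a plain shuffled split does not guarantee.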

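Theory/Model

Dynamic analysis uses the K-Nearest Neighbors (KNN) algorithm to learn from core temperature and CPU utilization.
As mentioned above, the dynamic analysis was evaluated with the KNN algorithm. The "Malware" column was designated as the label and the remaining columns as features, and scikit-learn's cross_validate function was used so that several metrics could be applied when evaluating the model.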
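The evaluation described here can be sketched with `KNeighborsClassifier` and `cross_validate`. The data below are synthetic stand-ins for the two features (core temperature and CPU utilization) and the "Malware" label; the distributions, `n_neighbors=5`, the five folds, and the chosen metric names are all assumptions, since the summary does not give the paper's settings.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 200

# Synthetic features: column 0 = core temperature (C), column 1 = CPU utilization (%)
clean = np.column_stack([rng.normal(45, 4, n), rng.normal(20, 8, n)])
infected = np.column_stack([rng.normal(80, 4, n), rng.normal(95, 3, n)])
X = np.vstack([clean, infected])
y = np.array([0] * n + [1] * n)  # plays the role of the "Malware" label column

model = KNeighborsClassifier(n_neighbors=5)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# cross_validate accepts several scoring names and reports each per fold
scores = cross_validate(model, X, y, cv=cv,
                        scoring=["accuracy", "precision", "recall", "f1"])
mean_accuracy = float(scores["test_accuracy"].mean())
print(f"mean accuracy = {mean_accuracy:.3f}")
print(f"mean F1       = {scores['test_f1'].mean():.3f}")
```

Unlike `cross_val_score`, which returns a single metric, `cross_validate` returns a dictionary with one `test_<metric>` array per requested score, which matches the stated goal of using several metrics at once.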
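The silhouette coefficient was used to evaluate the clustering results of the static analysis. The silhouette coefficient is an evaluation metric that indicates how well the clustering has been performed.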
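For a sample i, the silhouette coefficient is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other samples in its own cluster and b(i) is the mean distance to the samples of the nearest other cluster. Values near 1 indicate a compact, well-separated cluster. Since the paper reports the coefficient for one cluster (the normal cluster), the sketch below averages per-sample scores within each cluster using scikit-learn's `silhouette_samples`. The two-dimensional synthetic data are an assumption made purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

rng = np.random.default_rng(0)

# Synthetic data: one tight group and one looser group
tight = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
loose = rng.normal(loc=[6.0, 6.0], scale=1.5, size=(50, 2))
X = np.vstack([tight, loose])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# One silhouette value per sample, then averaged inside each cluster
sample_scores = silhouette_samples(X, labels)
per_cluster = {int(c): float(sample_scores[labels == c].mean())
               for c in np.unique(labels)}

# The overall score is the mean over all samples
overall = float(silhouette_score(X, labels))

print("per-cluster silhouette:", per_cluster)
print("overall silhouette:", round(overall, 3))
```

Reporting the per-cluster mean shows which cluster is cohesive and which is diffuse, information that a single overall score hides.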

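Performance/Effect

The machine learning-based dynamic and static detection models of the framework were then implemented and validated. The dynamic detection model using the KNN algorithm achieved 99.6% accuracy, and the static detection model using the K-means algorithm yielded a silhouette coefficient of 0.61 for the normal cluster, confirming that both can detect cryptojacking.
Until now, countering cryptojacking has meant that victims check CPU usage themselves for signs of infection, or that blocking relies on already known information, and both approaches have limitations. The machine learning-based cryptojacking detection framework proposed in this paper detects cryptojacking effectively through four stages: data collection, blacklist filtering, a static detection model, and a dynamic detection model. Consequently, it should also be possible to implement a program that follows these stages to detect cryptojacking quickly and efficiently.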
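The four stages could be chained roughly as in the sketch below. This is not the authors' implementation: the `PageSample` record, the `detect` function, the early-return ordering, and the simple rules standing in for the trained K-means and KNN models are all hypothetical, chosen only to show how the stages might hand off to each other.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PageSample:
    """Stage 1 (data collection): what was gathered for one visited page."""
    url: str
    script: str
    core_temp_c: float
    cpu_util_pct: float


def detect(sample: PageSample,
           blacklist: Iterable[str],
           static_model: Callable[[str], bool],
           dynamic_model: Callable[[float, float], bool]) -> str:
    # Stage 2: blacklist filtering on known URLs and script names
    if any(entry in sample.url or entry in sample.script for entry in blacklist):
        return "blocked: blacklist"
    # Stage 3: static detection on the script text
    if static_model(sample.script):
        return "flagged: static"
    # Stage 4: dynamic detection on performance indicators
    if dynamic_model(sample.core_temp_c, sample.cpu_util_pct):
        return "flagged: dynamic"
    return "clean"


# Hypothetical stand-ins for the trained models and the blacklist
blacklist = ["coinhive.min.js"]
static_model = lambda script: "miner.start" in script
dynamic_model = lambda temp, cpu: temp > 75 and cpu > 90

samples = [
    PageSample("http://example.test/a", '<script src="coinhive.min.js"></script>', 50, 20),
    PageSample("http://example.test/b", "var m = miner.start();", 50, 20),
    PageSample("http://example.test/c", "console.log('hi')", 82, 97),
    PageSample("http://example.test/d", "console.log('hi')", 45, 15),
]
verdicts = [detect(s, blacklist, static_model, dynamic_model) for s in samples]
for s, v in zip(samples, verdicts):
    print(s.url, "->", v)
```

Running the cheap blacklist check first and the resource-monitoring check last is one plausible ordering; it follows the order in which the stages are listed but the control flow between them is an assumption.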

Future Work

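If this stage uses only previously known URLs and scripts, it will be limited in handling URLs and scripts that newly appear later. To compensate for this limitation, this study uses a machine learning-based cryptojacking detection model and a training module, which can extend scalability and generality.
The static analysis considered only typical cryptojacking attacks, and in the dynamic analysis, ordinary programs can also raise temperature or CPU usage. To address these limitations, future research will widen the comparison group to find indicators that are characteristic of cryptojacking and will identify the periodicity of those indicators to improve detection accuracy.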

