Phishing websites have become a crucial concern in cyber security. Phishing is performed by fraudulently deceiving users with the aim of obtaining sensitive information such as bank account details, credit card numbers, usernames, and passwords. The threat has led to huge losses for online retailers, e-business platforms, and financial institutions, to name but a few. One way to build an anti-phishing detection mechanism is to construct a classification algorithm based on machine learning techniques. The objective of this paper is to compare different classifier ensemble approaches, i.e. random forest, rotation forest, gradient boosted machine, and extreme gradient boosting, against single classifiers, i.e. decision tree, classification and regression tree, and credal decision tree, in the case of website phishing. Area under the ROC curve (AUC) is employed as the performance metric, whilst statistical tests are used as a baseline indicator of the significance of differences among classifiers. The paper contributes to the existing literature by providing a benchmark of classifier ensembles for web phishing detection.
Problem Definition
This paper investigates the performance of classifier ensembles for automatic web phishing detection. The study includes several ensemble learning approaches: random forest (RF) [7], rotation forest (RotFor) [8], gradient boosted machine (GBM) [9], and extreme gradient boosting (XGBoost) [10].
This paper provides a comparative study of classifier ensembles for phishing website detection. A number of ensemble algorithms and single classification algorithms were included in the experiment.
Proposed Method
A novel phishing website detection approach based on a probabilistic neural network (PNN) has been presented in [23]. A k-medoids clustering approach is also incorporated in order to evaluate the complexity of the proposed method. Based on the authors' experiment, an effective detection model with reduced complexity can be built without sacrificing detection accuracy.
In order to obtain the same number of elements from each resampling procedure (each scheme below yields 20 train/test splits), we investigate the following methods: 2 times repeated 10-fold (2×10f), 4 times repeated 5-fold (4×5f), 20 times repeated 2/3 hold-out (20×ho), and 20 times repeated bootstrap (20×boot).
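The repeated hold-out and bootstrap schemes listed above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the data set size, the random seed, and the use of out-of-bag samples as the bootstrap test set are all assumptions.

```python
import random

def holdout_split(n, train_fraction, rng):
    """Randomly split indices 0..n-1 into a training part and a test part."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]

def bootstrap_split(n, rng):
    """Draw n training indices with replacement.

    Assumption: the indices that were never drawn (out-of-bag) form the test set.
    """
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test

rng = random.Random(42)   # hypothetical seed
n_samples = 30            # hypothetical data set size

# 20 times repeated 2/3 hold-out and 20 times repeated bootstrap
holdout_splits = [holdout_split(n_samples, 2 / 3, rng) for _ in range(20)]
bootstrap_splits = [bootstrap_split(n_samples, rng) for _ in range(20)]

# Every scheme produces the same number of train/test splits.
splits_per_scheme = {
    "2x10f": 2 * 10,
    "4x5f": 4 * 5,
    "20xho": len(holdout_splits),
    "20xboot": len(bootstrap_splits),
}
```

Because all four schemes yield 20 splits, each classifier ends up with 20 performance values per scheme, which keeps the schemes comparable in the later statistical tests.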
The top performer among the ensemble algorithms is RF, whilst GBM has performed worse in phishing website detection. In order to provide an ample comparative study, the performance differences of all classifiers are subsequently benchmarked using statistical tests. First of all, the result of the Friedman test is shown in Table 1.
In this experiment, different resampling procedures are used, namely k-fold cross validation, subsampling, and bootstrap. In k-fold cross validation, the data set is divided into k disjoint partitions of approximately equal size.
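A minimal sketch of the k-fold partitioning described above is given below. The function names, the seed, and the example sizes are hypothetical; the round-robin assignment is just one simple way to obtain folds whose sizes differ by at most one.

```python
import random

def k_fold_partitions(n, k, rng):
    """Split indices 0..n-1 into k disjoint folds of approximately equal size."""
    indices = list(range(n))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def k_fold_splits(n, k, rng):
    """Use each fold once as the test set and the remaining folds as the training set."""
    folds = k_fold_partitions(n, k, rng)
    splits = []
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        splits.append((train, test))
    return splits

rng = random.Random(0)                      # hypothetical seed
folds = k_fold_partitions(23, 5, rng)       # 23 samples, 5 folds (made-up sizes)
fold_sizes = sorted(len(f) for f in folds)  # [4, 4, 5, 5, 5]
splits = k_fold_splits(23, 5, rng)
```

Repeating `k_fold_splits` with fresh shuffles gives the repeated variants, for example two repetitions of 10-fold.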
2E-16) difference among classifiers regardless of the resampling approach used. Since the Friedman test points out the significance of these results, it is worthwhile to conduct the Nemenyi post-hoc test. The results of the post-hoc test for each resampling approach are visually represented with critical difference (CD) diagrams as shown in Fig.
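The Friedman test and the critical difference used by the Nemenyi post-hoc test can be sketched as follows. This is an illustration under stated assumptions: the AUC values are made up, the helper names are hypothetical, no tie correction is applied to the Friedman statistic, and the critical value 2.343 (three classifiers, significance level 0.05) is taken from commonly tabulated Nemenyi critical values rather than from the paper.

```python
import math

def average_ranks(scores):
    """scores[i][j] is the AUC of classifier j on resampling iteration i.

    Rank 1 is the best (highest AUC); tied values share the mean rank.
    """
    n_rows, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            mean_rank = (pos + end) / 2 + 1
            for j in order[pos:end + 1]:
                totals[j] += mean_rank
            pos = end + 1
    return [t / n_rows for t in totals]

def friedman_statistic(avg_ranks, n_rows):
    """Friedman chi-square statistic (without tie correction)."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12 * n_rows / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)

def nemenyi_cd(k, n_rows, q_alpha):
    """Two classifiers differ significantly if their average ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_rows))

# Made-up AUC values: 4 resampling iterations (rows) x 3 classifiers (columns).
auc_table = [
    [0.95, 0.90, 0.85],
    [0.96, 0.91, 0.88],
    [0.94, 0.92, 0.86],
    [0.97, 0.89, 0.90],
]
ranks = average_ranks(auc_table)                  # [1.0, 2.25, 2.75]
chi_square = friedman_statistic(ranks, len(auc_table))
cd = nemenyi_cd(k=3, n_rows=len(auc_table), q_alpha=2.343)
```

In this toy table the first and third classifiers differ in average rank by 1.75, which exceeds the critical difference of about 1.66, whereas the first and second differ by only 1.25. A CD diagram draws exactly this comparison: classifiers whose average ranks lie within one CD of each other are joined by a bar.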
information gain and chi-square is described in [24]. Comparing the results of the proposed method against those obtained with the full feature set reveals that the proposed method is able to pick relevant features that affect the phishing detection rate.
Performance/Effects
Rule-based phishing detection is proposed in [20]. The experiment reveals that the error rate decreased for all the algorithms, with the CBA classifier achieving the lowest error rate of 4.75%. A performance comparison of machine learning algorithms for web phishing detection has been conducted in [21].
Their detection performance was evaluated using the AUC value with respect to different resampling approaches. The experimental results revealed that random forest was superior to the other ensembles, i.e. XGBoost, rotation forest, and GBM, and to the single classifiers, i.e. C50, C-DT, and CART. Further study should include other web phishing data sets in order to provide a more comprehensive benchmark.
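For readers unfamiliar with the metric, AUC can be computed directly from its rank interpretation: the probability that a randomly chosen phishing site receives a higher score than a randomly chosen legitimate site. The sketch below assumes label 1 for phishing and 0 for legitimate; the labels, scores, and function name are made up for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up example: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
example_auc = auc(labels, scores)  # 8 of 9 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a sort-based rank computation gives the same value more efficiently on large data sets.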
Future Work
Further study should include other web phishing data sets in order to provide a more comprehensive benchmark.
References (27)
A.P.W. Group, White Paper: Phishing Response Trends, Technical Report, 2017.
S.C. Jeeva and E.B. Rajsingh, "Intelligent Phishing URL Detection Using Association Rule Mining," Human-Centric Computing and Information Sciences, Vol. 6, No. 1, pp. 1-19, 2016.
B.A. Tama and K.H. Rhee, "Performance Analysis of Multiple Classifier System in DoS Attack Detection," Proceedings of International Workshop on Information Security Applications, pp. 339-347, 2015.
K.S. Komariah, C. Machbub, A.S. Prihatmanto, and B.-K. Shin, "A Study on Efficient Market Hyphothesis to Predict Exchange Rate Trends Using Sentiment Analysis of Twitter Data," Journal of Korea Multimedia Society, Vol. 19, No. 7, pp. 1107-1115, 2016.
L. Breiman, "Random Forests," Machine Learning, Vol. 45, No. 1, pp. 5-32, 2001.
J.J. Rodriguez, L.I. Kuncheva, and C.J. Alonso, "Rotation Forest: A New Classifier Ensemble Method," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 10, pp. 1619-1630, 2006.
T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785-794, 2016.
J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, Calif., 2014.
W.Y. Loh, "Classification and Regression Trees," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 1, No. 1, pp. 14-23, 2011.
C.J. Mantas and J. Abellan, "Credal-C4.5: Decision Tree Based on Imprecise Probabilities to Classify Noisy Data," Expert Systems with Applications, Vol. 41, No. 10, pp. 4625-4637, 2014.
R.B. Basnet, S. Mukkamala, and A.H. Sung, "Detection of Phishing Attacks: A Machine Learning Approach," Soft Computing Applications in Industry, Vol. 226, pp. 373-383, 2008.
M. Aburrous, M.A. Hossain, K. Dahal, and F. Thabtah, "Intelligent Phishing Detection System for E-Banking Using Fuzzy Data Mining," Expert Systems with Applications, Vol. 37, No. 12, pp. 7913-7921, 2010.
F. Thabtah, R.M. Mohammad, and L. Mc Cluskey, "A Dynamic Self-Structuring Neural Network Model to Combat Phishing," Proceeding of Neural Networks 2016 International Joint Conference on IEEE, pp. 4221-4226, 2016.
R.M. Mohammad, F. Thabtah, and L. Mc Cluskey, "Predicting Phishing Websites Based on Self-Structuring Neural Network," Neural Computing and Applications, Vol. 25, No. 2, pp. 443-458, 2014.
M. Dadkhah, S. Shamshirband, and A.W.A. Wahab, "A Hybrid Approach for Phishing Web Site Detection," The Electronic Library, Vol. 34, No. 6, pp. 927-944, 2016.
R.M. Mohammad, F. Thabtah, and L. Mc Cluskey, "Intelligent Rule-Based Phishing Websites Classification," IET Information Security, Vol. 8, No. 3, pp. 153-160, 2014.
A. Hodzic, J. Kevric, and A. Karadag, "Comparison of Machine Learning Techniques in Phishing Website Classification," Proceedings of International Conference on Economic and Social Sciences, pp. 249-256, 2016.
F. Thabtah and N. Abdelhamid, "Deriving Correlated Sets of Website Features for Phishing Detection: A Computational Intelligence Approach," Journal of Information and Knowledge Management, Vol. 15, No. 04, pp. 1-17, 2016.
E.S.M. El-Alfy, "Detection of Phishing Websites Based on Probabilistic Neural Networks and K-Medoids Clustering," The Computer Journal, Vol. 60, No. 12, pp. 1-5, 2017.
K.D. Rajab, "New Hybrid Features Selection Method: A Case Study on Websites Phishing," Security and Communication Networks, Vol. 2017, pp. 1-10, 2017.
R. Quinlan, Data Mining Tools See5 and C5.0, 2004. http://www.rulequest.com/see5-info.html (accessed Jan. 8, 2018).
J. Abellan and S. Moral, "Building Classification Trees Using the Total Uncertainty Criterion," International Journal of Intelligent Systems, Vol. 18, No. 12, pp. 1215-1225, 2003.