Identification of Fraudsters Involved in Phishing by Different Machine Learning Models
Publisher
Department of Electrical and Electronic Engineering (EEE), Islamic University of Technology (IUT)
Abstract
With the digitization of the current age, the number of fraudsters in the digital realm has
increased manifold. Although the internet can do much good for the general population,
the growing number of unscrupulous people online poses a grave danger to the public.
Among the many vices on the internet, phishing is one of the most common. Many
approaches have been taken to tackle phishing, and machine learning (ML) based detection
is one of the leading ones. In our research work, we compared and contrasted several ML
models to find out which is most suitable for phishing detection. Our research is
unique in that we integrated data preprocessing and reduced the number of
features to lower complexity. Among these models, XGBoost achieved the highest
accuracy, 97.0455%, after hyperparameter tuning.
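The workflow summarized in the abstract (preprocessing, feature reduction, model comparison, then hyperparameter tuning of the best candidate) can be illustrated with a minimal sketch. This is not the thesis code: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the phishing dataset, uses `SelectKBest` as one possible feature-reduction step, and substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so that the example has a single dependency. The helper name `build_pipeline`, the value `k=10`, and the parameter grid are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 30 features, only 8 of them informative.
X, y = make_classification(
    n_samples=600, n_features=30, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def build_pipeline(model):
    """Hypothetical helper: scale, keep the 10 highest-scoring features, classify."""
    return Pipeline([
        ("scale", StandardScaler()),                # preprocessing
        ("select", SelectKBest(f_classif, k=10)),   # feature reduction
        ("model", model),
    ])


# Candidate models; gradient boosting stands in for XGBoost here.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Compare the candidates by accuracy on the held-out test split.
scores = {}
for name, model in candidates.items():
    pipeline = build_pipeline(model)
    pipeline.fit(X_train, y_train)
    scores[name] = pipeline.score(X_test, y_test)

# Tune the boosted model with a small, illustrative grid and 3-fold CV.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [2, 3],
}
search = GridSearchCV(
    build_pipeline(GradientBoostingClassifier(random_state=0)),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)
tuned_accuracy = search.score(X_test, y_test)

for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.4f}")
print(f"tuned gradient_boosting: {tuned_accuracy:.4f} with {search.best_params_}")
```

Placing the scaler and the feature selector inside the pipeline means they are refit on each cross-validation training fold, so the tuning step does not see information from the validation folds. The accuracies printed by this sketch come from synthetic data and are not comparable to the figure reported in the abstract.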
Description
Supervised by
Mr. Safayat Bin Hakim
Assistant Professor
Department of Electrical and Electronic Engineering
Islamic University of Technology.
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Electrical and Electronic Engineering, 2022.
Keywords
Machine learning, Phishing, XGBoost, SVM, Preprocessing, Complexity reduction