Transactions on Data Analysis in Social Science

An Intelligent Method for Credit Card Fraud Detection Using Data Mining Techniques

Document Type : Original Article

Authors
Department of Information Technology Engineering, Faculty of Computer Science, Raja University, Qazvin, Iran
Abstract
Fraudulent credit card transactions have caused billions of dollars in losses in recent years and remain a serious, growing problem. To limit this damage, data mining techniques such as classification and clustering, implemented with machine learning algorithms, are widely used for credit card fraud detection. This study focuses on supervised learning, in which algorithms are trained on labeled datasets to build predictive models. A major challenge in this domain is the highly imbalanced class distribution of the datasets: fraudulent transactions are far fewer than legitimate ones. This paper examines strategies for handling imbalanced data in machine learning algorithms and proposes an optimized method for detecting fraud on both the original and the balanced datasets. The performance of well-known classification techniques is compared, including C5.0 decision trees, Support Vector Machines (SVM) with sigmoid, linear, and RBF kernels, and neural networks.
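The class imbalance described in the abstract can be illustrated with a minimal sketch. It is not the method proposed in the paper: it assumes random undersampling as one common balancing strategy, and the toy data, the field names `amount` and `is_fraud`, and the helper functions are hypothetical, introduced only for illustration.

```python
import random


def random_undersample(rows, label_key="is_fraud", seed=0):
    """Return a balanced copy of rows by downsampling the majority class."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted((fraud, legit), key=len)
    # Keep every minority example and draw an equally sized
    # random subset of the majority class.
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def recall_and_precision(y_true, y_pred):
    """Compute recall and precision for the fraud class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical toy data: 990 legitimate and 10 fraudulent transactions.
rows = [{"amount": float(i), "is_fraud": 0} for i in range(990)]
rows += [{"amount": 1000.0 + i, "is_fraud": 1} for i in range(10)]

balanced = random_undersample(rows)

# A trivial classifier that labels every transaction as legitimate.
y_true = [r["is_fraud"] for r in rows]
y_pred = [0] * len(rows)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(rows)
recall, precision = recall_and_precision(y_true, y_pred)

print(f"balanced size: {len(balanced)}")
print(f"accuracy: {accuracy:.2f}, fraud recall: {recall:.2f}")
```

On this toy data the trivial classifier is correct on all 990 legitimate transactions yet detects none of the 10 fraudulent ones, which is why accuracy alone says little on imbalanced data and why comparisons on a balanced dataset are informative.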
Keywords

Volume 2, Issue 4
Autumn 2020
Pages 180-196

  • Received: 21 July 2020
  • Revised: 04 October 2020
  • Accepted: 20 November 2020