Adam Sahn
Machine Learning for Credit Card Fraud Detection

Abstract:
Credit card fraud detection models aim to detect fraudulent credit card transactions and prevent them from being completed. Real-world data for this task is unusually challenging: it is a very large time-series dataset in which consumer privacy is a major concern and almost all transactions are legitimate (class = 0), yet the rare fraudulent transactions (class = 1) are the ones that matter most to identify. Additionally, fraudsters are motivated to evolve their tactics to hide their activity, which makes prediction even more variable and difficult. Machine learning researchers have proposed and developed a range of models for this targeted task. I found that the most effective architecture combines LSTM layers (a type of recurrent neural network, or RNN) with a preceding attention layer that identifies the most important temporal relationships, as this model achieved perfect prediction performance (on a limited dataset) and can evolve over time while still remembering important past patterns. Further ideas are discussed for improving this model, and the fraud detection process as a whole, through additional evolution and collaboration.

Introduction:
Credit card use has exploded in recent years, with Visa and Mastercard having 2.287 billion active card accounts in 2020 [2]. While the boom in card use benefits consumers, it also incentivizes bad actors to commit fraud, defined as any criminal deception intended to produce financial gain [7]. For credit cards, this typically means an unauthorized individual using another person’s account to make purchases online or in a store, with online fraud being even more difficult to detect [9]. In 2021 alone, global losses due to credit card fraud totaled $32.24 billion [7].
Identifying credit card fraud is a binary classification task in which the response is either 0 (legitimate transaction) or 1 (fraudulent transaction). When a credit card company’s algorithm flags a transaction, it triggers a manual process that either confirms the transaction as fraudulent and blocks it, or confirms its authenticity and allows it to go through.

In this paper, I compare traditional, non-network machine learning models (logistic regression, KNN, naive Bayes, tree-based models) to neural network methods (CNNs and RNNs with LSTM or GRU layers) for credit card fraud detection. While traditional models can be effective, they struggle with two primary challenges: handling time-series data and adapting to concept drift.

Credit card transactions form large, complex time-series datasets that capture cardholder activity over time. Traditional models like logistic regression assume independence between observations and therefore fail to capture sequential relationships. While KNN and random forest models can handle larger datasets, they become computationally inefficient as transaction volume grows and also struggle to extract temporal relationships. Neural networks, particularly RNN variants such as LSTMs and GRUs, are specifically designed to process sequential data by maintaining hidden states that capture long-term dependencies. This allows them to identify subtle behavioral patterns that may indicate fraudulent activity [1]. Attention-based layers further improve on RNNs by dynamically weighting past transactions, so the model can recognize past fraud patterns while also paying close attention to recent trends [6].

Another major challenge is concept drift, which occurs when the statistical properties of the data shift over time, reducing model accuracy [5]. This is particularly relevant because fraudsters modify their behavior to avoid detection, making them progressively harder to identify.
Traditional models thus require frequent tuning, retraining, and manual updating to remain relevant. Neural networks, on the other hand, are better suited to evolving fraud patterns through continuous learning techniques such as transfer learning [5].

Methods:
The foundation of every successful data-driven project is access to a robust dataset. However, researchers frequently struggle to replicate real-world credit card data because customer transaction data is private and therefore rarely available. Among the datasets used in studies, the European card dataset is the most prevalent. It consists of 284,807 real (but anonymized) transactions with 28 anonymized predictor variables alongside transaction time and amount, of which only 0.17% (492 transactions) are fraudulent. Most studies use this dataset as a benchmark, typically with a 70% training, 15% validation, and 15% test split, which allows direct comparison across models. Although these models are trained on this limited dataset, most are designed so that they could scale efficiently to larger, more realistic data volumes.

Preprocessing steps are crucial for efficient fraud detection, as models must operate in real time while handling high transaction volumes. The first step is dimensionality reduction. Research has shown that PCA is ineffective at capturing the nonlinear relationships among credit card predictors. While t-SNE can model complex structures, it is computationally expensive due to its reliance on pairwise probability calculations. In contrast, UMAP preserves both local and global data relationships more effectively while being significantly more efficient, making it the preferred dimensionality reduction technique [1].

Another critical preprocessing step is addressing class imbalance.
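To make the scale of this imbalance concrete, the following minimal sketch takes the dataset counts quoted above (284,807 transactions, 492 fraudulent) and scores a trivial classifier that labels every transaction as legitimate. The helper names (`confusion_counts`, `metrics`) are illustrative, and treating precision as 0 when no positives are predicted is a convention assumed here, not something taken from the cited studies.

```python
# Counts quoted in the text for the European card dataset.
N_TOTAL = 284_807
N_FRAUD = 492


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, recall, and precision; ratios with a zero denominator become 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "recall": recall, "precision": precision}


# Ground truth with the quoted class counts, and a model that never predicts fraud.
y_true = [1] * N_FRAUD + [0] * (N_TOTAL - N_FRAUD)
y_pred = [0] * N_TOTAL

baseline = metrics(y_true, y_pred)
print(baseline)
```

The do-nothing baseline scores very high on accuracy while catching no fraud at all, which is why accuracy alone is a poor yardstick for this task.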
With only 0.17% of transactions classified as fraudulent, an unadjusted model would likely predict every transaction as legitimate, achieving an accuracy of approximately 99.83% while failing entirely to detect fraud (0% recall). Since recall is the priority in this task, most studies apply undersampling, oversampling, or a hybrid of the two to rebalance the classes. Some papers employ SMOTE, which synthetically generates new fraudulent transactions from existing class 1 observations, improving the model’s ability to recognize fraud [2]. These techniques push models to focus more on fraudulent transactions, thereby improving recall.

Following preprocessing, studies explore various models. Awoyemi et al. experimented with Naive Bayes, Random Forest, and KNN models, demonstrating strong performance [7]. However, research by Sulaiman et al. and Tiwari et al. highlights that while these methods can perform well on small, structured datasets, they fail to scale effectively to real-world applications that involve high transaction volumes and evolving fraud patterns [2, 10]. Thus, traditional models alone are insufficient for effective fraud detection at scale, regardless of their results within a contained experiment.

To address these challenges, researchers turn to neural networks. Kang Fu proposed an innovative CNN-based model that reshapes credit card data into a 2D feature matrix in which rows represent predictor variables (with similar or related predictors placed adjacent to each other) and columns correspond to time windows, allowing richer data capture and enhanced pattern recognition within the feature matrix [4]. However, since this study does not use the European card dataset and reports no specific performance metrics, its real-world applicability remains uncertain. While Fu’s graph suggests superior performance compared to ANN, SVM, and Random Forest, so much specific information is omitted that the validity of these claims is difficult to verify.
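Returning to the resampling step described above, the core idea behind SMOTE can be sketched in a few lines: pick a minority-class point, pick one of its nearest minority-class neighbours, and create a new point somewhere on the line segment between them. This is a simplified illustration only. The function name `smote_like`, the parameter defaults, and the toy points are assumptions made here, and this is not the implementation used in any of the cited studies or in any library.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority points.

    minority: list of equal-length numeric tuples (at least two points).
    k: number of nearest minority neighbours considered for each base point.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding the point itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Random position along the segment from base towards the neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature fraud observations (corners of the unit square).
fraud_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(fraud_points, n_new=10, k=2)
print(len(new_points), new_points[0])
```

Because every synthetic point lies between two real minority observations, the new samples stay inside the region the fraud class already occupies, rather than simply duplicating existing rows as plain oversampling would.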
Forough et al. introduced RNN-based models with LSTM and GRU layers, designed to capture the temporal dependencies in credit card transaction data [8]. While this design addresses some limitations of traditional models by being built for time series and pattern recognition, standard LSTMs still face challenges. Specifically, LSTMs must encode an entire input sequence into a single context vector, which can cause information loss as sequences grow longer. Additionally, traditional LSTMs process all sequence elements with equal weight and cannot emphasize the more relevant transactions [5].

To overcome these weaknesses, Benchaji et al. and Roseline et al. incorporated an attention mechanism before the LSTM layers, greatly improving model performance [1, 6]. Attention mechanisms dynamically assign different weights to input elements, allowing the model to prioritize certain transactions over others. For fraud detection, this means the model can focus on suspicious transactions while reducing the influence of routine legitimate ones, leading to improved accuracy and recall. By selectively attending to critical patterns, attention-based LSTMs significantly enhance the ability to detect evolving fraudulent behaviors.

Results and Discussion:
As mentioned, many of the studies built models around the European credit card dataset from 2013, so I aggregate the relevant metrics from different models on this dataset below.

Model                             Accuracy   Recall   Precision
Logistic Regression [7]           .5486      .5833    .3836
Naive Bayes [7]                   .9769      .9514    .9786
KNN (k = 3) [7]                   .9792      .9375    1.0
LSTM [8]                          N/A        .7408    .8575
GRU [8]                           N/A        .7208    .8626
LSTM w. Attention Layer #1 [1]    .9672      .9191    .9885
LSTM w. Attention Layer #2 [6]    1.0        1.0      1.0

While all three evaluation metrics are important, every model with a reported accuracy performed well on it, which is no great feat given the major class imbalance. The exception is logistic regression, which would likely have scored well on accuracy had its training data not been oversampled and undersampled along with the other models. Thus, I focus this analysis on differences in recall and precision.

The most critical evaluation metric in this task is recall: a false negative means a successful fraud, which loses money and erodes customer trust, while a false positive means a temporary transaction block that can be overridden by a customer who may be mildly annoyed but also grateful for a cautious banking partner. Precision is also important, as excessive false positives burden manual review teams and frustrate customers. A successful model maximizes recall while maintaining relatively high precision.

These data confirm my observations from the prior section. Logistic regression is by far the least effective classification technique for this task (accuracy, recall, and precision all below .6) because it is not well equipped for time-series data. Impressively, Awoyemi et al. built strongly performing Naive Bayes (recall = .9514, precision = .9786) and KNN (recall = .9375, precision = 1.0) models using undersampling and oversampling that shifted the class distribution from roughly 0.17% fraudulent to a 66:34 split. Naive Bayes was especially high-performing because it assumes feature independence, which, combined with effective resampling, allowed it to capture key fraud-related patterns despite the class imbalance. Additionally, its probabilistic nature made it highly sensitive to rare fraudulent transactions, improving recall without sacrificing precision.
While these results are impressive, we must look beyond performance metrics and acknowledge that these models would not be effective in real-world scenarios due to data scale and concept drift (a model must be able to evolve effectively) [3].

Forough et al. theorized that an RNN would be strong in this context, creating models with both LSTM layers (recall = .7408, precision = .8575) and GRU layers (recall = .7208, precision = .8626) [8]. While this should logically have led to improved performance, it clearly did not. I theorize that this is because LSTMs and GRUs require large amounts of well-structured sequential data to learn temporal dependencies effectively, and the limited data in this dataset may have hindered their ability to generalize. This is especially an issue for identifying earlier frauds, when there is little previous data from which to begin learning a pattern. Additionally, improper or unoptimized network design may have hindered performance.

By placing an attention layer before the LSTM layers, two research groups built improved models that significantly outperformed the others, including one that achieved perfect prediction performance with accuracy, recall, and precision all equal to 1.0 [6]. The attention mechanism allowed the model to focus on the most relevant parts of the input sequence, effectively capturing subtle fraud-related patterns that traditional RNNs may have overlooked. By assigning different weights to each time step, the attention layer enabled the model to better handle long-range dependencies, prioritize rare fraudulent transactions, and improve generalization. While overfitting remains a concern because little detail is given on the train/test split, the attention-enhanced RNN model outperformed the Naive Bayes, SVM, and ANN models developed by the same researchers, demonstrating its strong predictive ability.
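The per-time-step weighting described above can be illustrated with a generic dot-product attention step: each transaction in a sequence is scored against a query vector, the scores are normalized with a softmax so they sum to 1, and the weighted sum becomes the context passed to later layers. This is a minimal sketch under stated assumptions. The function names, the query vector, and the toy feature values are hypothetical, and it does not reproduce the specific attention layers used in [1] or [6], where the weights would be learned during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, steps):
    """Return (weights, context) for dot-product attention over time steps."""
    # One relevance score per time step: dot product of query and step features.
    scores = [sum(q * x for q, x in zip(query, step)) for step in steps]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the time-step feature vectors.
    dim = len(steps[0])
    context = [sum(w * step[i] for w, step in zip(weights, steps)) for i in range(dim)]
    return weights, context


# Hypothetical sequence of four transactions with two features each;
# the third transaction is deliberately made to stand out.
steps = [[0.1, 0.0], [0.2, 0.1], [3.0, 2.5], [0.1, 0.2]]
query = [1.0, 1.0]  # assumed query; a trained model would learn this

weights, context = attend(query, steps)
print([round(w, 3) for w in weights])
```

In this toy sequence nearly all of the weight lands on the unusual third transaction, so the context vector is dominated by it rather than by the routine ones around it.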
To sum up what these results mean for the effectiveness of different models in identifying credit card fraud: logistic regression is ineffective. KNN and Naive Bayes yield strong performance metrics but logically would not be applicable to a real-world scenario. RNN-based models are the best suited to the task because they can find temporal relationships in time-series data, yet the performance of standalone LSTM- and GRU-based models was underwhelming. This changed when an attention layer was added before the LSTM layers, allowing the model to zoom in on transactions of interest and ignore clearly legitimate ones, which improved efficiency and allowed the model to find complex relationships and even predict with perfect performance. Going forward, RNNs with attention layers are the clear best option for predicting credit card fraud due to their ability to identify temporal patterns (both short and long term) and adapt to changing conditions.

Conclusion:
This literature review examined the various methodologies for using machine learning to identify credit card fraud, specifically comparing non-network models (logistic regression, KNN, Naive Bayes) to neural network models (CNNs and especially RNNs and modified RNNs). It is clear that an RNN model with a preceding attention layer is best suited both to current fraud detection and to evolving to detect future patterns. Additionally, Carcillo et al. hypothesize that a ‘sliding-window’ approach, along with various other optimizations, would allow a model to handle an incoming rate of 200 transactions/second (up from 2.4/second), greatly increasing efficiency [3].
Finally, Sulaiman et al. suggest an innovative process in which a centralized server sends individual banks a fraud detection algorithm with trained weights. Each institution then runs the model, which adapts based on the fraudulent transactions its own customers face, and sends its further-trained algorithm back to the central server, where the individual models are aggregated [2]. This process is repeated indefinitely. Assuming an RNN-based architecture is used, this would allow each bank’s algorithm to detect fraud patterns faced by other institutions and allow the model to evolve efficiently without sharing customer data. By implementing RNNs with attention layers and continuing to improve efficiency and model performance while also guaranteeing customer privacy, data scientists can help to eradicate credit card fraud, one temporal relationship at a time!

References:
1. Benchaji, I., Douzi, S., El Ouahidi, B. et al. Enhanced credit card fraud detection based on attention mechanism and LSTM deep model. J Big Data 8, 151 (2021). https://doi.org/10.1186/s40537-021-00541-8
2. Bin Sulaiman, R., Schetinin, V. & Sant, P. Review of Machine Learning Approach on Credit Card Fraud Detection. Hum-Cent Intell Syst 2, 55–68 (2022). https://doi.org/10.1007/s44230-022-00004-0
3. Fabrizio Carcillo, Andrea Dal Pozzolo, Yann-Aël Le Borgne, Olivier Caelen, Yannis Mazzer, Gianluca Bontempi, SCARFF: A scalable framework for streaming credit card fraud detection with spark, Information Fusion, Volume 41, 2018, Pages 182-194, ISSN 1566-2535, https://doi.org/10.1016/j.inffus.2017.09.005
4. Fu, K., Cheng, D., Tu, Y., Zhang, L. (2016). Credit Card Fraud Detection Using Convolutional Neural Networks. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9949. Springer, Cham. https://doi.org/10.1007/978-3-319-46675-0_53
5. I. D. Mienye and N. Jere, "Deep Learning for Credit Card Fraud Detection: A Review of Algorithms, Challenges, and Solutions," in IEEE Access, vol. 12, pp. 96893-96910, 2024, doi: 10.1109/ACCESS.2024.3426955
6. J Femila Roseline, GBSR Naidu, V. Samuthira Pandi, S Alamelu alias Rajasree, N. Mageswari, Autonomous credit card fraud detection using machine learning approach, Computers and Electrical Engineering, Volume 102, 2022, 108132, ISSN 0045-7906, https://doi.org/10.1016/j.compeleceng.2022.108132
7. J. O. Awoyemi, A. O. Adetunmbi and S. A. Oluwadare, "Credit card fraud detection using machine learning techniques: A comparative analysis," 2017 International Conference on Computing Networking and Informatics (ICCNI), Lagos, Nigeria, 2017, pp. 1-9, doi: 10.1109/ICCNI.2017.8123782
8. Javad Forough, Saeedeh Momtazi, Ensemble of deep sequential models for credit card fraud detection, Applied Soft Computing, Volume 99, 2021, 106883, ISSN 1568-4946, https://doi.org/10.1016/j.asoc.2020.106883
9. Johannes Jurgovsky, Michael Granitzer, Konstantin Ziegler, Sylvie Calabretto, Pierre-Edouard Portier, Liyun He-Guelton, Olivier Caelen, Sequence classification for credit-card fraud detection, Expert Systems with Applications, Volume 100, 2018, Pages 234-245, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2018.01.037
10. P. Tiwari, S. Mehta, N. Sakhuja, J. Kumar, and A. K. Singh, "Credit card fraud detection using machine learning: A study," arXiv preprint, arXiv:2108.10005, 2021. https://doi.org/10.48550/arXiv.2108.10005