Found 82 repositories (showing 30)
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.
Content: The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; it can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes the value 1 in case of fraud and 0 otherwise. Given the class imbalance ratio, we recommend measuring performance using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
Update (03/05/2021): A simulator for transaction data has been released as part of the practical handbook on Machine Learning for Credit Card Fraud Detection - https://fraud-detection-handbook.github.io/fraud-detection-handbook/Chapter_3_GettingStarted/SimulatedDataset.html. We invite all practitioners interested in fraud detection datasets to also check out this data simulator, and the methodologies for credit card fraud detection presented in the book.
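To illustrate why plain accuracy misleads on these class counts, and what an AUPRC-style score measures, here is a minimal sketch in plain Python. The function name `average_precision`, the toy labels, and the toy scores are illustrative assumptions, not part of the dataset or of any repository listed here. The function computes the step-wise area under the precision-recall curve (often called average precision) and does not treat tied scores specially.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    # Rank transactions from most to least suspicious.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank  # precision at the rank where recall increases
    return ap / total_pos


# Counts quoted in the description: 492 frauds out of 284,807 transactions.
n_total, n_fraud = 284_807, 492
fraud_rate = n_fraud / n_total
# A model that labels every transaction "legitimate" is right on all non-fraud rows.
trivial_accuracy = (n_total - n_fraud) / n_total

# Hypothetical toy example: two frauds among eight transactions.
y_true = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.1, 0.4, 0.9, 0.2, 0.35, 0.05, 0.3, 0.15]
toy_ap = average_precision(y_true, scores)
```

With the quoted counts, a model that never flags fraud reaches about 99.8% accuracy while catching no fraud at all, which is why a ranking-based score over the positive class is preferred here.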
Acknowledgements: The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project.
Please cite the following works:
Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.
Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Aël; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915-4928, 2014, Pergamon.
Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3784-3797, 2018, IEEE.
Dal Pozzolo, Andrea. Adaptive Machine learning for credit card fraud detection. ULB MLG PhD thesis (supervised by G. Bontempi).
Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182-194, 2018, Elsevier.
Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5(4), 285-300, 2018, Springer International Publishing.
Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019.
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019.
Yann-Aël Le Borgne, Gianluca Bontempi. Machine Learning for Credit Card Fraud Detection - Practical Handbook.
A Jupyter notebook that applies machine learning techniques to detect credit card fraud on imbalanced data. It covers data preprocessing, EDA, handling class imbalance, training classifiers (Logistic Regression, Decision Tree, Random Forest), and saving the trained models.
pankaj614
Problem statement: The aim of this project is to predict fraudulent credit card transactions with the help of machine learning models. We analyse customer-level data that was collected and analysed during a research collaboration of Worldline and the Machine Learning Group. The dataset is taken from the Kaggle website and has a total of 284,807 transactions, of which 492 are fraudulent. Because the dataset is highly imbalanced, the imbalance needs to be handled before model building.
Business Problem Overview: For many banks, retaining highly profitable customers is the number one business goal. Banking fraud, however, poses a significant threat to this goal. In terms of substantial financial losses, trust and credibility, it is a concerning issue for banks and customers alike. The Nilson Report estimated that by 2020 banking fraud would amount to $30 billion worldwide. With the rise in digital payment channels, the number of fraudulent transactions is also increasing, in new and different ways. In the banking industry, credit card fraud detection using machine learning is not just a trend but a necessity for putting proactive monitoring and fraud prevention mechanisms in place. Machine learning is helping these institutions reduce time-consuming manual reviews, costly chargebacks and fees, and denials of legitimate transactions.
Understanding and Defining Fraud: Credit card fraud is any dishonest act or behaviour to obtain information without proper authorization from the account holder for financial gain. Among the different kinds of fraud, skimming, which duplicates the information stored on the magnetic strip of the card, is the most common. Other kinds are: manipulation/alteration of genuine cards; creation of counterfeit cards; stolen/lost credit cards; fraudulent telemarketing.
Data Dictionary: The dataset can be downloaded using this link. The data set includes credit card transactions made by European cardholders over a period of two days in September 2013. Out of a total of 284,807 transactions, 492 were fraudulent. This data set is highly unbalanced, with the positive class (frauds) accounting for 0.172% of the total transactions. The data set has also been transformed with Principal Component Analysis (PCA) to maintain confidentiality. Apart from 'Time' and 'Amount', all the other features (V1, V2, V3, up to V28) are the principal components obtained using PCA. The feature 'Time' contains the seconds elapsed between the first transaction in the data set and each subsequent transaction. The feature 'Amount' is the transaction amount. The feature 'Class' is the class label; it takes the value 1 in cases of fraud and 0 otherwise.
Project Pipeline: The project pipeline can be briefly summarized in the following steps:
Data Understanding: Here, we load the data and understand the features present in it. This helps us choose the features that we will need for our final model.
Exploratory data analytics (EDA): Normally, in this step, we perform univariate and bivariate analyses of the data, followed by feature transformations if necessary. For the current data set, because Gaussian variables are used, we do not need to perform Z-scaling. However, we can check whether there is any skewness in the data and try to mitigate it, as it might cause problems during the model-building phase.
Train/Test Split: We perform a train/test split in order to check the performance of our models on unseen data. For validation, we can use the k-fold cross-validation method, choosing an appropriate k value so that the minority class is adequately represented in the test folds.
Model-Building/Hyperparameter Tuning: In this step we try different models and fine-tune their hyperparameters until we get the desired level of performance on the given dataset. We should also check whether the various sampling techniques yield a better model.
Model Evaluation: We need to evaluate the models using appropriate evaluation metrics. Because the data is imbalanced, it is more important to accurately identify fraudulent transactions than non-fraudulent ones. We need to choose an evaluation metric that reflects this business goal.
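The train/test split step asks for folds in which the minority class is represented. One way this could be done is a stratified split, sketched below in plain Python. The function name `stratified_folds`, the seed, and the toy labels are assumptions made for illustration; the project itself does not say which splitting routine it uses.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Assign every row index to one of k folds, class by class."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share
        # of every class, including the rare one.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical toy data: 10 frauds among 1,000 transactions.
labels = [1] * 10 + [0] * 990
folds = stratified_folds(labels, k=5)
fraud_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

Under this scheme, 492 frauds split across k = 5 folds would leave roughly 98 fraud cases in each test fold, whereas a purely random split gives no such guarantee.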
NS-AlgoHub
Credit Card Fraud Classifier is a machine learning project that identifies fraudulent transactions using historical payment data. It focuses on data preprocessing, imbalance handling, feature engineering, and model evaluation for reliable fraud detection.
Context: It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.
Content: The dataset contains transactions made with credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, … V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; it can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes the value 1 in case of fraud and 0 otherwise. Given the class imbalance ratio, we recommend measuring performance using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
Update (03/05/2021): A simulator for transaction data has been released as part of the practical handbook on Machine Learning for Credit Card Fraud Detection - https://fraud-detection-handbook.github.io/fraud-detection-handbook/Chapter_3_GettingStarted/SimulatedDataset.html. We invite all practitioners interested in fraud detection datasets to also check out this data simulator, and the methodologies for credit card fraud detection presented in the book.
Acknowledgements: The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project.
Please cite the following works:
Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.
Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Aël; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915-4928, 2014, Pergamon.
Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3784-3797, 2018, IEEE.
Dal Pozzolo, Andrea. Adaptive Machine learning for credit card fraud detection. ULB MLG PhD thesis (supervised by G. Bontempi).
Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182-194, 2018, Elsevier.
Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5(4), 285-300, 2018, Springer International Publishing.
Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019.
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019.
Yann-Aël Le Borgne, Gianluca Bontempi. Machine Learning for Credit Card Fraud Detection - Practical Handbook.
nodelicious
Credit card fraud is an ever-growing problem in today's financial market. Fraudulent activity has increased rapidly in recent years, causing substantial financial losses to many organizations, companies, and government agencies. The numbers are expected to increase further, so many researchers in this field have focused on detecting fraudulent behaviour early using advanced machine learning techniques. However, credit card fraud detection is not a straightforward task, mainly for two reasons: (i) fraudulent behaviour usually differs for each attempt, and (ii) the dataset is highly imbalanced, i.e., the majority samples (genuine cases) far outnumber the minority samples (fraudulent cases). When a predictive model is given input data with a highly unbalanced class distribution, it tends to be biased towards the majority samples. As a result, it tends to misclassify a fraudulent transaction as genuine. To tackle this problem, a data-level approach using different resampling methods (undersampling, oversampling, and hybrid strategies) has been implemented, along with an algorithmic approach in which ensemble models such as bagging and boosting have been applied to a highly skewed dataset containing 284,807 transactions. Of these transactions, only 492 are labelled as fraudulent. Predictive models such as logistic regression, random forest, and XGBoost, in combination with different resampling techniques, have been applied to predict whether a transaction is fraudulent or genuine. Model performance is evaluated based on recall, precision, F1-score, the precision-recall (PR) curve, and the receiver operating characteristic (ROC) curve. The experimental results showed that random forest in combination with a hybrid resampling approach of Synthetic Minority Over-sampling Technique (SMOTE)
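The oversampling idea behind SMOTE can be sketched as interpolation between a minority sample and one of its nearest minority neighbours. The block below is a simplified illustration in plain Python, not the reference algorithm and not the implementation used in this repository; the function name `smote_sketch`, the parameter defaults, and the toy points are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points between minority samples and their
    k nearest minority neighbours (simplified SMOTE-style interpolation)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature fraud samples.
fraud_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(fraud_points, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies rather than being exact copies, which is the main difference from plain random oversampling.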
zeynepsonmeez
Credit card fraud detection using machine learning on imbalanced data
udaykumar-cs
End-to-end FinTech credit card fraud detection system using machine learning pipelines, SMOTE, ensemble models, and probability threshold optimization on highly imbalanced data.
Machine learning project on credit card fraud detection using Decision Tree and Random Forest. Achieved 89% recall and 63% precision on highly imbalanced data with hyperparameter tuning and feature importance analysis.
barkathafreen786
A machine learning-based credit card fraud detection system that uses Random Forest classification and SMOTE to handle imbalanced data. The project includes a Streamlit web app for real-time fraud prediction based on transaction input features.
Rahul6099
This MSc Sustainable Impact Analysis (SIA) programming assignment focuses on developing a machine learning-based credit card fraud detection model using anonymized transaction data. The project involves data preprocessing, exploratory analysis, handling class imbalance, and implementing classification models to accurately identify fraudulent
P-Chandra28
This project focuses on detecting fraudulent credit card transactions using machine learning techniques. It uses a Kaggle dataset containing anonymized data from European cardholders, addressing the challenge of class imbalance common in fraud detection. Several models, including Decision Tree, Random Forest, and AdaBoost, are implemented.
Astro42
Payment fraud is a significant and growing problem worldwide. With the rise of computing platforms, the scale and diversity of credit card fraud have changed significantly, driven by the growth of both online transactions and e-commerce platforms. Credit card fraud happens when a credit/debit card or its card information is stolen, or when a fraudster uses the information for personal gain. Fraud detection systems were introduced to control these fraudulent activities. Such systems pose operational challenges, however, because management and cybersecurity responsibilities are sometimes uncoordinated. Moreover, designing such systems is particularly challenging because of the non-stationary distribution of the data. The issue most enterprises face is the lack of incident data: information on smaller attacks is limited because in most cases they are not reported thoroughly. Through this project, we aim to implement and assess the performance of various machine learning models on the dataset to predict fraudulent transactions. Since public data are scarce due to confidentiality, the project focuses on predictive performance rather than inference. We use a dataset retrieved from Kaggle that contains 284,807 credit card transactions occurring over two days in Europe. It was collected and analyzed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. The dataset contains 31 variables. An important attribute of the dataset is that it has been processed to protect cardholder privacy; because of these privacy concerns, the original features and more background information about the data are not available.
The data is also substantially imbalanced: frauds account for 0.172 percent of total transactions. Because the data is confidential, we only have features V1 through V28, which are principal components, plus the time and the transaction amount. The dataset's extreme imbalance is another issue to overcome. Given the large number of non-fraudulent transactions, random undersampling can be used to reduce the number of non-fraudulent transactions to match the number of fraudulent transactions.
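Random undersampling as described above can be sketched in a few lines of plain Python: keep every fraud row and draw an equally sized random subset of the non-fraud rows. The function name `random_undersample`, the seed, and the toy rows are illustrative assumptions, not the project's actual code.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Keep all fraud rows (label 1) and an equal-sized random sample
    of non-fraud rows (label 0)."""
    rng = random.Random(seed)
    fraud_idx = [i for i, y in enumerate(labels) if y == 1]
    legit_idx = [i for i, y in enumerate(labels) if y == 0]
    # Draw without replacement; never ask for more rows than exist.
    kept_legit = rng.sample(legit_idx, k=min(len(fraud_idx), len(legit_idx)))
    keep = sorted(fraud_idx + kept_legit)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: 5 frauds among 100 transactions.
labels = [1] * 5 + [0] * 95
rows = [{"Amount": float(i)} for i in range(100)]
bal_rows, bal_labels = random_undersample(rows, labels)
```

Applied to the full dataset, this would keep all 492 fraud rows and 492 of the non-fraud rows, so the balance comes at the cost of discarding most of the legitimate transactions.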
No description available
negar-riazi
Credit Card Fraud Detection using Machine Learning on Imbalanced Data
Vishnu-211297
Credit card fraud detection using machine learning on highly imbalanced data
Nagatragab
Credit card fraud detection using machine learning on highly imbalanced data. (Kaggle Dataset)
pasid-ops
Credit card fraud detection using Federated Learning (FedAvg) with machine learning on imbalanced data.
ChetanRaj13
Credit card fraud detection using machine learning techniques in Python on highly imbalanced transaction data.
kutaydemir462
Credit card fraud detection using SMOTE and machine learning models to improve recall on imbalanced data.
Parmodk2310
Credit card fraud detection using machine learning with a focus on imbalanced data and high accuracy.
bprak04
Machine learning-based fraud detection on imbalanced credit card transaction data using Python and visualization in Tableau
Detect credit card fraud effectively using machine learning on imbalanced data. Explore techniques to address class imbalance and enhance fraud detection accuracy.
syedtaseerabbas797-droid
Machine Learning based Credit Card Fraud Detection using EDA, Random Forest, and Confusion Matrix analysis on highly imbalanced data.
taniajasrotia401
Credit Card Fraud Detection using Python and Machine Learning on highly imbalanced transaction data, evaluated using ROC-AUC and recall.
MirAsimAli
Machine learning project for detecting fraudulent credit card transactions using data analysis and predictive modeling. End-to-end credit card fraud detection using EDA, visualization, and ML models. Imbalanced classification project focused on financial fraud detection with machine learning.
DhruvrajSinhZala24
End-to-end credit card fraud detection using machine learning on highly imbalanced real-world data with Logistic Regression, SMOTE, and Random Forest.
Jyotisingh31
Built a machine learning model to detect credit card fraud using algorithms like logistic regression and random forest. Focused on handling imbalanced data and improving fraud detection accuracy.
This project focuses on detecting fraudulent credit card transactions using machine learning. Using the Credit Card Fraud Detection dataset from Kaggle, we build models to identify suspicious activity in highly imbalanced data.
kumaradityar194-crypto
Developed a credit card fraud detection system using machine learning and deep learning, handling class imbalance and achieving 95% accuracy on test data.