Payment fraud is a significant and growing problem worldwide. With the rise of online transactions and e-commerce platforms, the scale and diversity of credit card fraud have changed substantially. Credit card fraud occurs when a credit or debit card, or the card's information, is stolen and used by a fraudster for personal gain. Fraud detection systems were introduced to control these fraudulent activities, but they pose operational challenges: responsibility for management and for cybersecurity is sometimes uncoordinated, and the design of such systems is particularly difficult because the data distribution is non-stationary. Most enterprises also lack incident data, since smaller attacks are often not reported thoroughly.

Through this project, we aim to implement and assess the performance of various machine learning models in predicting fraudulent transactions. Since public data are scarce due to confidentiality, the focus of the project is on predictive performance rather than inference. We use a dataset retrieved from Kaggle containing 284,807 credit card transactions made over two days in Europe. It was collected and analyzed during a research collaboration on big data mining and fraud detection between Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles). The dataset contains 31 variables across the 284,807 transactions. An important attribute of the dataset is that it has been processed to protect cardholder privacy: because of confidentiality concerns, the original features and further background information about the data cannot be provided. The dataset is also substantially imbalanced.
Fraudulent transactions account for only 0.172 percent of all transactions. Because the underlying data are confidential, the only available features are V1 through V28, which are the principal components of a transformation applied to the original features, along with Time and Amount. The dataset's extreme class imbalance is a further challenge: given the overwhelming number of non-fraudulent transactions, random undersampling can be used to reduce the majority class so that it matches the number of fraudulent transactions.
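The undersampling step described above can be sketched as follows. This is a minimal illustration using a small synthetic DataFrame as a stand-in for the Kaggle data (the synthetic column names `Amount` and `Class` mirror the real dataset, but the generated values and the 10,000-row size are assumptions for demonstration only):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the credit card data, with a comparably
# extreme imbalance (fraud probability ~0.172% per transaction).
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "Amount": rng.exponential(scale=88.0, size=n),
    "Class": (rng.random(n) < 0.00172).astype(int),  # 1 = fraud, 0 = legitimate
})

fraud = df[df["Class"] == 1]
legit = df[df["Class"] == 0]

# Random undersampling: keep every fraudulent transaction, draw an
# equal-sized random sample of legitimate ones, then shuffle the result.
legit_down = legit.sample(n=len(fraud), random_state=42)
balanced = pd.concat([fraud, legit_down]).sample(frac=1, random_state=42)

print(balanced["Class"].value_counts())
```

After this step the two classes are equally represented, at the cost of discarding most of the legitimate transactions, which is why undersampling is typically evaluated against alternatives such as oversampling the minority class.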