Credit Card Fraud Detection Using Supervised Learning Algorithms


Research Paper (postgraduate), 2020

6 Pages, Grade: 3.16


Excerpt


Abstract. Fraud is one of the major ethical issues in the credit card industry. The main purpose of this paper is to identify credit card fraud and to propose a reasonable solution to it. Credit card fraud has cost consumers and banks billions of dollars globally. Despite numerous mechanisms to stop fraud, fraudsters continuously look for new ways and tricks to commit it. Fraud detection is therefore of immense importance to banks and finance-related companies. We are going to apply an artificial neural network for detection purposes. To stop fraud, we provide a solution that not only detects fraud but detects it before it happens. Our system learns from past fraud in order to detect new fraud. Mining algorithms have been applied to detect fraud but did not perform well. In this paper we implement machine learning algorithms to detect fraud in credit card transactions. The paper uses supervised learning algorithms implemented on a dataset from Kaggle that is highly skewed and imbalanced. We scaled the data with a robust scaler and balanced it by subsampling to 51 percent non-fraud cases and 49 percent fraud cases. Logistic regression, random forest, decision tree and KNN have been implemented, and learning curves are displayed which show which algorithm has the best ability to learn.

The metrics used for evaluation are accuracy, specificity, precision and sensitivity, and a comparative chart is established which displays the comparative analysis of these supervised learning algorithms.

Keywords: Neural Network, Genetic Algorithm, Support Vector Machine, Bayesian Network, K-Nearest Neighbor.

1. Introduction:

The use of credit cards is common in today’s world. A credit card is one of the methods of purchasing goods or services, and it is most often used for online payments and transactions. With the increase in credit card use, the chances of fraud in such transactions have increased tenfold. Credit card fraud causes losses of billions of dollars worldwide. Fraud is defined as deception carried out to gain illegally from someone else’s money. Credit card fraud can be carried out in multiple ways: through lost or stolen cards, by producing fake or counterfeit cards, by cloning the original site, by erasing or modifying the magnetic strip on the card that contains the user’s information, by phishing, by skimming, or by stealing data from the merchant’s side. Fraud detection is basically the task of dividing transactions into fraudulent and non-fraudulent ones, so that customers can enjoy their shopping or any other transactions easily and without delay. Many detection techniques have been used to solve this problem, such as genetic algorithms, item set mining and the migrating birds algorithm. Datasets for credit card fraud detection are rare, and even when one is available it is highly skewed and imbalanced, which makes it hard to apply the algorithms properly. So, a few changes must be made to the dataset before running the algorithms.

2. System design:

Figure not included in this excerpt

Figure 1: Use case diagram for fraud detection

Figure not included in this excerpt

Figure 2: Flow chart

3. Algorithms used:

3.1 Decision Tree:

A decision tree uses a tree-like graph of decisions and all their possible outcomes to predict the final decision. Each internal node applies a conditional control statement to a feature, and each leaf gives the predicted class.
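A trained decision tree can be read as nested conditional statements. The sketch below is a hand-written toy tree, not a tree learned from the dataset: the feature names (`scaled_amount`, `v14`, `v10`) and every threshold are hypothetical and chosen only for illustration.

```python
def classify(transaction):
    """Toy decision tree: each `if` is an internal node, each `return` a leaf.

    Returns 1 for fraud and 0 for non-fraud. All thresholds are hypothetical.
    """
    if transaction["scaled_amount"] > 2.0:      # root node
        if transaction["v14"] < -3.0:           # second-level node
            return 1                            # leaf: fraud
        return 0                                # leaf: non-fraud
    if transaction["v10"] < -5.0:               # second-level node
        return 1                                # leaf: fraud
    return 0                                    # leaf: non-fraud


example = {"scaled_amount": 2.5, "v14": -4.2, "v10": 0.3}
prediction = classify(example)
```

In practice the splits and thresholds are chosen by the training algorithm rather than written by hand.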

3.2 Logistic regression:

Logistic regression is very similar to linear regression, with one key difference: logistic regression fits an S-shaped (logistic) curve whose output lies between 0 and 1, whereas linear regression fits a straight line.

Figure not included in this excerpt

Figure 3: Logistic curves
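The logistic curve can be sketched in a few lines. The code below passes a linear combination of the features (the same quantity linear regression computes) through the logistic function, so the output can be read as a probability of fraud. The weights and bias in the example are made-up values for illustration, not fitted parameters from the paper.

```python
import math


def sigmoid(z):
    """Logistic function, written in a numerically stable form."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))  # linear part
    return sigmoid(z)                                         # squashed into (0, 1)


# Hypothetical weights: the linear part is 0.5*1.0 - 0.25*2.0 + 0.0 = 0.0
p = predict_proba([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
```

A transaction is then labeled as fraud when the probability exceeds a chosen threshold, commonly 0.5.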

3.3 Random Forest:

Random forest is an algorithm for both regression and classification. A random forest is an ensemble of decision tree classifiers. It reduces the overfitting to the training set that a single decision tree is prone to, which is why it is preferred over a decision tree. To train each tree, a random subset of the training set is sampled and a decision tree is built on it.
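The two ingredients described above, random sampling of the training set for each tree and combining the trees' outputs, can be sketched as follows. This is a simplified illustration and not a full random forest: tree training is left out, and the per-tree predictions in the example are invented values.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement (one sample per tree)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Combine the class predicted by each tree into one forest prediction."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)                 # fixed seed for reproducibility
training_rows = list(range(10))        # stand-in for ten training transactions
sample = bootstrap_sample(training_rows, rng)

# Hypothetical outputs of three trees for one transaction (1 = fraud)
forest_prediction = majority_vote([1, 0, 1])
```

Because each tree sees a different sample, their individual errors tend to differ, and the vote averages them out.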

3.4 K-Nearest Neighbor Classifier:

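KNN is used for classification and regression. It classifies a sample according to its similarity to the nearest training samples, measured with a distance function such as the Euclidean, Manhattan or Minkowski distance. All three are suited to continuous variables; the Minkowski distance generalizes the other two (order 1 gives Manhattan, order 2 gives Euclidean). Categorical variables usually need a different measure, such as the Hamming distance.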
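A minimal sketch of the three distance functions and of a KNN vote is given below. The training points and labels are toy values, and `knn_predict` is an illustrative helper rather than the implementation used in the experiments.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    return minkowski(a, b, 1)   # order 1


def euclidean(a, b):
    return minkowski(a, b, 2)   # order 2


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Label the query with the majority class among its k nearest neighbors."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = [0, 0, 0, 1, 1]        # toy labels: 1 = fraud, 0 = non-fraud
near_origin = knn_predict(points, labels, (0.2, 0.1))
near_cluster = knn_predict(points, labels, (5.5, 5.0))
```

An odd `k` avoids tied votes when there are two classes.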

4. Credit card fraud detection:

The advent of the credit card, and the comfort of ordering anything from your own home, has also brought fraudsters closer to this technology. Credit cards are an easy target because a large amount of money can be obtained in very little time. Transaction products, including credit cards, are the most vulnerable to fraud. Other products, such as personal loans and retail, are also at high risk.

4.1 Types of credit card fraud:

1) Electronic or manual credit card imprints: The fraudster skims the information stored on the magnetic strip of the card.
2) Counterfeit card fraud: This is generally attempted through skimming. A fake magnetic swipe card is made that holds all the details of the original card.
3) Card ID theft: This fraud is similar to application fraud.
4) Account takeover: This is one of the most common forms of fraud. The fraudster gains access to the account details.
5) False merchant sites: This is similar to a phishing attack, where the customer is trapped in a fake webpage, created by the fraudster, that looks very similar to a known and genuine website.
6) Overcharging: The vendor charges the user extra money.
7) Bankruptcy fraud: A credit report from a credit bureau can be used as a source of information on the applicant’s public records, and a bankruptcy model can be implemented on top of it.
8) Behavioral fraud: Behavioral fraud occurs when details of legitimate cards have been obtained fraudulently and sales are made on a ‘cardholder present’ basis.

5. Credit card fraud detection issues:

There is very little research on real-world fraud detection problems; the low rate of experimental analysis is the root cause of credit card fraud persisting in this age of modern technology. The main problem is that financial institutions do not provide confidential information to researchers, which prevents them from developing possible solutions. A good classifier must be able to handle complex data, because only a very small share of transactions is fraudulent. The classifier must be able to distinguish legitimate transactions from fraudulent ones even though many transactions look alike. Overall accuracy must be high in order to detect new types of fraud.

6. Related work:

The financial industry is taking a huge hit from credit card fraud, which is conducted on a large scale throughout the world. In [1] the researchers use a genetic algorithm so that only genuine customers can obtain a credit card, and before anything is bought from the online market a classification is performed to decide whether the transaction is fraudulent or genuine. The user login and password are also used to check the transactions. Paper [2] discusses all the possible types of fraud in the credit card industry and highlights the available methods to remove fraud from the banking field. Researchers have also used neural networks to reduce credit card fraud.

7. Proposed Methodology:

The supervised learning algorithms are compared, and the final results show which algorithm is best suited to detecting fraud in credit card transactions. This also helps banking departments to diagnose fraud, so that users do not have to pay for things they did not buy.

7.1 Algorithm steps:

Figure not included in this excerpt

7.2 Functional requirements:

- The model should be able to give predictions with the fewest possible errors.
- The results obtained should be presented clearly.
- The user should be able to enter the values for predictions.

7.3 System requirements:

- Language: Python 3
- Environment: Anaconda, Jupyter Notebook

8. Experiment and Analysis:

8.1 Data set:

The dataset was obtained from Kaggle and contains 284,807 transactions, of which 492 are fraudulent. The dataset is skewed and highly imbalanced: 99.8 % of the data is non-fraud. The first task is therefore to scale the dataset and sample it into equal numbers of fraud and non-fraud cases. We cannot provide the original features because a PCA transformation was performed on them; the transformed attributes are labeled V1 to V28. The only untransformed attributes are time and amount. Time is the number of seconds elapsed between each transaction and the first transaction in the dataset, and amount is the transaction amount. Class 1 represents fraud cases and class 0 represents non-fraud cases.

Figure not included in this excerpt

Figure 4: Class distribution before sampling

The original dataset is highly imbalanced: 99.83 % of transactions are non-fraud and only 0.172 % are fraud. If we train the models on this data we will get many errors, and most of the fraud cases will be missed (false negatives). The classifier will appear accurate because it labels fraud as non-fraud; it is effectively fitted to the majority class. The predictions will show a high accuracy rate without detecting fraud cases.
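The effect can be checked with simple arithmetic. The sketch below computes the four evaluation metrics named in the abstract from confusion-matrix counts and applies them to a trivial classifier that labels every transaction as non-fraud. The class counts come from the dataset description; the trivial classifier is an illustrative baseline, not one of the models evaluated in this paper.

```python
def evaluate(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and precision from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,   # fraud caught
        "specificity": tn / (tn + fp) if tn + fp else 0.0,   # non-fraud kept
        "precision": tp / (tp + fp) if tp + fp else 0.0,     # alerts that are fraud
    }


TOTAL = 284_807
FRAUD = 492
NON_FRAUD = TOTAL - FRAUD        # 284,315

# Baseline that predicts "non-fraud" for every transaction:
# every fraud case becomes a false negative.
baseline = evaluate(tp=0, fp=0, tn=NON_FRAUD, fn=FRAUD)
accuracy_pct = round(100 * baseline["accuracy"], 2)
```

Under these counts the baseline reaches an accuracy of about 99.83 % while its sensitivity is zero, which is why accuracy alone is misleading on the unsampled data.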

8.2 Results without data sampling:

Figure not included in this excerpt

These results suffer from the accuracy paradox: accuracy is high, yet the models do not predict fraud reliably and are not usable in the real world. The underlying problem is the data imbalance, so we will resample the data to balance the classes.

First, we scale the Time and Amount columns so that they are comparable to the other columns. Then we subsample the data to obtain equal numbers of fraud and non-fraud cases, because the original data frame is highly imbalanced and can cause issues such as overfitting and misleading correlations.

Figure not included in this excerpt

Figure 5: Distribution of transaction amount

Figure not included in this excerpt

Figure 6: Distribution of transaction time

8.3 Scaling the data set:

We scale the time and amount attributes so that they are comparable to the other columns. A subsample of the data frame is also created to get an equal representation of fraud and non-fraud cases. The subsample has a distribution of 51 % non-fraud and 49 % fraud. Scaling is performed to help reduce overfitting, and it is done with a robust scaler because it is robust to outliers. After scaling and sampling there are 492 fraud cases and 492 non-fraud cases. We concatenated the 492 fraud cases with the non-fraud cases to create a new dataset.

The robust scaler is barely affected by outliers because it centers each value on the median and divides by the interquartile range, two statistics that outliers hardly move. In the second step we perform random undersampling, but before that the data is split into a test set and a training set.
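Robust scaling can be sketched as follows, assuming the usual definition: subtract the median and divide by the interquartile range (this is also the default behavior of scikit-learn's `RobustScaler`, which is assumed here to be the scaler meant by the text). The amounts in the example are made-up values with one deliberate outlier.

```python
import statistics


def robust_scale(values):
    """Scale values as (x - median) / IQR, where IQR = Q3 - Q1."""
    median = statistics.median(values)
    # "inclusive" uses linear interpolation between the sorted data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    return [(v - median) / iqr for v in values]


amounts = [1, 2, 3, 4, 100]           # hypothetical amounts, 100 is an outlier
scaled = robust_scale(amounts)
```

Here the median is 3 and the interquartile range is 2, so the four ordinary values land between -1 and 0.5 no matter how large the outlier is; only the outlier itself receives a large scaled value.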

Undersampling removes data to achieve a more balanced dataset and to avoid overfitting. After undersampling we have a subsample: we drew 492 non-fraud transactions from the 284,315 available. Having reached a 50/50 ratio, we shuffle the data and run the algorithms.

8.4 Under sampling:

Undersampling is performed to obtain more balanced data, and it also helps to prevent overfitting. First we check how imbalanced the data is by counting how many transactions are fraud and how many are non-fraud; then we balance the data by reducing the non-fraud cases to the number of fraud cases, giving a 50/50 ratio.
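Random undersampling as described here can be sketched in plain Python. The function below keeps every fraud row, draws an equally sized random sample of non-fraud rows, and shuffles the result. The function name, the seed and the toy data are assumptions made for illustration, and the sketch assumes fraud (class 1) is the minority class.

```python
import random


def undersample(rows, labels, seed=42):
    """Return a shuffled 50/50 subsample: all fraud rows plus as many non-fraud rows."""
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    non_fraud = [i for i, y in enumerate(labels) if y == 0]
    kept = fraud + rng.sample(non_fraud, len(fraud))   # sample without replacement
    rng.shuffle(kept)                                  # mix the two classes
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data: 5 fraud and 95 non-fraud transactions
toy_labels = [1] * 5 + [0] * 95
toy_rows = list(range(100))
balanced_rows, balanced_labels = undersample(toy_rows, toy_labels)
```

Applied to the real dataset, the same procedure would keep the 492 fraud cases and 492 of the 284,315 non-fraud cases.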

Figure not included in this excerpt

Figure 7: Classes after equal distribution

Now that the data is sampled, we can run our supervised learning algorithms and check the accuracy of our models.

8.5 Learning output:

The wider the gap between the training score and the cross-validation score, the greater the chance that the model is overfitting, since a wide gap indicates high variance. Similarly, if both the training score and the cross-validation score are low, the model is underfitting. KNN shows the best score on both sets.
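The reading rule for learning curves can be written down as a small helper. This is only a sketch of the rule of thumb stated above: the function name and both thresholds (`low_score`, `wide_gap`) are hypothetical and would need to be chosen for the problem at hand.

```python
def diagnose_fit(train_score, cv_score, low_score=0.80, wide_gap=0.10):
    """Classify a pair of scores using hypothetical thresholds."""
    if train_score < low_score and cv_score < low_score:
        return "underfitting"        # both scores are low
    if train_score - cv_score > wide_gap:
        return "overfitting"         # wide gap between the scores: high variance
    return "reasonable fit"


# Made-up score pairs for illustration
wide_gap_case = diagnose_fit(train_score=0.99, cv_score=0.80)
low_scores_case = diagnose_fit(train_score=0.70, cv_score=0.68)
close_scores_case = diagnose_fit(train_score=0.94, cv_score=0.93)
```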

9. Experimentation Results

9.1 Results on sampled data set:

Figure not included in this excerpt

9.2 Learning curves:

In our experimental setup we used a dataset based on real transactions in which some values had been changed, because obtaining an original dataset of real fraud is very difficult: many banks do not release such data for security reasons. We are going to take two datasets from different regions, create a feed-forward neural network, and compare the effectiveness of the neural network technique.

Figure not included in this excerpt

Figure 8: KNN learning curve

Figure not included in this excerpt

Figure 9: Decision tree learning curve

Figure not included in this excerpt

Figure 10: Random forest classifier learning curve

Figure not included in this excerpt

Figure 11: Logistic regression learning curve

10. Conclusion:

Many algorithms have been used for fraud detection, but none can detect 100 % of fraud, and problems remain which we tried to address in this paper. We used supervised machine learning algorithms to implement credit card fraud detection on a dataset available on Kaggle. The dataset was highly imbalanced, so our first task was to sample it. We performed random undersampling on the majority class, which was non-fraud, and created a subsample of the dataset with equal numbers of fraud and non-fraud cases. After reaching a 50/50 ratio of fraud and non-fraud we ran our supervised learning algorithms. Logistic regression had an accuracy of 94.9 %, the decision tree 91.9 %, random forest 92.9 % and KNN 93.9 %. Although logistic regression had the highest accuracy, the learning curves showed that most of the algorithms underfit, while only KNN showed the ability to learn. Hence KNN is the better classifier for credit card fraud detection.

References

S P Maniraj, Aditya Saini (2019), “Credit card fraud detection using machine learning and data sciences”, International Journal of Engineering Research and Technology, 8 (9).

Dejan Varmedja, Mirjana Karanovic, Srdjan Sladojevic, Marko Arsenovic, Andras Anderla (2019), “Credit card fraud detection – machine learning methods”, 18th International Symposium, 20 (22).

Suresh K Shirgave, Chetan J. Awati, Rashmi More, Sonam S. Patil (2019), “A review on credit card fraud detection using machine learning”, International Journal of Scientific and Technology Research, 8 (10).

John O. Awoyemi, Adebayo O. Adetunmbi, Samuel A. Oluwadare, “Credit card fraud detection using machine learning techniques: A comparative analysis”, IEEE.

S. Venkata Suryanarayana, G.N. Balaji, G. Venkateswara Rao, “Machine learning approaches for credit card fraud detection”, International Journal of Engineering and Technology, 7.

Lakshmi S V S S, Selvani Deepthi Kavilla (2018), “Machine learning for credit card fraud detection system”, International Journal of Applied Engineering Research, 13.

Samanesh Sorournejad, Zahra Zojaji, Reza Ebrahimi Atani, Amir Hassan Monadjemi, “A survey of credit card fraud detection techniques: Data and technique oriented perspective”.

Shiv Shankar Singh (2019), “Electronic credit card fraud detection system by collaboration of machine learning models”, International Journal of Innovative Technology and Exploring Engineering, 8 (12S).

[...]


Details

Title: Credit Card Fraud Detection Using Supervised Learning Algorithms
College: University of Engineering & Technology, Lahore (Lahore garrison university)
Course: ISD
Grade: 3.16
Authors: Daniyal Baig, Muhammad Farrukh Nadeem
Year: 2020
Pages: 6
Catalog Number: V934479
ISBN (eBook): 9783346254160
Language: English
Keywords: machine learning
Quote paper: Daniyal Baig (Author), Muhammad Farrukh Nadeem (Author), 2020, Credit Card Fraud Detection Using Supervised Learning Algorithms, Munich, GRIN Verlag, https://www.grin.com/document/934479
