Basakstat - Credit Card Fraud Detection

Credit Card Fraud Detection

What is fraud detection?

Fraud detection is a set of activities undertaken to prevent money or property from being obtained through false pretences. It is applied in many industries, such as banking and insurance. In banking, fraud may include forging checks or using stolen credit cards. In insurance, fraud may involve exaggerating losses or causing an accident for the sole purpose of collecting the pay-out.

With a large and growing number of ways to commit fraud, detection can be difficult. Activities such as reorganization, downsizing, moving to new information systems or encountering a cyber-security breach can weaken an organization's ability to detect fraud. For this reason, techniques such as real-time fraud monitoring are recommended. Organizations should look for fraud in financial transactions, locations, devices used, initiated sessions and authentication systems.

Why is fraud detection used?

There are two main reasons:

First, the total cost of chip and PIN technology and 3DSecure is relatively high compared to the cost of fraud detection. For example, online merchants care about conversion, and 3DSecure reduces it by several percent (> 5%). Hence, when they have the option, many online merchants decide to deactivate 3DSecure and manage the risk of payment fraud themselves.

Second, adding more security layers to the buying process greatly reduces checkout velocity and, in turn, convenience for the buyer. While convenience for buyers may look like a fuzzy concept at first, for companies like Amazon, which pioneered one-click checkout, it’s a marketing argument and a means to improve conversion and grow revenue.

Fraud detection techniques

Fraud typically involves methods that are repeated many times, which makes searching for patterns a general focus of fraud detection. For example, data analysts can help prevent insurance fraud by building algorithms that detect patterns and anomalies.

Fraud detection techniques fall into two broad groups: statistical data analysis techniques and artificial intelligence (AI).

Statistical data analysis techniques include:

a) Calculating statistical parameters

b) Regression analysis

c) Probability distributions and models

d) Data matching
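As a minimal sketch of item a), the snippet below computes two statistical parameters (the median and the median absolute deviation) and flags amounts that lie far from the typical value. The amounts and the cut-off of 3.5 are made up for illustration; they are assumptions, not values from any real dataset or from the source.

```r
# Hypothetical transaction amounts; the last one is deliberately extreme.
amounts <- c(12.5, 40.0, 8.99, 23.4, 15.0, 31.2, 19.9, 27.5, 11.0, 950.0)

# Robust statistical parameters: median and median absolute deviation (MAD).
center <- median(amounts)
spread <- mad(amounts)  # base R scales the MAD by 1.4826 by default

# Robust z-score: distance from the median, measured in MAD units.
robust_z <- (amounts - center) / spread

# Assumed cut-off; a real system would tune this on labelled data.
threshold <- 3.5
flagged <- which(abs(robust_z) > threshold)

print(amounts[flagged])
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself in a small sample.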

AI techniques used to detect fraud include:

Data mining: classifies, groups and segments data, searching through millions of transactions to find patterns and detect fraud.

Neural networks: learn suspicious-looking patterns and use them to detect further instances.

Machine learning: automatically identifies characteristics found in fraud.

Pattern recognition: detects classes, clusters and patterns of suspicious behavior.
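To illustrate the idea of detecting clusters of suspicious behavior, here is a small sketch using hierarchical clustering from base R. The two features (amount and transactions per hour), the data values, and the rule "the smaller cluster is the suspicious one" are all assumptions made for this example.

```r
# Hypothetical features per transaction: amount and transactions in the past hour.
transactions <- data.frame(
  amount   = c(12, 25, 18, 30, 22, 15, 27, 20, 940, 910, 980),
  per_hour = c(1, 2, 1, 2, 1, 1, 2, 1, 14, 16, 15)
)

# Standardise so both features contribute on a comparable scale.
scaled <- scale(transactions)

# Hierarchical clustering (complete linkage by default), cut into two clusters.
tree <- hclust(dist(scaled))
cluster <- cutree(tree, k = 2)

# Assumption for this sketch: the smaller cluster is the suspicious one.
sizes <- table(cluster)
suspicious_cluster <- as.integer(names(sizes)[which.min(sizes)])
suspicious_rows <- unname(which(cluster == suspicious_cluster))

print(transactions[suspicious_rows, ])
```

Clustering needs no fraud labels, which is useful when labelled cases are scarce, but a small cluster is only a candidate for review, not proof of fraud.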

Types of fraud

Fraud can be committed in a number of different ways and settings, for example in the banking, insurance, government and healthcare sectors.

One common type of fraud in banking is customer account takeover, where someone illegally gains access to a victim’s bank account using bots. Other examples of fraud in banking include the use of malicious applications, the use of false identities, money laundering, credit card fraud and mobile fraud.

Fraud in insurance can include premium diversion fraud, which is the embezzlement of insurance premiums, or fee churning, which is excessive trading by a stockbroker to maximize commissions. Other forms of insurance fraud include asset diversion, workers' compensation, car accident, stolen or damaged car, and house fire fraud. The motive behind all insurance fraud is financial profit.

Government fraud is fraud committed against federal agencies such as the departments of Health and Human Services, Transportation, Education, or Energy. Types of government fraud include billing for unnecessary procedures, overcharging for items that cost much less, providing old equipment while billing for new, or reporting hours worked for a worker who does not exist.

Healthcare fraud includes drug fraud and medical fraud, and it overlaps with some insurance fraud. Healthcare fraud is committed when someone defrauds an insurer or a government healthcare program.

Imbalanced data

A dataset is imbalanced when one class is far rarer than the other. This occurs in cases such as credit card fraud detection, where there might be only 1,000 fraud cases in over a million transactions, a meager 0.1% of the dataset or less. The identification of rare diseases is another case where imbalanced data must be dealt with.
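The sketch below shows why this imbalance matters and one common way to handle it. It assumes exactly 1,000 frauds in 1,000,000 transactions (the ratio from the text, with simulated labels), shows that a model predicting "not fraud" every time still scores 99.9% accuracy, and then balances the classes by random undersampling. Undersampling is only one option; it is used here as an illustration.

```r
# Simulated labels: 1 = fraud, 0 = legitimate (assumed counts, not real data).
labels <- c(rep(1L, 1000), rep(0L, 999000))

fraud_rate <- mean(labels)  # share of fraud cases

# A naive "classifier" that predicts legitimate for every transaction.
always_legit <- rep(0L, length(labels))
accuracy <- mean(always_legit == labels)
recall <- sum(always_legit == 1L & labels == 1L) / sum(labels == 1L)

cat("fraud rate:", fraud_rate, " accuracy:", accuracy, " recall:", recall, "\n")

# Random undersampling: keep every fraud case and an equal number of
# randomly chosen legitimate cases.
set.seed(42)
fraud_idx <- which(labels == 1L)
legit_idx <- sample(which(labels == 0L), length(fraud_idx))
balanced <- labels[c(fraud_idx, legit_idx)]

print(table(balanced))
```

High accuracy with zero recall is the reason metrics such as recall, precision or AUC are usually preferred over accuracy on imbalanced data.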

To learn more, see this link: https://www.kaggle.com/janiobachmann/credit-fraud-dealing-with-imbalanced-datasets

The notebook at the link above walks through credit card fraud detection in Python. It is easy to understand; follow it from the beginning.

Example

Credit Card Fraud Detection

The aim of this R project is to build a classifier that can detect fraudulent credit card transactions. We will use a variety of machine learning algorithms to distinguish fraudulent transactions from non-fraudulent ones. By the end of this machine learning project, you will know how to implement machine learning algorithms to perform classification.

R code:

# ---- Importing dataset ----
library(ranger)
library(caret)
library(data.table)
creditcard_data <- read.csv("C:/Users/Soumyajit/Desktop/Credit Card/creditcard.csv")

# ---- Data Exploration ----
dim(creditcard_data)
head(creditcard_data, 6)
tail(creditcard_data, 6)
table(creditcard_data$Class)
summary(creditcard_data$Amount)
names(creditcard_data)
var(creditcard_data$Amount)
sd(creditcard_data$Amount)

# ---- Data Manipulation ----
head(creditcard_data)
# Standardise Amount; as.numeric() keeps the column a plain vector
creditcard_data$Amount <- as.numeric(scale(creditcard_data$Amount))
# Drop the first column
NewData <- creditcard_data[, -c(1)]
head(NewData)

# ---- Data Modelling ----
library(caTools)
set.seed(123)
# 80/20 train/test split on the Class label
data_sample <- sample.split(NewData$Class, SplitRatio = 0.80)
train_data <- subset(NewData, data_sample == TRUE)
test_data <- subset(NewData, data_sample == FALSE)
dim(train_data)
dim(test_data)

# ---- Fitting Logistic Regression Model ----
# Fit on the training set, evaluate on the held-out test set
Logistic_Model <- glm(Class ~ ., train_data, family = binomial())
summary(Logistic_Model)
plot(Logistic_Model)
library(pROC)
lr.predict <- predict(Logistic_Model, test_data, type = "response")
auc.lr <- roc(test_data$Class, lr.predict, plot = TRUE, col = "blue")

# ---- Fitting a Decision Tree Model ----
library(rpart)
library(rpart.plot)
decisionTree_model <- rpart(Class ~ ., train_data, method = 'class')
predicted_val <- predict(decisionTree_model, test_data, type = 'class')
probability <- predict(decisionTree_model, test_data, type = 'prob')
rpart.plot(decisionTree_model)

# ---- Artificial Neural Network ----
library(neuralnet)
ANN_model <- neuralnet(Class ~ ., train_data, linear.output = FALSE)
plot(ANN_model)
predANN <- compute(ANN_model, test_data)
resultANN <- predANN$net.result
# Convert predicted probabilities to 0/1 labels at a 0.5 threshold
resultANN <- ifelse(resultANN > 0.5, 1, 0)

# ---- Gradient Boosting ----
library(gbm, quietly = TRUE)
# Get the time to train the GBM model
system.time(
  model_gbm <- gbm(Class ~ .,
                   distribution = "bernoulli",
                   data = rbind(train_data, test_data),
                   n.trees = 500,
                   interaction.depth = 3,
                   n.minobsinnode = 100,
                   shrinkage = 0.01,
                   bag.fraction = 0.5,
                   train.fraction = nrow(train_data) / (nrow(train_data) + nrow(test_data)))
)
# Determine best iteration based on test data
gbm.iter <- gbm.perf(model_gbm, method = "test")
# Plot and calculate AUC on test data
gbm_test <- predict(model_gbm, newdata = test_data, n.trees = gbm.iter)
gbm_auc <- roc(test_data$Class, gbm_test, plot = TRUE, col = "red")
print(gbm_auc)

Source: https://data-flair.training/blogs/data-science-machine-learning-project-credit-card-fraud-detection/
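The example above reports AUC but does not turn the 0/1 predictions into error counts. The sketch below is not part of the source tutorial: `classification_metrics` is a hypothetical helper written for this page, and the labels and predictions are toy values made up to show how precision and recall could be computed in base R.

```r
# Hypothetical helper (not from the source tutorial): confusion counts,
# precision and recall for 0/1 labels, where 1 = fraud.
classification_metrics <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)  # frauds correctly flagged
  fp <- sum(predicted == 1 & actual == 0)  # legitimate flagged as fraud
  fn <- sum(predicted == 0 & actual == 1)  # frauds missed
  tn <- sum(predicted == 0 & actual == 0)  # legitimate correctly passed
  precision <- if (tp + fp > 0) tp / (tp + fp) else NA_real_
  recall <- if (tp + fn > 0) tp / (tp + fn) else NA_real_
  c(tp = tp, fp = fp, fn = fn, tn = tn, precision = precision, recall = recall)
}

# Toy labels and predictions, made up for illustration.
actual    <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 0)
predicted <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 0)

metrics <- classification_metrics(actual, predicted)
print(metrics)
```

With the objects from the example, a call such as classification_metrics(test_data$Class, resultANN) would presumably give the same summary for the neural network's thresholded predictions.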
