This diagram shows the container. We put the Python libraries (numpy, pandas, sklearn), our dataset (a financial fraud dataset), and the Python code into a single container. This way, users do not need to configure an environment or upload the dataset themselves.
This is the link to the credit card dataset. Downloading it requires a Gmail account and agreeing to the terms and conditions.
https://www.kaggle.com/mlg-ulb/creditcardfraud/download
Dockerfile (the Docker build file)
lr1.py (Python code for logistic regression)
We package these files, together with the dataset, and build them into a single image.
Here is the code for lr1.py
import numpy as np
import pandas as pd
dataset = pd.read_csv("creditcard.csv",sep = ',')
X = dataset.iloc[:,0:30]
y = dataset.iloc[:,30]
#print(y)
print(y.value_counts())  # class counts; a bare expression prints nothing in a script
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.25,random_state = 0)
#Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, roc_auc_score, precision_score
#Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0,solver='lbfgs')
classifier.fit(X_train, y_train)
#predicting the test set result
threshold = 0.1
y_pred = np.where(classifier.predict_proba(X_test)[:,1]>threshold,1,0)
#print(y_pred)
ff = pd.DataFrame(data=[accuracy_score(y_test, y_pred), recall_score(y_test, y_pred),
                        precision_score(y_test, y_pred), roc_auc_score(y_test, y_pred)],
                  index=["accuracy", "recall", "precision", "roc_auc_score"])
print(ff)
#Making the confusion matrix (confusion_matrix is already imported above)
matrix = confusion_matrix(y_test,y_pred)
accuracy = np.trace(matrix) / float(np.sum(matrix))
print("Confusion Matrix")
print(matrix)
print("The accuracy is: {:.2%}".format(accuracy))
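The script above predicts fraud whenever the predicted probability exceeds a threshold of 0.1 rather than the usual 0.5. The following is a minimal sketch of why that choice matters, using only the standard library. The eight probabilities and labels are made up for illustration and are not taken from the credit card dataset; the helper names are hypothetical.

```python
# Hypothetical predicted fraud probabilities and true labels (1 = fraud).
# These eight values are invented for illustration only.
probs = [0.02, 0.05, 0.15, 0.30, 0.45, 0.60, 0.08, 0.90]
labels = [0, 0, 1, 0, 1, 1, 0, 1]


def confusion_counts(labels, probs, threshold):
    """Return (tn, fp, fn, tp) when predicting 1 for prob > threshold."""
    tn = fp = fn = tp = 0
    for y, p in zip(labels, probs):
        pred = 1 if p > threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def recall_precision(labels, probs, threshold):
    """Return (recall, precision) at the given threshold."""
    tn, fp, fn, tp = confusion_counts(labels, probs, threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Compare the default-style threshold with the lower one used in lr1.py.
for t in (0.5, 0.1):
    r, p = recall_precision(labels, probs, t)
    print("threshold={:.1f} recall={:.2f} precision={:.2f}".format(t, r, p))
```

On this toy data, lowering the threshold catches more of the fraud cases (higher recall) at the cost of more false alarms (lower precision). Whether the same trade-off holds on the real dataset has to be checked with the metrics that lr1.py prints.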
Here is the text of the Dockerfile (also shown in the screenshot):
FROM python:3
COPY . /home/hao/newdocker
WORKDIR /home/hao/newdocker
RUN pip install numpy
RUN pip install pandas
# The PyPI package is named scikit-learn; it is imported as sklearn
RUN pip install scikit-learn
CMD ["python","./lr1.py"]
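To confirm that the image contains everything lr1.py imports, a small check like the one below could be run inside the container. This is only a sketch: the function name missing_modules and the module list are assumptions for illustration, not part of the original image. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names used by lr1.py. scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["numpy", "pandas", "sklearn"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

The check only looks the modules up and does not import them, so it runs quickly even for large libraries.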
Now that we know what is inside the container, let us run the image and see the logistic regression predictions for the credit card fraud dataset.
First, open a terminal on a Linux system and enter the following line:
sudo docker pull hzhang13/test
This pulls the image from Docker Hub. The download might take a while.
This screenshot shows the completed pull.
Use
sudo docker images
to check whether the image downloaded successfully.
After that, use
sudo docker run hzhang13/test
to run the image. This executes the Python code in lr1.py shown earlier.
You will see the result: the accuracy of the logistic regression predictions on the credit card fraud dataset is 99.94%.
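A high accuracy figure is easier to interpret next to a baseline, because fraud cases are rare in this dataset. The sketch below computes the accuracy of a model that labels every transaction as non-fraud. The two class counts are taken from the Kaggle dataset description and should be treated as an assumption to confirm against your own copy (for example, with the value_counts() call in lr1.py). The function name is hypothetical.

```python
# Class counts as reported in the Kaggle dataset description
# (an assumption here; confirm against your own copy of creditcard.csv).
TOTAL_TRANSACTIONS = 284807
FRAUD_TRANSACTIONS = 492


def majority_baseline_accuracy(total, positives):
    """Accuracy of a model that labels every transaction as non-fraud."""
    return (total - positives) / total


baseline = majority_baseline_accuracy(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)
print("Always-non-fraud baseline accuracy: {:.2%}".format(baseline))
```

Under these counts the baseline is already close to the reported accuracy, so recall and precision from the script's output are more informative about how well fraud is detected. The baseline on the 25% test split may differ slightly from the full-dataset figure computed here.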