Hands-on Lab Practice

This diagram shows the contents of the container. We package the Python libraries (numpy, pandas, sklearn), our dataset (a financial fraud dataset), and the Python code into one single container. This way, you do not need to configure an environment or upload the dataset yourself.

This is the link to the credit card dataset. To download it, you need to agree to the terms and conditions, and you need a Gmail account to sign in.

https://www.kaggle.com/mlg-ulb/creditcardfraud/download

Dockerfile (the Docker build file)

lr1.py (Python code for logistic regression)

We put all of these files into one container and build it into an image.

Here is the code for lr1.py

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset: columns 0-29 are the features, column 30 is the label
dataset = pd.read_csv("creditcard.csv", sep=',')
X = dataset.iloc[:, 0:30]
y = dataset.iloc[:, 30]

# Show the class balance (0 = legitimate, 1 = fraud)
print(y.value_counts())

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature scaling: fit on the training set only, then apply to the test set
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Fit logistic regression to the training set
classifier = LogisticRegression(random_state=0, solver='lbfgs')
classifier.fit(X_train, y_train)

# Predict the test set: flag a transaction as fraud when P(fraud) > threshold
threshold = 0.1
y_prob = classifier.predict_proba(X_test)[:, 1]
y_pred = np.where(y_prob > threshold, 1, 0)

# Summary metrics (ROC AUC uses the probabilities, not the thresholded labels)
ff = pd.DataFrame(data=[accuracy_score(y_test, y_pred), recall_score(y_test, y_pred),
                        precision_score(y_test, y_pred), roc_auc_score(y_test, y_prob)],
                  index=["accuracy", "recall", "precision", "roc_auc_score"])
print(ff)

# Confusion matrix and overall accuracy
matrix = confusion_matrix(y_test, y_pred)
accuracy = np.trace(matrix) / float(np.sum(matrix))
print("Confusion Matrix")
print(matrix)
print("The accuracy is: {:.2%}".format(accuracy))
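To see what the threshold in lr1.py does, the following minimal sketch applies two thresholds to a handful of made-up fraud probabilities and counts the outcomes by hand. The probabilities, labels, and the helper names `apply_threshold`, `confusion_counts`, and `recall_and_precision` are hypothetical and are not part of lr1.py. Only the rule "predict 1 when the probability exceeds the threshold" comes from the code above.

```python
def apply_threshold(probabilities, threshold):
    """Return 1 where the probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Count true negatives, false positives, false negatives, true positives."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def recall_and_precision(y_true, y_pred):
    """Compute recall and precision, returning 0.0 when a denominator is zero."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Made-up example: 8 transactions, 2 of them fraudulent (label 1)
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.01, 0.02, 0.05, 0.08, 0.15, 0.30, 0.25, 0.90]

for threshold in (0.5, 0.1):
    y_pred = apply_threshold(y_prob, threshold)
    recall, precision = recall_and_precision(y_true, y_pred)
    print("threshold={:.1f} recall={:.2f} precision={:.2f}".format(
        threshold, recall, precision))
```

In this toy data, lowering the threshold from 0.5 to 0.1 catches both fraud cases but also raises two false alarms. That trade-off is a plausible reason for choosing a low threshold when missing a fraud is costlier than a false alarm.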

Here is the content of the Dockerfile:

FROM python:3
COPY . /home/hao/newdocker
WORKDIR /home/hao/newdocker
RUN pip install numpy
RUN pip install pandas
# The PyPI package is named scikit-learn; it is imported as sklearn
RUN pip install scikit-learn
CMD ["python","./lr1.py"]
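Note that the name passed to pip and the name used in `import` can differ: the scikit-learn package is imported as `sklearn`. As a minimal sketch, a check like the following could be run inside the container to confirm that the libraries lr1.py imports are installed. The helper name `missing_modules` is hypothetical and is not part of the original image.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by lr1.py (sklearn is the import name of scikit-learn)
required = ["numpy", "pandas", "sklearn"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

The check only looks the modules up and does not import them, so it stays quick even for large libraries.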

Now that we know what is inside the container, let us run the image and see how logistic regression predicts fraud on the credit card dataset.

First, open a terminal on a Linux system and enter the following command:

sudo docker pull hzhang13/test

This pulls the image from Docker Hub. The download might take a while.

When the download finishes, the output shows that the pull is complete.

Use

sudo docker images

to check that the image downloaded successfully.

After that, use

sudo docker run hzhang13/test

to run the image. It runs the Python code shown above (lr1.py).
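The pull, check, and run steps can also be scripted. The following is only a sketch: the helper names `docker_steps` and `run_steps` are hypothetical, and it assumes Docker is installed and that `sudo` is needed, as in the commands shown in this lab.

```python
import subprocess


def docker_steps(image, use_sudo=True):
    """Build the pull, list, and run commands for `image` as argument lists."""
    prefix = ["sudo", "docker"] if use_sudo else ["docker"]
    return [
        prefix + ["pull", image],
        prefix + ["images"],
        prefix + ["run", image],
    ]


def run_steps(steps):
    """Run each command in order, stopping at the first one that fails."""
    for command in steps:
        print("$ " + " ".join(command))
        subprocess.run(command, check=True)


# Print the commands for the image used in this lab without executing them
steps = docker_steps("hzhang13/test")
for command in steps:
    print(" ".join(command))
```

Calling `run_steps(steps)` would execute the three commands. It may prompt for the sudo password, and the pull step needs network access.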

You will see the result: the accuracy of the logistic regression predictions on the credit card fraud dataset is 99.94%.
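Accuracy should be read against the class balance that `y.value_counts()` reports. As a rough sketch, the following computes the accuracy of a trivial baseline that labels every transaction as legitimate. The counts used here (492 frauds among 284,807 transactions) are taken from the dataset's Kaggle description and are an assumption about the file you downloaded. The helper name `majority_baseline_accuracy` is hypothetical.

```python
def majority_baseline_accuracy(n_total, n_positive):
    """Accuracy of always predicting the negative (majority) class."""
    if n_total <= 0 or not 0 <= n_positive <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_positive <= n_total")
    return (n_total - n_positive) / n_total


# Assumed counts from the Kaggle dataset description; verify with y.value_counts()
n_total = 284807
n_fraud = 492

baseline = majority_baseline_accuracy(n_total, n_fraud)
print("Baseline accuracy: {:.2%}".format(baseline))
```

Under these counts, a model that never flags fraud already scores close to the reported accuracy on the full dataset, which is why lr1.py also prints recall and precision.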
