This is a diagram of the container. We put the Python libraries (numpy, pandas, scikit-learn), our dataset (a credit card fraud dataset) and the Python code into a single container. This way, we do not need to worry about configuring the environment or uploading the dataset.
This is the link for the credit card fraud dataset. To download it, you need to agree to the terms and conditions and sign in to Kaggle, for example with a Gmail account.
https://www.kaggle.com/mlg-ulb/creditcardfraud/download
Dockerfile (the Docker build file)
lr1.py (Python code for logistic regression)
We package all of these files into a single Docker image. Running the image starts the container.
Here is the code for lr1.py:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, roc_auc_score, precision_score

# Load the dataset: columns 0-29 are the features, column 30 is the Class label
dataset = pd.read_csv("creditcard.csv", sep=',')
X = dataset.iloc[:, 0:30]
y = dataset.iloc[:, 30]
# Show how imbalanced the classes are (number of non-fraud vs. fraud rows)
print(y.value_counts())

# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature scaling: fit the scaler on the training set only, then apply it to the test set
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

# Fitting Logistic Regression to the training set
classifier = LogisticRegression(random_state=0, solver='lbfgs')
classifier.fit(X_train, y_train)

# Predicting the test set results: flag a transaction as fraud when its predicted
# fraud probability exceeds the threshold (predict() would use 0.5; a lower
# threshold flags more transactions as fraud)
threshold = 0.1
y_prob = classifier.predict_proba(X_test)[:, 1]
y_pred = np.where(y_prob > threshold, 1, 0)

# ROC AUC is computed from the predicted probabilities, not the thresholded labels
ff = pd.DataFrame(data=[accuracy_score(y_test, y_pred), recall_score(y_test, y_pred), precision_score(y_test, y_pred), roc_auc_score(y_test, y_prob)], index=["accuracy", "recall", "precision", "roc_auc_score"])
print(ff)

# Making the confusion matrix
matrix = confusion_matrix(y_test, y_pred)
accuracy = np.trace(matrix) / float(np.sum(matrix))
print("Confusion Matrix")
print(matrix)
print("The accuracy is: {:.2%}".format(accuracy))

Here is the content of the Dockerfile:
FROM python:3
COPY . /home/hao/newdocker
WORKDIR /home/hao/newdocker
RUN pip install numpy
RUN pip install pandas
RUN pip install scikit-learn
CMD ["python","./lr1.py"]
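The post does not show the step that turns this directory into an image. Here is a minimal sketch, assuming the Dockerfile, lr1.py and creditcard.csv sit together in the current directory. It reuses the image tag hzhang13/test from the `docker run` step and the standard `docker build -t` and `docker run` commands. The Python wrapper only assembles and optionally executes those commands. `docker_commands` and `build_and_run` are hypothetical helper names, and `--rm` is an optional extra that removes the container after it exits.

```python
import subprocess


def docker_commands(tag="hzhang13/test", context=".", use_sudo=True):
    """Hypothetical helper: the build and run commands for this project.
    Assumes the Dockerfile, lr1.py and creditcard.csv are in `context`."""
    prefix = ["sudo"] if use_sudo else []  # the post runs docker with sudo
    build = prefix + ["docker", "build", "-t", tag, context]
    run = prefix + ["docker", "run", "--rm", tag]  # --rm removes the container after it exits
    return [build, run]


def build_and_run(tag="hzhang13/test", context=".", use_sudo=True, dry_run=True):
    """Print the commands (dry_run=True) or execute them in order."""
    for cmd in docker_commands(tag, context, use_sudo):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError at the first failing command
            subprocess.run(cmd, check=True)
```

Calling `build_and_run()` only prints the two commands. `build_and_run(dry_run=False)` actually builds the image and runs lr1.py inside it.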
Now that we know what is inside the container, let us run the image and see the predictions for the credit card fraud dataset using logistic regression.
First, open a terminal on your Linux system and enter the following line:
sudo docker pull hzhang13/test
This pulls the image from Docker Hub, which might take a while.
This picture shows the output once the pull is complete.
Use
sudo docker images
to check whether the image was downloaded successfully.
After that, use
sudo docker run hzhang13/test
to run the image. This executes lr1.py, the Python code shown above.
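You will see the results. Using logistic regression, the prediction accuracy on the credit card fraud test set is 99.94%. Fraudulent transactions make up only about 0.17% of the dataset, so the recall and precision printed in the same table say more about how well fraud is actually detected.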
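The following standard-library sketch uses made-up toy data, not the Kaggle dataset: 1,000 hypothetical transactions, 2 of them fraud. It illustrates two points about the evaluation in lr1.py. First, a classifier that never flags fraud still reaches a high accuracy. Second, lowering the decision threshold from 0.5 to 0.1, as lr1.py does, trades precision for recall. `confusion` and `metrics` are hypothetical helper names. lr1.py itself uses scikit-learn's metric functions.

```python
def confusion(y_true, scores, threshold):
    """Count (tp, fp, tn, fn) when predicting fraud (1) for score > threshold,
    the same rule as np.where(proba > threshold, 1, 0) in lr1.py."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(y_true, scores, threshold):
    """Accuracy, recall and precision computed from the confusion counts."""
    tp, fp, tn, fn = confusion(y_true, scores, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": tp / (tp + fn) if tp + fn else 0.0,      # share of fraud caught
        "precision": tp / (tp + fp) if tp + fp else 0.0,   # share of alerts that are fraud; 0.0 if none
    }


if __name__ == "__main__":
    # Toy data: 1,000 hypothetical transactions, 2 of them fraud (label 1)
    y_true = [1, 1] + [0] * 998
    scores = [0.7, 0.3] + [0.2] * 5 + [0.01] * 993  # made-up fraud probabilities

    def show(s, t):
        print({k: round(v, 3) for k, v in metrics(y_true, s, t).items()})

    show([0.0] * 1000, 0.5)  # never flags fraud: {'accuracy': 0.998, 'recall': 0.0, 'precision': 0.0}
    show(scores, 0.5)        # {'accuracy': 0.999, 'recall': 0.5, 'precision': 1.0}
    show(scores, 0.1)        # {'accuracy': 0.995, 'recall': 1.0, 'precision': 0.286}
```

In the toy run, the lower threshold catches both fraud cases but raises five false alarms. Accuracy barely moves in either case, which is why recall and precision are worth reading alongside it.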