Implement logistic regression scoring without scikit-learn

Post date: Oct 29, 2016 5:21:36 PM

I built a scoring system using logistic regression on AWS Lambda. However, it is challenging to install the numpy or scikit-learn packages on Lambda (at the time of writing). A good workaround is to train the logistic regression model offline with scikit-learn and ship only the parameters (weights + bias) to the Lambda containers; at the end of this post I sketch how the Lambda side can consume them. Here is what I did offline.

This example is binary classification on the Iris dataset. I keep labels 0 and 1 and discard the rest.

The final score for binary classification with logistic regression is simply

sigmoid(w·x + b) = 1 / (1 + exp(-(w·x + b)))

where w is the weight vector, b is the bias (intercept), and w·x is the dot product of w with the feature vector x. Refer to the logistic section on this url for more details.
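To make the formula concrete, here is a toy calculation with made-up numbers (not from the Iris model): with w = (1, -2), b = 0.5, and input x = (3, 1), the raw score is w·x + b = 3 - 2 + 0.5 = 1.5, and sigmoid(1.5) = 1 / (1 + exp(-1.5)) ≈ 0.82, i.e. roughly an 82% probability of class 1.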

##################################################################
# Modeling
##################################################################  
# Logistic Regression
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from math import exp
# load the iris dataset
dataset = datasets.load_iris()
target = dataset.target[dataset.target < 2]
data = dataset.data[dataset.target < 2]
# fit a logistic regression model to the data
model = LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0,
                           fit_intercept=True, intercept_scaling=1,
                           class_weight=None, random_state=None,
                           solver='liblinear', max_iter=100,
                           multi_class='ovr', verbose=0, warm_start=False,
                           n_jobs=1)
model.fit(data, target)
print(model)
# make predictions
expected = target
predicted = model.predict(data)
# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
# Output prediction probabilities; columns are P(class 0), P(class 1)
model.predict_proba(data)
# array([[ 0.98390917,  0.01609083],
#        [ 0.9644754 ,  0.0355246 ],
#        [ 0.97670816,  0.02329184],
#        [ 0.95694433,  0.04305567],
#        [ 0.98549492,  0.01450508],
#        [ 0.98103758,  0.01896242],
#        [ 0.97497879,  0.02502121],
#        [ 0.97593072,  0.02406928],
#        [ 0.95033942,  0.04966058],
#        [ 0.9652527 ,  0.0347473 ],
#        [ 0.98666133,  0.01333867],
#        [ 0.96761024,  0.03238976],
#        [ 0.96647839,  0.03352161],
#        [ 0.97874802,  0.02125198],
#        [ 0.9962299 ,  0.0037701 ],
#        [ 0.99476333,  0.00523667],
#        [ 0.99217185,  0.00782815],
#        [ 0.98223225,  0.01776775],
#        [ 0.98241722,  0.01758278],
#        [ 0.98560983,  0.01439017],
#        [ 0.96824309,  0.03175691],
#        [ 0.98165317,  0.01834683],
#        [ 0.99297853,  0.00702147],
#        ...
# Now I will get the weights + bias from the model
w = model.coef_
c = model.intercept_
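# To ship these parameters to the Lambda container, one option (my own
# sketch, not part of the original workflow) is to serialize them as JSON
# and bundle the file with the deployment package:
import json
with open('logreg_params.json', 'w') as f:
    json.dump({'weights': w.tolist()[0], 'bias': c.tolist()[0]}, f)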
# using numpy
import numpy as np
def sigmoid_np(x, w, c):
    return 1 / (1 + np.exp(-np.inner(w, x) - c))
# This pure-Python version works without numpy, so it is AWS-Lambda-friendly.
def sigmoid(x, w, c):
    return 1.0 / (1 + exp(-sum(xi * wi for xi, wi in zip(x, w)) - c))
# The scores below match the class-1 column of predict_proba above.
for i in range(data.shape[0]):
    print(sigmoid(data[i].tolist(), w.tolist()[0], c[0]))
# 0.016090826738
# 0.0355246004109
# 0.0232918385259
# 0.0430556690477
# 0.0145050823625
# 0.0189624168471
# 0.0250212147231
# 0.0240692765975
# 0.0496605772698
# 0.0347473007216
# 0.0133386701762
# 0.0323897551789
# 0.0335216095562
# 0.021251981282
# 0.00377009533942
# 0.00523667363672
# 0.0078281454093
# 0.0177677502829
# 0.0175827783602
# 0.0143901712337
# 0.0317569070515
# 0.0183468294599
# 0.00702146758676
# ...
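On the Lambda side, the scoring code then only needs the pure-Python sigmoid and the JSON file produced above. Here is a minimal handler sketch; the event shape and the logreg_params.json path are my own assumptions, not from an AWS example:

##################################################################
# Scoring (inside the Lambda container)
##################################################################
import json
from math import exp

# logreg_params.json is the file written offline above (an assumption of
# this sketch; adjust the path to wherever you bundle it).
with open('logreg_params.json') as f:
    params = json.load(f)

def sigmoid(x, w, c):
    return 1.0 / (1 + exp(-sum(xi * wi for xi, wi in zip(x, w)) - c))

def lambda_handler(event, context):
    # expects an event like {"features": [5.1, 3.5, 1.4, 0.2]}
    score = sigmoid(event['features'], params['weights'], params['bias'])
    return {'probability': score}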