By the end of this lab, you will be able to:
Construct datasets with categorical features
Visualise binary classification data
Use scikit-learn to train a logistic regression model
Predict outcomes for new samples
Evaluate how predictions relate to the original data
Run this only once in your environment.
pip install scikit-learn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
We will use the cat dataset again. Data like this can be tricky to visualise, but since there are three features we can make a 3D scatter plot:
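The exact dataset from the earlier lab is not reproduced here, so the sketch below uses a plausible stand-in: three yes/no features (four legs, whiskers, claws) and two positive (cat) examples, matching the description later in this lab. Substitute your own `X` and `y` from the previous exercise.

```python
import numpy as np
import matplotlib.pyplot as plt

# Features: [has_four_legs, has_whiskers, has_claws] (1 = yes, 0 = no)
# Labels:   1 = cat, 0 = not a cat
# Assumed example data -- replace with the dataset from the earlier lab.
X = np.array([
    [1, 1, 1],  # cat
    [1, 1, 0],  # cat
    [1, 0, 0],  # e.g. a dog without whiskers marked
    [0, 0, 0],  # e.g. a snake
    [0, 1, 0],  # e.g. a seal
    [1, 0, 1],  # e.g. a lizard
])
y = np.array([1, 1, 0, 0, 0, 0])

# 3D scatter plot: one axis per feature, points coloured by label
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap="bwr")
ax.set_xlabel("Four legs")
ax.set_ylabel("Whiskers")
ax.set_zlabel("Claws")
plt.show()
```

With only eight possible corners of the feature cube, every sample sits on a corner; colouring by label makes the two cat examples easy to spot.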
model = LogisticRegression()
model.fit(X, y)
Use your model to predict the label for a new input:
# New animal: 4 legs, whiskers, claws
X_new = np.array([[1, 1, 1]])
y_pred = model.predict(X_new)
y_prob = model.predict_proba(X_new)
print(f"Prediction: {'Cat' if y_pred[0] == 1 else 'Not a Cat'}")
print(f"Probability of being a cat: {y_prob[0][1]:.2f}")
Logistic regression doesn’t say "definitely cat" or "definitely not cat". Instead, it computes a linear combination of the features and applies the sigmoid function to turn it into a probability. In the data above there are only two positive examples. How could we improve the accuracy?
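You can verify the sigmoid step yourself by reading the learned weights off the fitted model. A minimal sketch, assuming the same made-up dataset with two cat examples used earlier in this lab:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed training data (two cats), as in the lab
X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0],
              [0, 0, 0], [0, 1, 0], [1, 0, 1]])
y = np.array([1, 1, 0, 0, 0, 0])

model = LogisticRegression()
model.fit(X, y)

X_new = np.array([[1, 1, 1]])

# Linear combination z = w . x + b, then sigmoid(z) = 1 / (1 + e^(-z))
z = (X_new[0] * model.coef_[0]).sum() + model.intercept_[0]
p = 1 / (1 + np.exp(-z))

print("Hand-computed probability:", p)
print("predict_proba:", model.predict_proba(X_new)[0, 1])
```

The two printed values agree: `predict_proba` is doing exactly this weighted-sum-plus-sigmoid computation internally.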
Modify the Input
Change the new input to different combinations of features and see how the predictions change.
Examples to try:
[0, 1, 1]
[1, 0, 1]
[1, 1, 0]
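A loop makes it easy to compare all three suggested inputs at once. This sketch assumes the same illustrative dataset as before; your own `X` and `y` may give different probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed training data from earlier in the lab
X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0],
              [0, 0, 0], [0, 1, 0], [1, 0, 1]])
y = np.array([1, 1, 0, 0, 0, 0])
model = LogisticRegression().fit(X, y)

# Try each suggested feature combination
for features in ([0, 1, 1], [1, 0, 1], [1, 1, 0]):
    pred = model.predict([features])[0]
    prob = model.predict_proba([features])[0, 1]
    label = "Cat" if pred == 1 else "Not a Cat"
    print(f"{features} -> {label} (p(cat) = {prob:.2f})")
```

Notice which single feature change flips the prediction: that tells you which feature the model weighs most heavily.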
Add New Training Data
Add more samples to your X and y arrays.
Try to build a more balanced dataset and observe if predictions improve.
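One way to do this is to stack extra rows onto the existing arrays and refit. The extra cat samples below are made up purely for illustration; invent your own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed original data (only two cats -- imbalanced)
X = np.array([[1, 1, 1], [1, 1, 0], [1, 0, 0],
              [0, 0, 0], [0, 1, 0], [1, 0, 1]])
y = np.array([1, 1, 0, 0, 0, 0])

# Append extra cat examples to balance the classes (made-up samples)
X_extra = np.array([[1, 1, 1], [1, 1, 0]])
y_extra = np.array([1, 1])
X_balanced = np.vstack([X, X_extra])
y_balanced = np.concatenate([y, y_extra])

model = LogisticRegression().fit(X_balanced, y_balanced)
print("p(cat) for [1, 1, 1]:", model.predict_proba([[1, 1, 1]])[0, 1])
```

Compare the probability before and after balancing: with more positive examples, the model should be more confident about cat-like inputs.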
Predict Multiple Samples
X_test = np.array([
[1, 0, 0],
[0, 1, 1],
[1, 1, 1],
[1, 1, 0],
])
y_test_pred = model.predict(X_test)
print("Predictions:", y_test_pred)
Which ones are predicted as cats?
Do these predictions make sense based on the training data?
Try some of your own datasets
E.g. Is this lunch healthy?
Contains fruit
Contains vegetables
Contains sugary drink
Contains processed snack
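The lunch example follows the same pattern as the cat model: four yes/no features and a 0/1 label. All the samples and labels below are invented for illustration; replace them with lunches you label yourself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features: [fruit, vegetables, sugary drink, processed snack] (1 = yes)
# Labels:   1 = healthy lunch -- all samples are made up for illustration
X_lunch = np.array([
    [1, 1, 0, 0],  # fruit and vegetables, no junk
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],  # sugary drink and processed snack
    [1, 1, 1, 0],
    [0, 0, 0, 1],
])
y_lunch = np.array([1, 1, 0, 0, 1, 0])

lunch_model = LogisticRegression().fit(X_lunch, y_lunch)

# A lunch with fruit and vegetables but also a sugary drink
print(lunch_model.predict_proba([[1, 1, 1, 0]])[0, 1])
```

The same workflow applies to any topic: pick features, collect labelled examples, fit, and inspect the probabilities.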
Choose a topic you're interested in (sports, food, school, pets, etc.)
Brainstorm 3–5 relevant yes/no questions (these become features)