Detecting Diabetes - Binary Classification

Let's start with a simple example of how to use Neurons: a Binary Classification problem based on medical data from many patients. If you're not familiar with Keras or Machine Learning, first take a look at Machine Learning - The Basics.

What is Binary Classification?

Binary Classification is the task of dividing the elements of a set into two groups, based on certain criteria. In our case, we'll divide our patients into two groups: the ones who have diabetes, and the ones who don't.

Dataset Structure

This is the dataset we will use: Diabetes Dataset (CSV). You can learn more about it here. This dataset is already included in the "Data" folder in Neurons, as a .txt file: diabetes_data.txt.

It contains medical data from 768 patients with 8 different attributes, which our model will take as input. In order:

0. Number of times pregnant
1. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
2. Diastolic blood pressure (mm Hg)
3. Triceps skin fold thickness (mm)
4. 2-Hour serum insulin (mu U/ml)
5. Body mass index (weight in kg/(height in m)^2)
6. Diabetes pedigree function
7. Age (years)
8. Class (has diabetes or not)

The last number of each line is the class: 1 if the patient has diabetes, 0 if not. This is going to be our output.
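To make the column layout concrete, here is a minimal sketch of how such a file could be split into inputs and outputs outside of Neurons. It assumes the file is comma-separated with no header row, which the description above suggests but does not state. The two sample rows are made up for illustration and are not real patient records, and `load_rows` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows in the 9-column layout described above
# (illustration only, not taken from the real dataset).
SAMPLE = """\
2,120,70,30,0,28.5,0.400,35,0
5,160,80,33,130,34.2,0.650,52,1
"""

def load_rows(text_file):
    """Split each row into 8 input attributes and 1 class label."""
    inputs, labels = [], []
    for row in csv.reader(text_file):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        inputs.append(values[:8])      # columns 0-7: attributes
        labels.append(int(values[8]))  # column 8: class (0 or 1)
    return inputs, labels

inputs, labels = load_rows(io.StringIO(SAMPLE))
print(len(inputs[0]), labels)
```

To read the real file instead, the same function could be called with `open("diabetes_data.txt", newline="")`, provided the assumptions about its layout hold.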

Creating the Model

First of all, we need to create a model that can be trained with our data.

Open Neurons. The file 'diabetes_data.txt' should already be selected as the 'Train File'. Our input is the data from columns 0-7 (the attributes), and our output is in column 8 (whether the patient has diabetes or not).

Change input and output columns.

Let's create our model. Click "File" -> "New Model" or just press "Ctrl + N" to create a new model. We'll name ours "Diabetes_Model". Click "Create Model".

Note: the number of input and output columns can't be changed after creating the model. To change input and output shapes, create a different model.

Creating the model.

Congratulations! You've just created your first Machine Learning model! Let's change it a bit.

Our model consists of:

Input Layer: the layer that first receives our inputs. Note: this layer also includes a hidden layer called 'input', with 8 neurons and 'ReLu' activation.
Hidden Layers: these are the middle layers. Add or delete as many as you want.
Output Layer: the layer that outputs our results.
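As a rough illustration of what one of these layers computes, the sketch below implements a single fully connected layer in plain Python. It assumes the layers are dense (fully connected), which is typical for this kind of Keras model but is not stated here. The weights and biases are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def relu(x):
    """ReLU activation: negative values become 0."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron computes activation(w . x + b)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs

# Hypothetical weights for a layer with 2 inputs and 2 neurons.
weights = [[0.5, -1.0], [1.0, 1.0]]
biases = [0.0, -0.5]
result = dense([2.0, 3.0], weights, biases, relu)
print(result)  # [0.0, 4.5]
```

The first neuron's weighted sum is negative, so ReLU clamps it to 0; the second passes through unchanged. Training adjusts the weights and biases; the structure of the computation stays the same.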

You can double click any layer, including Input and Output, to modify it. Right now, all our layers are using 'ReLu' Activation Functions, which will work fine for most simple applications.

However, the Activation Function of the Output Layer is of special importance. Since our true outputs are either 0 or 1, we want the outputs of our model to also be in that range, so we need to choose the 'Sigmoid' function.

For that, double click 'Output', select 'Sigmoid' and click 'Save Edit'.

Change Output Activation Function to Sigmoid.
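To see why Sigmoid suits a 0-or-1 output, here is a small sketch of the function itself. It squashes any real number into the range between 0 and 1, so the model's output can be read as a score for the "has diabetes" class. This is the standard mathematical definition, written in a numerically stable form; it is not code taken from Neurons.

```python
import math

def sigmoid(x):
    """Map any real number into the open interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)

for x in (-4.0, 0.0, 4.0):
    print(x, round(sigmoid(x), 4))
```

Large negative inputs land near 0, large positive inputs land near 1, and an input of 0 gives exactly 0.5.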

We are now all set to start training our model with the provided dataset.

Training the Model

To train the model with the selected 'Train File', press the 'Train' button.

There are a lot of options here; the Glossary explains what they all mean. Let's go through them:

Optimizer Algorithm: leave this at 'Adam', it's good for most models.
Loss function: this tells our model how badly (or well) it's performing. Since this is a Binary Classification problem, use 'Binary Crossentropy'.
Show metric: the metric we want to see once the training is over. For this, let's use 'Binary Accuracy'.
Epochs: 100. Batch size: 10. This will do for us.
Show: select all 3 options to see what they do.

After all that, the window should look like the image below. Click 'Train'.

Training options.
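The epoch and batch size settings determine how many times the model's weights get updated. The short calculation below is a sketch under two assumptions that are not stated here: that all 768 rows are used for training, and that a final, smaller batch is kept rather than dropped (the usual Keras behaviour).

```python
import math

samples = 768     # patients in the dataset
batch_size = 10
epochs = 100

# 76 full batches plus one final batch with the remaining 8 rows.
batches_per_epoch = math.ceil(samples / batch_size)
total_updates = batches_per_epoch * epochs  # one weight update per batch

print(batches_per_epoch, total_updates)  # 77 7700
```

A smaller batch size means more updates per epoch but noisier ones; a larger batch size means fewer, smoother updates.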

After a couple of seconds, the model will complete training. The output should look like this:

Training output.

Model Loss and Model Binary Accuracy plots.

As we can see in the first picture, our final Binary Accuracy is about 76.3%, which means that our model correctly predicts whether a patient has diabetes about 76.3% of the time, for this dataset. Notice that our outputs are in decimal form (e.g. 0.6234372), so, in the context of this dataset, we have to round each one to the nearest integer (here, 1) to get the predicted class.
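The rounding step and the accuracy figure can be sketched in a few lines. The example below assumes a threshold of 0.5, which is the Keras default for binary accuracy. Apart from 0.6234372, which is the example value quoted above, the outputs and labels are hypothetical.

```python
def to_class(probability, threshold=0.5):
    """Turn a sigmoid output into a 0/1 class label."""
    return 1 if probability > threshold else 0

def binary_accuracy(true_labels, probabilities):
    """Fraction of predictions whose rounded class matches the true label."""
    hits = sum(1 for y, p in zip(true_labels, probabilities) if to_class(p) == y)
    return hits / len(true_labels)

# Hypothetical model outputs and true labels, for illustration only.
probabilities = [0.6234372, 0.12, 0.91, 0.48]
true_labels = [1, 0, 0, 0]

accuracy = binary_accuracy(true_labels, probabilities)
print(to_class(0.6234372), accuracy)  # 1 0.75
```

Here three of the four rounded predictions match their labels, so the accuracy is 0.75.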

In the second figure, we see the plots of the Model Loss and the Model Binary Accuracy for each epoch. Remember: an epoch is one full pass of the model through the dataset, so this model trained for 100 passes. Here's what we can conclude:

Model Loss: this is the value of the "Binary Crossentropy" function in each epoch. It drops very rapidly and then stabilizes at a low value, which is what we want, since this is a measure of how far the model's outputs are from the true outputs.
Model Binary Accuracy: this is the accuracy of the model in each epoch. It steadily grows with each epoch, implying that the model is training successfully and getting better every epoch.
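For readers curious about what the loss value measures, here is a minimal sketch of binary crossentropy using its standard definition: the mean of -[y*log(p) + (1-y)*log(1-p)] over all samples. The clipping constant `eps` and the sample values are assumptions made for illustration, and the function is not taken from Neurons or Keras.

```python
import math

def binary_crossentropy(true_labels, probabilities, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(true_labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(true_labels)

labels = [1, 0]
confident_and_right = binary_crossentropy(labels, [0.9, 0.1])
confident_and_wrong = binary_crossentropy(labels, [0.1, 0.9])
print(round(confident_and_right, 4), round(confident_and_wrong, 4))
```

Confident correct predictions give a loss close to 0, while confident wrong predictions are penalised heavily, which is why a falling loss curve indicates that the outputs are moving toward the true labels.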

Note: the plots are made with Matplotlib, so you can change their characteristics by clicking the 'Configure Subplots' button in the top menu.

Our model is trained and ready to go! Let's save it so we can use it later.

Saving the Model

To be able to use the model again in Neurons, or to be able to load it in a custom program, we can save the model to a '.h5' file.

This saves the model's:

Weight values
Architecture
Training configuration
The optimizer and its state, if any (this enables you to restart training where you left off)

To save the model:

Click 'File' -> 'Save Model' or just press 'Ctrl + S'. Choose the directory where you want the model to be saved and press 'Save'. Done.

Final Note

We've just created a simple Sequential model with only 2 hidden layers (the input layer already includes a hidden layer) that correctly detects diabetes based on 8 attributes about 76% of the time on the data it was trained on. To get a better estimate of its performance, we could have split the dataset into training and testing sets, so we could see how accurate it is on new data. This is a common practice in Machine Learning.
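A train/test split of the kind mentioned above could look like the sketch below. The 80/20 ratio, the fixed seed and the helper name `train_test_split` are all assumptions chosen for illustration; integers stand in for the 768 patient rows.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

rows = list(range(768))  # stand-ins for the 768 patient rows
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))  # 615 153
```

The model would then be trained only on `train_rows`, and the accuracy measured on `test_rows` would show how well it handles patients it has never seen.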
