Before getting started with the lab, a number of libraries must be installed. The packages downloaded are listed below:
SciPy - Open source software for mathematics, science, and engineering
Matplotlib - Comprehensive 2-D plotting
Pandas - Data structures and analysis
libopenblas-dev - Linear Algebra Software
libatlas-base-dev - Linear Algebra Software
NumPy - Base N-dimensional array package written in C
scikit-learn - Machine learning library
TensorFlow - Machine learning framework
Keras - High level API used to build and train models which includes support for TensorFlow-specific functionality
$ sudo apt update
$ sudo apt install python3-scipy
$ sudo apt install python3-matplotlib
$ sudo apt install python3-pandas
$ sudo apt install libopenblas-dev
$ sudo apt install libatlas-base-dev
$ sudo pip3 install -U numpy
$ sudo pip3 install --only-binary :all: -U scikit-learn
$ sudo pip3 install -U tensorflow
$ sudo pip3 install -U keras
Final package downloaded, sweet.
After installing all the packages, the next step was to enable X11 forwarding so I could run code on the Raspberry Pi from my laptop without using the VNC Viewer tool from one of the previous labs.
X11 forwarding is a mechanism that allows a user to start applications on a remote machine while forwarding their display to the local machine, in this case my Windows laptop.
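The exact commands depend on the SSH setup; as a rough sketch, assuming OpenSSH on the laptop and the Pi's default hostname (both assumptions), connecting with the -X flag requests X11 forwarding, after which any graphical program started in that session opens its window on the laptop:
$ ssh -X pi@raspberrypi.local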
To test X11 forwarding, I ran 'pyplot_simple.py', a simple Python script that generates a plot using Matplotlib. The code is shown below along with the results.
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.ylabel('some numbers')
plt.show()
One drawback of X11 forwarding is the latency between my laptop and the Raspberry Pi; another is that it does not handle running multiple Python programs that each need their own GUI or plot window.
The following programs were run to display histograms, box plots, and other plot types provided by NumPy, Matplotlib, and the other libraries; a rough sketch of this kind of plot follows the command list.
$ python3 scatter_demo.py
$ python3 histogram_demo_features.py
$ python3 pyplot_text.py
$ python3 histogram_demo_extended.py
$ python3 boxplot_demo.py
$ python3 linreg.py
$ python3 interpolation.py
$ python3 plot_lda.py
$ python3 plot_lda_qda.py
$ python3 plt_final.py
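The demo scripts above came with the lab materials; as a minimal sketch of what they do (not the exact demo files), a few lines of Matplotlib are enough to produce a histogram and a box plot from random data:
# Minimal sketch of the kind of plots these demos produce (not the exact
# demo scripts): a histogram and a box plot of normally distributed data.
import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(loc=100, scale=15, size=1000)  # synthetic sample

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(data, bins=30)        # histogram of the sample
ax1.set_title('Histogram')
ax2.boxplot(data)              # box plot of the same sample
ax2.set_title('Box plot')
plt.show()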
Here is a Python linear and cubic interpolation example. Very little code was needed thanks to the NumPy, SciPy, and Matplotlib libraries.
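The exact interpolation.py used in the lab is not reproduced here; as a rough sketch of linear versus cubic interpolation with SciPy's interp1d (sample data chosen for illustration):
# Sketch of linear and cubic interpolation (assumed, not the lab's exact
# interpolation.py): fit both interpolants to a coarse sample and plot them.
import numpy as np
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt

x = np.linspace(0, 10, num=11)          # coarse sample points
y = np.cos(-x**2 / 9.0)                 # sampled values
f_linear = interp1d(x, y)               # linear interpolant
f_cubic = interp1d(x, y, kind='cubic')  # cubic interpolant

x_new = np.linspace(0, 10, num=101)     # dense grid for plotting
plt.plot(x, y, 'o', x_new, f_linear(x_new), '-', x_new, f_cubic(x_new), '--')
plt.legend(['data', 'linear', 'cubic'], loc='best')
plt.show()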
Cross-validation is a technique that involves reserving a sample of a data set which will NOT be used to train the model. The model is tested on this sample before being finalized.
The steps involved in cross-validation are:
Reserve a sample of the data set
Train the model using the remaining part of the data set
Use the reserved sample as the test (validation) set. This helps gauge the effectiveness of the model's performance; if the model delivers a positive result on the validation data, it is acceptable to move forward with the current model. A minimal sketch of this hold-out split follows below.
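As a minimal sketch of reserving a sample, using scikit-learn's train_test_split on made-up placeholder data (not tied to any particular lab file):
# Sketch of reserving a validation sample with scikit-learn (placeholder data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = np.arange(20).reshape(-1, 1)           # placeholder inputs
y = 2.0 * X.ravel() + np.random.randn(20)  # placeholder noisy targets

# Reserve 25% of the data set; train only on the remaining 75%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Gauge performance on the reserved (validation) sample.
print('R^2 on reserved sample:', model.score(X_test, y_test))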
In this example I explore the correlation between CPU usage (%) and temperature (°C) of a Raspberry Pi, using different methods to correlate the data.
In the example above I ran 'plt_cv2.py' on the data from the CSV file; the Python script plots the raw data and a linear regression line on the same plot.
The model uses a cross-validation generator to split a data set into a sequence of train and test portions.
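As a rough sketch of that split (not the exact plt_cv2.py; the CSV file name and column names here are assumptions), a KFold generator from scikit-learn can score a linear regression across several train/test splits:
# Sketch of a cross-validation generator on the CPU-usage/temperature data.
# The file name and column names are assumptions, not the lab's exact ones.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

data = pd.read_csv('cpu_temp.csv')     # assumed file name
X = data[['cpu_usage']].values         # assumed column: CPU usage in %
y = data['temperature'].values         # assumed column: temperature in C

kf = KFold(n_splits=5, shuffle=True, random_state=1)  # 5 train/test splits
scores = cross_val_score(LinearRegression(), X, y, cv=kf, scoring='r2')
print('R^2 per fold:', scores)
print('Mean R^2:', scores.mean())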
In this part of the lab I created my first deep learning neural network model using Keras. Keras is an open source Python library for developing and evaluating deep learning models.
Keras integrates the efficient numerical computation libraries of Theano and TensorFlow.
The steps taken to create the deep learning model are:
Load Data
Define Keras Model
Compile Keras Model
Fit Keras Model
Evaluate Keras Model
Tie It All Together (Run Python file: keras_diabetes.py)
Make Predictions
Deep learning model requirements:
Python 2 or 3 installed and configured
SciPy & NumPy installed and configured
Keras and a backend (Theano or TensorFlow) installed or configured
In the future I may set up the environment with Anaconda on my laptop rather than use the Pi, if possible. I'll probably keep using Git Bash for the Linux commands, or scrap my Windows OS altogether.
In this model I used the Pima Indians onset of diabetes data set, a standard machine learning data set from the UCI Machine Learning Repository.
After loading the data, defining the Keras model, and compiling it, the model is ready for computation. The model was trained over a number of epochs, and each epoch is split into batches.
The batch size is the number of samples processed before the model is updated.
The number of epochs is the number of complete passes through the training dataset.
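The exact keras_diabetes.py is not reproduced here; as a rough sketch of the steps listed above, assuming the data sits in pima-indians-diabetes.csv with the 8 input columns followed by the class column (file name assumed), the model looks roughly like this:
# Sketch along the lines of keras_diabetes.py (reconstructed from the steps
# above, not the exact lab file). Assumes 8 input columns then 1 class column.
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense

dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')  # load the data
X = dataset[:, 0:8]   # the 8 input variables
y = dataset[:, 8]     # the output class (0 or 1)

model = Sequential()                                   # define the Keras model
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam',  # compile
              metrics=['accuracy'])
model.fit(X, y, epochs=150, batch_size=10)   # fit: 150 epochs, batch size 10

_, accuracy = model.evaluate(X, y)           # evaluate on the training data
print('Accuracy: %.2f%%' % (accuracy * 100))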
The CSV file used in this model contains 768 rows of data.
In this training process, there will be:
150 epochs
Batch size of 10 (10 rows of data are processed before the model is updated)
i. With 768 rows and a batch size of 10, there are about 77 batches per epoch (76 full batches plus one partial batch of 8 rows).
ii. 150 epochs * 768 rows/epoch = 115,200 rows of data processed in total.
iii. Traditionally, the number of epochs is large, often in the hundreds or thousands.
In this model the accuracy starts off at 58.20% and climbs until it reaches 78.26%.
Based on the 8 inputs:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
The model can predict with 78.26% accuracy whether a patient will test positive for diabetes.
Class Value 0 is interpreted as "tested negative for diabetes"
Class Value 1 is interpreted as "tested positive for diabetes"
keras_diabetes.py displayed a matrix of zeros and ones in the terminal window indicating if a person will develop diabetes or not depending on the input.
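As a rough sketch of that prediction step (continuing the model sketch above; not the lab's exact code), the sigmoid outputs are rounded to class values and compared with the actual labels:
# Sketch of making predictions with the fitted model from the earlier snippet.
predictions = (model.predict(X) > 0.5).astype(int)  # 1 = positive, 0 = negative
for i in range(5):                                  # show the first few rows
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i][0], y[i]))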
It is worth noting the accuracy in this screenshot is 79.04% and not 78.26% as mentioned before. Neural network training is a stochastic process, meaning that the same algorithm on the same data can produce a different model with different skill each time the code is run. Randomness should be embraced in machine learning; not everything is deterministic.