Day 1: Data Visualization

Overview

During Day 1 we explored loading data, labeling, reshaping, training and testing sets, public and private test sets, and plotting. All of these tools help us build a facial expression recognition program.

Original Data Set

The data set consists of grayscale images of faces showing different emotions. Each face is categorized into one of seven facial expressions (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

This data set was prepared by Pierre-Luc Carrier and Aaron Courville and is available on Kaggle.

Bias in Data

It is crucial that the data reflects the entire population. In our case, the training data should represent different cultures and genders equally. Less biased data leads to results that are more accurate for everyone the model is used on.

In the data set we used, there was little cultural variety (the faces were predominantly white).

Data Set Visualized

Essentially, the labels are a bunch of numbers from 0 to 6, so we plotted them on a bar graph to help us visualize the distribution of emotions.
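As a rough sketch of how such a distribution can be computed before it is plotted, the snippet below counts how often each label occurs. The `labels` list is made-up sample data, not values from the real data set, `emotion_counts` is a hypothetical helper name, and the `EMOTIONS` mapping simply restates the seven categories listed earlier.

```python
from collections import Counter

# Label-to-name mapping, as listed in the data set description.
EMOTIONS = {0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy",
            4: "Sad", 5: "Surprise", 6: "Neutral"}


def emotion_counts(labels):
    """Return {emotion name: count} for every emotion, including zero counts."""
    counts = Counter(labels)
    return {name: counts.get(code, 0) for code, name in EMOTIONS.items()}


# Hypothetical labels for illustration only.
labels = [3, 3, 6, 0, 4, 3, 5, 6]
distribution = emotion_counts(labels)
for name, count in distribution.items():
    print(f"{name}: {count}")
```

Keeping emotions with a count of zero in the result makes gaps in the data visible instead of silently dropping them from the graph.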

Loading Data & Reshaping

First we loaded the fer2013 data set from Kaggle so that we could reshape, rescale, and otherwise manipulate it.

We use the reshape() function when we want to give a new shape to an array without changing any of its data.
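A minimal sketch of this step, assuming the layout of the fer2013 CSV file as distributed on Kaggle: an `emotion` column holding the label and a `pixels` column holding 48 x 48 = 2304 space-separated grayscale values per image. The tiny in-memory sample stands in for the real file, and dividing by 255 is one common way to rescale pixel values, not necessarily the one we used.

```python
import csv
import io

import numpy as np

IMAGE_SIZE = 48  # assumed image side length (48 x 48 pixels)

# Stand-in for the real CSV file: two fake images with constant pixel values.
sample_csv = io.StringIO(
    "emotion,pixels\n"
    + "3," + " ".join(["255"] * IMAGE_SIZE**2) + "\n"
    + "6," + " ".join(["0"] * IMAGE_SIZE**2) + "\n"
)

labels, images = [], []
for row in csv.DictReader(sample_csv):
    labels.append(int(row["emotion"]))
    # Each image arrives as one flat string of 2304 numbers.
    flat = np.array([int(v) for v in row["pixels"].split()], dtype=np.float32)
    # reshape() changes the shape only; the values and their order are kept.
    images.append(flat.reshape(IMAGE_SIZE, IMAGE_SIZE))

# Stack into one array of shape (n_images, 48, 48) and rescale to [0, 1].
images = np.stack(images) / 255.0
labels = np.array(labels)
print(images.shape)
```

The total number of values must match the new shape, so reshaping 2304 values into anything other than a 2304-element layout (for example 48 x 48) raises an error.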

Sets

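Training sets are usually bigger and are used to train the model, while testing sets are usually smaller and are used to test the trained model.

The testing images in fer2013 are further divided into a public test set and a private test set. In the original Kaggle competition, the public set scored the leaderboard while the competition was running and the private set was held back for the final ranking. Neither set should be used for training.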
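To illustrate how the sets can be separated, here is a small sketch that assumes each row of the fer2013 file carries a `Usage` value of `Training`, `PublicTest`, or `PrivateTest`. The rows are invented placeholders, and `split_by_usage` is a hypothetical helper name.

```python
from collections import defaultdict


def split_by_usage(rows):
    """Group rows into sets keyed by their 'Usage' value."""
    sets = defaultdict(list)
    for row in rows:
        sets[row["Usage"]].append(row)
    return dict(sets)


# Invented rows for illustration; real rows would also carry pixel data.
rows = [
    {"emotion": 3, "Usage": "Training"},
    {"emotion": 0, "Usage": "Training"},
    {"emotion": 6, "Usage": "Training"},
    {"emotion": 4, "Usage": "PublicTest"},
    {"emotion": 5, "Usage": "PrivateTest"},
]

sets = split_by_usage(rows)
for name, members in sets.items():
    print(f"{name}: {len(members)} images")
```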

Plotting Our Data

This is where we plot our data and create the bar graph you saw above. We also used plotting to see all the images in our data.
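A sketch of both kinds of plot, assuming matplotlib as the plotting library (these notes do not name one). The counts and images are made-up or random placeholders rather than real data, and `plot_distribution` and `plot_image_grid` are hypothetical helper names.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove and call plt.show() for a window
import matplotlib.pyplot as plt
import numpy as np

EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def plot_distribution(counts):
    """Bar graph with one bar per emotion."""
    fig, ax = plt.subplots()
    ax.bar(EMOTIONS, counts)
    ax.set_xlabel("Emotion")
    ax.set_ylabel("Number of images")
    ax.set_title("Distribution of emotions")
    return fig


def plot_image_grid(images, labels, n_cols=4):
    """Show grayscale images in a grid, each titled with its emotion."""
    n_rows = -(-len(images) // n_cols)  # ceiling division
    fig, axes = plt.subplots(n_rows, n_cols, squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide axis ticks, including on unused cells
    for ax, image, label in zip(axes.flat, images, labels):
        ax.imshow(image, cmap="gray")
        ax.set_title(EMOTIONS[label])
    return fig


# Placeholder data for illustration only.
rng = np.random.default_rng(0)
counts = [5, 1, 4, 9, 6, 3, 7]
images = rng.random((6, 48, 48))
labels = [3, 6, 0, 4, 5, 2]

bar_fig = plot_distribution(counts)
grid_fig = plot_image_grid(images, labels)
# To keep the results, call for example bar_fig.savefig("distribution.png").
```

Passing `cmap="gray"` matters here: without it, matplotlib colors single-channel images with its default colormap instead of showing them in grayscale.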