Now that we have collected our dataset, it's time to train our model.
In this tool, all we need to do is go to the 'Train Model' tab and click the 'Train Model' button. You will see the model's training progress on the page. One graph shows the accuracy, which should converge to 1, and the other shows the loss, or amount of error, which should converge to 0.
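The tool handles all of this for us, but for the curious, the sketch below shows roughly what happens behind the button. It assumes a feature matrix `X_train` and integer labels `y_train` from the collected dataset (hypothetical names), and uses Keras only as a stand-in for whatever framework the tool uses; the per-epoch accuracy and loss it records are the same quantities plotted in the two graphs.

```python
import tensorflow as tf

# Hypothetical stand-ins for the collected dataset:
# X_train has shape (num_samples, num_features); y_train holds integer class labels.
num_features, num_classes = X_train.shape[1], len(set(y_train))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_features,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# history.history["accuracy"] should climb toward 1 and
# history.history["loss"] should fall toward 0 as training converges.
history = model.fit(X_train, y_train, epochs=30)
```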
We can also evaluate our model by looking at how well it performed on the training dataset. A confusion matrix is a plot that compares the ground truth labels to what the model predicts. The confusion matrix seen here suggests that the model learned to distinguish many of the classes well, but struggled with the "still" class.
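If you want to reproduce this kind of evaluation outside the tool, a confusion matrix takes only a few lines of scikit-learn. Here `y_true`, `y_pred`, and `class_names` are placeholder names for the ground truth labels, the model's predictions, and the list of classes (e.g. "still", "taps").

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# y_true: ground truth labels; y_pred: the model's predictions on the same
# examples; class_names: the classes, e.g. ["still", "taps", ...].
cm = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(cm, display_labels=class_names).plot()
plt.show()

# Each row is a true class and each column a predicted class, so the
# off-diagonal counts in the "still" row show exactly which classes the
# model confuses "still" with.
```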
In the context of the problem we are trying to solve, this gives us two important pieces of information about our model. First, there are major accuracy issues with an important class. Stakeholders expect reasonably accurate predictions, and a model that misclassifies "stillness" as "taps" half the time would be very frustrating.
Second, there may be issues with using this model with a diverse group. A common problem with machine learning models is overfitting. This is when a model learns its training dataset too well and cannot generalize to new data points. We should make sure our model is not overfitting by testing it on data that is different from what it was trained on. Importantly, we should make sure the model works equally well for different subgroups of our stakeholders.
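One simple check, sketched below under the assumption that the full dataset lives in arrays `X` and `y` (placeholder names), is to hold out a portion of the data before training and compare the accuracy on the training data with the accuracy on the held-out data. The classifier used here is only a stand-in for whatever model the tool actually trains.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# X and y stand in for the full collected dataset (placeholder names).
# Hold out 20% of the data that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# A large gap between these two numbers is a common sign of overfitting.
print("train accuracy:   ", accuracy_score(y_train, clf.predict(X_train)))
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```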
As we continue to test our model, confusion matrices and plots of the datasets will help us fully understand its performance. Here, it is important to understand how well the model performs on standard inputs and on edge cases. Particularly with the edge cases, we should look at the model's performance for subgroups and for individuals at the intersections of subgroups.
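A concrete way to do this, sketched below with hypothetical subgroup attributes (handedness and age group) attached to each test example, is to group the test predictions by subgroup, and by intersections of subgroups, and compute accuracy within each group.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

# Hypothetical per-example records: true label, predicted label, and
# subgroup attributes for each person who contributed test data.
results = pd.DataFrame({
    "true": y_true,
    "pred": y_pred,
    "handedness": handedness,   # e.g. "left" / "right"
    "age_group": age_group,     # e.g. "under 18" / "18-40" / "over 40"
})

# Accuracy per subgroup, then per intersection of subgroups.
per_group = results.groupby("handedness").apply(
    lambda g: accuracy_score(g["true"], g["pred"]))
per_intersection = results.groupby(["handedness", "age_group"]).apply(
    lambda g: accuracy_score(g["true"], g["pred"]))

print(per_group)
print(per_intersection)
```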
Facial classification and recognition algorithms are widely used in everything from social apps to national security. However, in 2018 Joy Buolamwini demonstrated that the performance of these algorithms is unequal across racial and gender lines. These algorithms were found to overlook many gender identities, to misgender many people, and to miss some faces altogether, particularly the faces of women with darker skin tones. This work encouraged researchers to take more care in testing the performance of their models, especially for historically oppressed, minority, and intersectional groups.
When you feel satisfied with your model's performance, take time to document your choices and your evaluation results. This documentation will be used to create a model card that can be packaged with your model JSON file when you export it.
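If you prefer to keep this documentation in a machine-readable form alongside the exported model, a structure like the following can work; the field names here are illustrative, not the tool's actual model card schema.

```python
import json

# A minimal, hypothetical model card; fill in the values with your own
# documentation and measured results. These field names are illustrative
# and are not the schema the tool exports.
model_card = {
    "model_name": "",             # a descriptive name for this version of the model
    "intended_use": "",           # what the model is for and who it serves
    "training_data": "",          # how the data was collected and from whom
    "overall_performance": "",    # accuracy and confusion-matrix summary
    "subgroup_performance": {},   # per-subgroup and intersectional results
    "known_limitations": "",      # e.g. classes the model still confuses
}

with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)
```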