In this lab, we will begin learning about Machine Learning through Computer Vision. You will learn how to use the Google Cloud Vision API to detect objects, text, and emotions.
Begin by reading the Detecting faces documentation. As you read, note which lines of code are relevant to face detection and which differ from the code you have already seen.
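Here is a minimal sketch of the face-detection half in Python. The choice of Python is an assumption, because the lab does not name a language. The sketch builds a JSON body for the Vision API's REST `images:annotate` method with the `FACE_DETECTION` feature. It then reads the `joyLikelihood`, `sorrowLikelihood`, `angerLikelihood`, and `surpriseLikelihood` fields of one face annotation and picks the strongest emotion. The helper names, the rule that only `POSSIBLE` or stronger counts, and the sample response are all hypothetical and were not taken from the documentation.

```python
import base64

# Likelihood values used in Vision API face annotations, ordered from least to most likely.
LIKELIHOOD_ORDER = ["UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Map each emotion to the field that carries its likelihood in a faceAnnotation.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def face_request_body(image_bytes: bytes) -> dict:
    """Build an images:annotate request body asking for face detection (hypothetical helper)."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "FACE_DETECTION"}],
            }
        ]
    }


def likely_emotion(face: dict) -> str:
    """Return the emotion with the highest likelihood, if it is at least POSSIBLE."""
    best_emotion = "none detected"
    # Assumption: anything rated UNLIKELY or lower is not reported as an emotion.
    best_rank = LIKELIHOOD_ORDER.index("UNLIKELY")
    for emotion, field in EMOTION_FIELDS.items():
        rank = LIKELIHOOD_ORDER.index(face.get(field, "UNKNOWN"))
        if rank > best_rank:
            best_emotion, best_rank = emotion, rank
    return best_emotion


if __name__ == "__main__":
    # Hand-written sample shaped like an images:annotate response, not real API output.
    sample = {
        "responses": [
            {
                "faceAnnotations": [
                    {
                        "joyLikelihood": "VERY_LIKELY",
                        "sorrowLikelihood": "VERY_UNLIKELY",
                        "angerLikelihood": "VERY_UNLIKELY",
                        "surpriseLikelihood": "UNLIKELY",
                    }
                ]
            }
        ]
    }
    for face in sample["responses"][0].get("faceAnnotations", []):
        print(likely_emotion(face))  # prints "joy"
```

Because the comparison is strict, ties go to the emotion listed first. For example, a face rated `LIKELY` for both joy and surprise is reported as joy. Adjust this rule if your lab expects different behavior.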
Read the section on Optical Character Recognition (OCR).
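The OCR half can be sketched the same way, again in Python as an assumption. This sketch requests the `DOCUMENT_TEXT_DETECTION` feature, which is aimed at dense text and is the feature commonly used for handwriting. It then reads the full recognized string from the `fullTextAnnotation.text` field of the response. The helper names and the sample response are hypothetical.

```python
import base64


def handwriting_request_body(image_bytes: bytes) -> dict:
    """Build an images:annotate request body for document (dense/handwritten) text detection."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


def recognized_text(response: dict) -> str:
    """Return the full recognized text from an images:annotate response, or "" if none."""
    first = (response.get("responses") or [{}])[0]
    # Per-image failures are reported in an "error" object inside the response entry.
    if "error" in first:
        raise RuntimeError(first["error"].get("message", "Vision API error"))
    return first.get("fullTextAnnotation", {}).get("text", "")


if __name__ == "__main__":
    # Hand-written sample shaped like a response, not real API output.
    sample = {"responses": [{"fullTextAnnotation": {"text": "Hello, world\n"}}]}
    print(recognized_text(sample))
```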
Create two new files in the same directory as the previous project.
Add the corresponding code for each one so that:
One file sends an image of handwritten text to the endpoint and returns the recognized text as a string.
The other sends an image of a human face showing an expression (joy, anger, etc.) and returns the likely emotion.
Your programs only need to log the result.
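Both programs send the same kind of request and differ only in the feature type, so the network call and the logging can be shared. The sketch below uses only the Python standard library. It POSTs to the REST endpoint `https://vision.googleapis.com/v1/images:annotate` with an API key and logs the raw JSON response.

Several details are hypothetical:
- the `VISION_API_KEY` environment variable
- the file name `annotate.py`
- the command-line arguments

The official client libraries (for example `google-cloud-vision` for Python) wrap this same call and handle authentication differently. Your previous project may already use one of them.

```python
import base64
import json
import logging
import os
import sys
import urllib.parse
import urllib.request

# REST endpoint for the Vision API's images:annotate method.
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes: bytes, feature_type: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one image and one feature type."""
    body = {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": feature_type}],
            }
        ]
    }
    return urllib.request.Request(
        f"{ENDPOINT}?{urllib.parse.urlencode({'key': api_key})}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def annotate(image_path: str, feature_type: str) -> dict:
    """Send the image to the Vision API and return the decoded JSON response."""
    api_key = os.environ["VISION_API_KEY"]  # hypothetical variable name
    with open(image_path, "rb") as f:
        request = build_request(f.read(), feature_type, api_key)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    if len(sys.argv) != 3:
        logging.info("usage: python annotate.py IMAGE_PATH FEATURE_TYPE")
    else:
        result = annotate(sys.argv[1], sys.argv[2])
        logging.info(json.dumps(result, indent=2))
```

Example invocations are `python annotate.py note.jpg DOCUMENT_TEXT_DETECTION` and `python annotate.py face.jpg FACE_DETECTION`. Before logging, you can reduce the returned dictionary to just the recognized text or the likely emotion.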
Computer Vision is the field of computing that teaches computers to make sense of what they “see” through digital camera inputs. Computer Vision applications process and analyze visual data (videos or images) to make predictions or inferences about what they are “seeing”. The field attempts to recreate how human vision works, using neural networks to make these predictions.
When we look at a set of items and have to identify a specific one, we rely on our memory and understanding of objects to pick the right one. For example, we can identify a taco based on our past experiences of eating or seeing tacos. Most of us can look at a taco icon and tell with a high degree of confidence that it is a taco, even if it differs greatly in color, shape, etc.
The neural networks in our brains can look at an image, compare it to our own trained models (our memory and experience), and make a prediction about what we are seeing.
Computer Vision involves teaching computers to make the same kind of predictions based on inputs.
To get there, computers must be given large amounts of input: pixel data in the form of color values. Based on this input data, the computer can generate a confidence score for its prediction and attempt to select the correct item.
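The toy Python sketch below makes the idea above concrete. It is not how the Vision API works internally, which this lab does not describe. It first scales RGB pixel values to numbers between 0 and 1, the kind of input a model receives. It then turns raw scores into confidence scores with softmax, which is one common way of doing this, and picks the label with the highest confidence. The image, labels, and scores are made up; a real trained model would compute the scores from the pixel inputs.

```python
import math


def flatten_pixels(image):
    """Turn a grid of (R, G, B) tuples into a flat list of values scaled to 0..1."""
    return [channel / 255 for row in image for pixel in row for channel in pixel]


def softmax(scores):
    """Convert raw scores into confidences that are positive and sum to 1."""
    top = max(scores)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(labels, scores):
    """Return the label with the highest confidence and that confidence."""
    confidences = softmax(scores)
    best = max(range(len(labels)), key=lambda i: confidences[i])
    return labels[best], confidences[best]


if __name__ == "__main__":
    # A made-up 2x2 image: each pixel is an (R, G, B) color value.
    tiny_image = [[(255, 200, 0), (250, 190, 10)], [(120, 60, 20), (255, 255, 255)]]
    print(flatten_pixels(tiny_image))

    # Made-up raw scores a trained model might assign to each label.
    label, confidence = predict(["taco", "burrito", "pizza"], [4.0, 1.5, 0.5])
    print(f"{label} ({confidence:.0%} confidence)")  # prints "taco (90% confidence)"
```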