Day 20

Today:

    • Machine learning using scikit-learn
    • Object recognition
    • Project ideation and team formation

Todo before we get started:

There are a couple of steps in this process that take a while to run. It will help the flow of class if you get these out of the way while I am showing you stuff on the projector. Please run these commands:

$ cd ~/comprobo2014
$ git pull upstream master
$ cd exercises/object_recognition
$ ./download_caltech_101.sh
$ python get_features.py

Machine learning using scikit-learn

I ask that you follow along with this on the projector (I will not have time to help troubleshoot these install steps if they fail for you during class, but I can help after class if you are interested).

To setup ipython notebook so you can run the exercises run:

$ sudo pip install --upgrade ipython[all]
$ sudo pip install --upgrade tornado
$ sudo pip install --upgrade pyzmq

The relevant ipython notebooks can be downloaded using:

$ git clone https://github.com/paulruvolo/DataScienceMaterials.git

The notebooks can then be run as follows (e.g. this is how to run the first notebook):

$ cd DataScienceMaterials/machine_learning_lecture_1
$ ipython notebook Machine\ Learning\ Lecture\ 1.ipynb

Experimenting with Caltech 101

Caltech 101 is a commonly used (although now somewhat dated; see Caltech 256 and CIFAR for more modern databases) image recognition database. The database contains images of 101 different object categories; a complete description can be found on the Caltech 101 website. The images tend to show each object in a stereotypical pose, centered in the frame, which is why even the average image of each object class is often recognizable.
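If you would like to compute such a class average yourself, here is a minimal sketch, assuming the database has been unpacked into a 101_ObjectCategories directory (the directory layout, image size, and category name are all assumptions):

import os
import cv2
import numpy as np

def average_image(category_dir, size=(128, 128)):
    """Average all images in one Caltech 101 category directory."""
    total = np.zeros((size[1], size[0], 3), dtype=np.float64)
    count = 0
    for fname in os.listdir(category_dir):
        img = cv2.imread(os.path.join(category_dir, fname))
        if img is None:  # skip anything that is not a readable image
            continue
        total += cv2.resize(img, size).astype(np.float64)
        count += 1
    return (total / max(count, 1)).astype(np.uint8)

# e.g. the average 'lamp' image (the directory name is an assumption)
cv2.imwrite('lamp_average.png', average_image('101_ObjectCategories/lamp'))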

Today, we will experiment with learning an object detector that will take an image as input and predict which object is present in the image. To get started we will download the database by executing the following commands:

$ cd ~/comprobo2014/exercises/object_recognition/
$ ./download_caltech_101.sh

Model 1:

To get started, we are going to train a very simple model using the color of the image. We will do this by choosing only object categories with at least 50 images. Next, we will read in each image, average the pixels to obtain a 3-d vector, and then use the Python library scikit-learn to learn a model to recognize the objects. The program will print out five numbers, each representing the accuracy on a particular random partition of the data into training and test sets. Additionally, the script will show the per-class average accuracy.

$ python learn_color_model.py
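For intuition, the core of the script boils down to something like this sketch (a minimal version; learn_color_model.py is the authoritative implementation, and the unpack location is an assumption):

import os
import glob
import cv2
import numpy as np
from sklearn.linear_model import LogisticRegression
# on scikit-learn >= 0.18 this helper lives in sklearn.model_selection
from sklearn.cross_validation import cross_val_score

def mean_color(path):
    """Collapse an image down to its average (B, G, R) pixel value."""
    img = cv2.imread(path)
    return img.reshape(-1, 3).mean(axis=0)

root = '101_ObjectCategories'  # assumed unpack location
X, y = [], []
for label, category in enumerate(sorted(os.listdir(root))):
    paths = glob.glob(os.path.join(root, category, '*.jpg'))
    if len(paths) < 50:  # keep only categories with at least 50 images
        continue
    for p in paths:
        X.append(mean_color(p))
        y.append(label)

# five random train/test partitions, matching the five numbers printed
scores = cross_val_score(LogisticRegression(), np.array(X), np.array(y), cv=5)
print(scores)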

How did the model do? Any surprises? You may be able to play around with the specific classifier used and its parameter settings to improve the model, but it's really not worth it.

Model 2:

To make things a bit more interesting, we are going to characterize the appearance of objects using SIFT descriptors. To refamiliarize yourself with SIFT descriptors, you might want to revisit the following example:

$ cd ~/comprobo2014/exercises/keypoints_and_descriptors
$ python visualize_sift.py

Our algorithm for feature extraction will be to:

    1. Find keypoints in each image
    2. Extract SIFT descriptors (or another descriptor if you prefer)
    3. Average all of the SIFT descriptors found in the image to obtain a single 128-dimensional vector representing the image (a sketch of this pipeline follows the list)
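get_features.py does the heavy lifting here; as a rough sketch, the pipeline for a single image might look like this (the OpenCV SIFT constructor name varies across versions, so both spellings are tried):

import cv2
import numpy as np

def average_sift(path):
    """Represent an image by the mean of its 128-d SIFT descriptors."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    try:
        sift = cv2.SIFT()          # OpenCV 2.x (current as of this course)
    except AttributeError:
        sift = cv2.SIFT_create()   # OpenCV 3.x / 4.x
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:        # no keypoints were found in the image
        return np.zeros(128)
    return descriptors.mean(axis=0)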

This procedure gives a general picture of the types of pixel gradients we see in the image. For instance, compare the average of all SIFT descriptors for an image of a lamp with the average for an image of a hedgehog: even though all keypoints are nominally "corners", the hedgehog keypoints on average have gradient vectors pointing in many different directions, while the lamp keypoints' gradients primarily point along a single direction (e.g. along an edge).

The first step will be to extract our SIFT descriptors. I have divided this into multiple steps since it takes a long time to run and you will probably want to try learning multiple models on top of these features. To extract features and save them as a pickle file, run:

$ cd ~/comprobo2014/exercises/object_recognition/
$ python get_features.py

Once you have finished extracting SIFT descriptors, you can learn a model by averaging the descriptors for each image and training a logistic regression classifier.

$ python learn_simple_model.py

As before, the output will include both the performance on each random partition of the data into training and test sets and the accuracy on each object category. Are you surprised by any of the results? What might this model be missing? What happens if you try a different model, such as a Support Vector Classifier or a Random Forest?
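Trying a different classifier is close to a one-line change in scikit-learn. As a sketch, assuming the extracted features were saved as a feature matrix X and a label vector y in a pickle file (the filename and layout are assumptions; check learn_simple_model.py for the real ones):

import pickle
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
# on scikit-learn >= 0.18 this helper lives in sklearn.model_selection
from sklearn.cross_validation import cross_val_score

X, y = pickle.load(open('features.pickle', 'rb'))  # assumed pickle layout

for clf in [LogisticRegression(), SVC(), RandomForestClassifier(n_estimators=100)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print('%s: %.3f' % (clf.__class__.__name__, scores.mean()))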

Model 3:

For our next model, we make the observation that taking the average of the SIFT descriptors misses a lot of information. For instance, are all the SIFT descriptors near the mean or is the distribution of descriptors bimodal? Our next model will allow us to incorporate this information. Our feature extraction pipeline will change to:

    1. Find keypoints in each image
    2. Extract SIFT descriptors (or another descriptor if you prefer)
    3. Create a dictionary of prototypical SIFT descriptors by performing a clustering over SIFT descriptors from ALL object categories
    4. For each image, compute the proportion of its SIFT descriptors that are closest to each element of the dictionary

This model is known as a visual bag of words model. The visual words are the dictionary elements learned using k-means clustering. The model is called a "bag of words" because the frequency of occurrence of the words is all we use to characterize the image (the locations of the visual words within the image are discarded).
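As a minimal sketch of steps 3 and 4, assuming descriptor_lists holds one n-by-128 array of SIFT descriptors per image (learn_bow_model.py implements the real pipeline):

import numpy as np
from sklearn.cluster import KMeans

def bow_histograms(descriptor_lists, dictionary_size=5):
    # step 3: cluster descriptors from ALL images into a visual dictionary
    all_descriptors = np.vstack(descriptor_lists)
    kmeans = KMeans(n_clusters=dictionary_size).fit(all_descriptors)
    # step 4: per image, the proportion of descriptors nearest each word
    histograms = []
    for descriptors in descriptor_lists:
        words = kmeans.predict(descriptors)
        counts = np.bincount(words, minlength=dictionary_size).astype(float)
        histograms.append(counts / counts.sum())
    return np.array(histograms)

For large dictionaries (hundreds of words), sklearn.cluster.MiniBatchKMeans is a much faster drop-in replacement for KMeans.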

A crucial parameter of this system is the size of our descriptor dictionary. As a starting point, I have set the dictionary size to a very small value of 5. After you see how well the system does with this dictionary size, you should try a larger one (such as 200). To learn a model using these features, run:

$ python learn_bow_model.py

How good is the performance? How sensitive is the fit to changing various parameters of the model (dictionary size, C value for the LogisticRegression model)?
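Changing the dictionary size means re-running the clustering, but sensitivity to C can be probed cheaply with a grid search. A sketch, where X_bow and y stand in for the bag-of-words histograms and labels (both names are assumptions):

from sklearn.linear_model import LogisticRegression
# on scikit-learn >= 0.18 use sklearn.model_selection instead
from sklearn.grid_search import GridSearchCV

grid = GridSearchCV(LogisticRegression(),
                    {'C': [0.01, 0.1, 1.0, 10.0, 100.0]},
                    cv=5)
grid.fit(X_bow, y)  # X_bow, y: assumed bag-of-words features and labels
print('best C: %s, accuracy: %.3f' % (grid.best_params_['C'], grid.best_score_))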

If you want to experiment with non-linear classifiers and parameter tuning, you may also want to play with the Python script learn_bow_model_nonlinear.py.

Additional Object Recognition Resources:

VLFeat is an advanced feature extraction and matching library (it has a lot of functionality that is not available in OpenCV). Natively it is a C library, but luckily for us there is an (experimental) Python interface!

To download and install the Python interface execute the following commands:

$ cd ~/comprobo2014/exercises/object_recognition/pyvlfeat-0.1.1a3
$ python setup.py build
$ sudo python setup.py install

Then, to learn a detector run this command:

$ cd ~/comprobo2014/exercises/object_recognition/phow_caltech101
$ python phow_caltech101.py

I will leave it to you to learn about how the system works by examining the code / README.md file. When I ran the code I got an accuracy of 65%.

Project Ideation and Team Formation

We are going to do ideation at tables and then affinity grouping on the whiteboard. You will be using post-it notes to jot down ideas. Ideas for projects could be centered around:

    • Solving a particular problem
    • Serving a particular user group
    • Investigating a particular algorithm (or class of algorithms)