Classifying Ship Classes

8/31/19

Using computer vision to determine which class an RCCL ship belongs to.

Code available here

Inspired by fast.ai

Introduction

Computer Vision

Computer vision is a sub-field of Artificial Intelligence (AI) which involves processing and analyzing images or videos [1]. On most mobile devices nowadays, when you open the camera and point it at a person, you will notice a box appear around the subject's face. How does the phone recognize, in real time, that there is a face in the picture?

There is a long history of research groups developing AI algorithms to detect various objects (not just faces) in images. It all starts with good labeled data. If you unravel the enigma of how your cell phone recognizes faces, you will most likely start with ImageNet. ImageNet is a huge dataset of more than 14 million images, each annotated with what is in the image [2]. It also provides more than one million images with bounding boxes around objects (like the box you see around a face in your cell phone camera). Researchers fit models to the data with the goal that the model will tell you what is in the image and its bounding box (location).

Convolutional Neural Network

The model of choice for image recognition is often a convolutional neural network (CNN). A neural network gets its name from the biological neural networks that constitute brains. These systems learn on their own by repeatedly working through examples [3]. A convolution is a mathematical operation on two functions that produces a third function expressing how the shape of one is modified by the other [4]. For the technical specs behind CNNs I recommend reading Stanford's lecture notes. However, I will provide an overview of CNNs here, inspired by Zeiler and Fergus (2013). A digital image is represented by red, green and blue pixel values in the range 0-255. The raw data size of an image is therefore 3 x height x width. The convolution is undertaken by applying a filter to the image. The filter values are learned during training so that useful features emerge from the image. In the first layer (a neural network is often made up of many layers) the algorithm can detect basic features such as lines or color gradients. In the second layer the algorithm can detect more complex features such as corners or circular patterns. By the time you get to layer five the algorithm can start to identify things such as faces or flowers.
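To make the filter idea concrete, here is a minimal sketch of a 2D convolution in plain Python (strictly speaking, the cross-correlation that deep learning libraries compute). The small single-channel image and the vertical-edge filter are made-up illustrative values, not weights from any real network, and a real CNN would learn its filters rather than have them written by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 4x5 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
]

# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Flat regions give 0; windows that straddle the edge give a large response.
feature_map = convolve2d(image, vertical_edge)
print(feature_map)

# An RGB image holds 3 x height x width values, e.g. for a 224 x 224 image:
raw_values = 3 * 224 * 224
print(raw_values)
```

The output is a smaller "feature map" whose large values mark where the pattern encoded by the filter appears in the image.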

Figure 2 in Zeiler and Fergus (2013). This image gives a few examples of the features detected in each of the five layers. The grey images show which regions of the image are activated, and the corresponding image in the same layer shows what the feature is 'seeing'. For example, the bottom-right grey images in layer two show activation patterns of two lines. Looking at the image representation of the two lines, you can see it is detecting door corners or the corners of windows.

Creating an image dataset

Royal Caribbean Cruise Line Ship Classes

I will focus on the most popular ship classes for the brands of Royal Caribbean and Celebrity Cruises.

A ship class is a group of ships of similar design. The name of a ship class is most commonly the name of the lead ship, the first ship commissioned or built to its design. For example, Celebrity Cruises is currently launching the Edge class, with the second ship, Apex, sailing out of Southampton in April 2020. There will be three more Edge class ships added to Celebrity's fleet by 2024 [5]. The classes I will focus on are:

Royal Caribbean ship classes:

  • Quantum
  • Oasis
  • Freedom
  • Radiance

Celebrity Cruises ship classes:

  • Edge
  • Solstice
  • Millennium

Below are two ship classes: Quantum class on the left and Edge class on the right

Quantum Of The Seas (Quantum Class)

Celebrity Edge (Edge Class)

Downloading images from the web

To make the machine learning step easier, it is worth spending time getting the images into a format that the model can easily digest.

I will use fast.ai, which wraps PyTorch. PyTorch is Facebook's open source deep learning library, and fast.ai provides a simple-to-use API for computer vision on top of it.

1. Create folders in the work directory:

from fastai.vision import *
from fastai.widgets import *
import os

# One sub-folder per ship class; the folder name doubles as the label.
path = Path('data/ships')
classes = ['quantum', 'oasis', 'freedom', 'radiance', 'edge', 'solstice', 'millennium']
for c in classes:
    dest = path/c
    dest.mkdir(parents=True, exist_ok=True)

2. Create a list of URLs for the images:

  • Go to Google Images and search for the name of the ship(s) associated with the ship class, e.g. https://www.google.com/search?q=quantum+of+the+seas
  • Open the browser console by pressing Ctrl + Shift + J. Type in the JavaScript code below and press Enter. This will create a list of URLs for the images:
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
  • Move each downloaded file to the folder corresponding to its ship class, then rename it:
for c in classes:
    os.rename('data/ships/' + c + '/download', 'data/ships/' + c + '/urls_' + c + '.csv')

3. Download the images

I'm opting to download 50 images for each class.

One advantage of fast.ai is that you can download pre-trained models so you can classify objects with only a few pictures for each class.

for c in classes:
    file = 'urls_' + c + '.csv'
    dest = path/c
    # The URL list was moved into the class folder in step 2, so read it from there.
    download_images(dest/file, dest, max_pics=50)

4. Remove any images that can't be opened

for c in classes:
    verify_images(path/c, delete=True, max_size=500)

5. View the images

Read in the data using ImageDataBunch.from_folder, holding back 20% of the images for validation:

data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, bs=16).normalize(imagenet_stats)
data.classes
data.show_batch(rows=3, figsize=(7,8))
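The normalize(imagenet_stats) call standardizes each color channel using the mean and standard deviation of ImageNet, which is what the pre-trained model saw during its original training. A minimal sketch of that arithmetic for a single pixel, in plain Python, is below. The example pixel is a made-up value, and the real library applies this to whole image tensors rather than one pixel at a time.

```python
# Per-channel (R, G, B) mean and standard deviation of ImageNet,
# the values fastai exposes as `imagenet_stats`.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Scale a 0-255 RGB pixel to 0-1, then standardize each channel."""
    return tuple(
        (value / 255 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

# Hypothetical pixel: a mid-blue tone, such as a painted hull.
pixel = (30, 60, 160)
normalized = normalize_pixel(pixel)
print(normalized)  # red and green fall below the ImageNet mean, blue sits above it
```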

Train the model

fastai provides a function called cnn_learner, which needs only three arguments: the data, the model architecture and a metric to report during training (the metric is monitored, not directly optimized).

I'm going to use the ResNet50 architecture. This is a state-of-the-art CNN which uses residual, or skip, connections. The model was developed by Microsoft's research team: He et al. (2015). Its success can be seen in Stanford University's DAWNBench, which measures the time taken to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet [6].

learn = cnn_learner(data, models.resnet50, metrics=error_rate)
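The idea behind a residual (skip) connection is that a block outputs F(x) + x rather than just F(x), so it only has to learn a correction to its input. The sketch below illustrates this in plain Python. It is a simplification: small linear layers stand in for the convolution and batch-norm layers of a real ResNet block, and all the names are hypothetical.

```python
def relu(values):
    """Replace negative values with zero."""
    return [max(0.0, v) for v in values]

def linear(values, weights):
    """Multiply a vector by a square weight matrix (given as a list of rows)."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def residual_block(x, weights_1, weights_2):
    """Return relu(F(x) + x), where F is two small linear layers."""
    hidden = relu(linear(x, weights_1))
    transformed = linear(hidden, weights_2)  # this is F(x)
    # The skip connection adds the original input back onto F(x).
    return relu([f + xi for f, xi in zip(transformed, x)])

x = [1.0, 2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# With all-zero weights F(x) is zero, so the block passes its input through unchanged.
passthrough = residual_block(x, zeros, zeros)
print(passthrough)
```

Because a block can fall back to passing its input straight through, stacking many of them tends to be easier to train than stacking plain layers.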

I will use fit_one_cycle on the learn object with 4 epochs (passes through the training data). This trains the model on the images and updates the weights of the last few layers of the neural network. Kostas Mavropalias has a good blog post explaining the fit_one_cycle method [7].

learn.fit_one_cycle(4)
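The one-cycle policy warms the learning rate up to a peak and then anneals it down to a very small value. The sketch below shows a simplified version of such a schedule in plain Python. The peak rate, the warm-up fraction and the two divisors are assumptions chosen for illustration, and the real scheduler also varies momentum, which is left out here.

```python
import math

def cosine_anneal(start, end, fraction):
    """Move from `start` to `end` along a half cosine as fraction goes from 0 to 1."""
    return end + (start - end) * (1 + math.cos(math.pi * fraction)) / 2

def one_cycle_lr(step, total_steps, max_lr=3e-3, warmup_fraction=0.3,
                 start_div=25.0, final_div=1e4):
    """Learning rate at `step` for a simplified one-cycle schedule (assumed constants)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if warmup_steps and step < warmup_steps:
        # Warm-up phase: rise from max_lr / start_div up to max_lr.
        return cosine_anneal(max_lr / start_div, max_lr, step / warmup_steps)
    # Annealing phase: fall from max_lr towards max_lr / final_div.
    fraction = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return cosine_anneal(max_lr, max_lr / final_div, fraction)

total_steps = 100
schedule = [one_cycle_lr(step, total_steps) for step in range(total_steps)]
peak_step = schedule.index(max(schedule))
print(peak_step, schedule[0], max(schedule), schedule[-1])
```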

Out of the box the model has an accuracy of 50% (an error_rate of 0.50). Not bad. As there are seven classes, purely guessing would give an accuracy of about 14% (1 in 7).
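The comparison with guessing is simple arithmetic, sketched below. It assumes the seven classes are roughly balanced, so that a uniform random guess is right one time in seven.

```python
n_classes = 7

# Expected accuracy of uniform random guessing with balanced classes.
chance_accuracy = 1 / n_classes

def accuracy_from_error_rate(error_rate):
    """fastai's error_rate is simply 1 - accuracy."""
    return 1 - error_rate

reported_error_rate = 0.50  # the out-of-the-box figure quoted in the text
model_accuracy = accuracy_from_error_rate(reported_error_rate)
improvement_over_chance = model_accuracy / chance_accuracy

print(f"chance: {chance_accuracy:.1%}, model: {model_accuracy:.0%}, "
      f"ratio: {improvement_over_chance:.1f}x")
```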

Improving the model

We will start by pruning the data, as some images in our dataset may not be representative of the ship class. We can use fastai's widgets, which show the images with the top losses (the images the model got most wrong) and allow us to delete them. First we have to read in the data using ImageList:

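# Build a databunch with no validation split so every image can be reviewed.
db = (ImageList.from_folder(path)
      .split_none()
      .label_from_folder()
      .transform(get_transforms(), size=224)
      .databunch())
learn_cln = cnn_learner(db, models.resnet50, metrics=error_rate)
# Rank the images by loss and open the interactive cleaning widget.
ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
ImageCleaner(ds, idxs, path)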

This creates a cleaned.csv file which we will now read in.

db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')
      .split_by_rand_pct(0.2)
      .label_from_df()
      .transform(get_transforms(), size=224)
      .databunch(bs=16)
      .normalize(imagenet_stats))
learn = cnn_learner(db, models.resnet50, metrics=error_rate)
learn.fit_one_cycle(4)

The accuracy improved slightly, but there are additional ways to improve the model.

You can specify the learning rate parameter (how far the model's weights are adjusted at each update):

learn.lr_find()
learn.recorder.plot() 

You want to specify a learning rate in the region where the loss is falling most steeply:

learn.fit_one_cycle(4, max_lr=slice(1e-4,1e-2)) 
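Passing a slice rather than a single number asks for different learning rates in different parts of the network: the earliest layers get the smallest rate and the final layers the largest. As I understand it, fastai spaces the rates by a constant multiplier across its layer groups. The sketch below reproduces that spacing in plain Python; the function name is hypothetical and the count of three layer groups is an assumption.

```python
def spread_learning_rates(start, stop, n_groups):
    """Return `n_groups` rates spaced by a constant multiplier from start to stop."""
    if n_groups == 1:
        return [stop]
    multiplier = (stop / start) ** (1 / (n_groups - 1))
    return [start * multiplier ** i for i in range(n_groups)]

# Assumes three layer groups, listed from the earliest layers to the final layers.
group_lrs = spread_learning_rates(1e-4, 1e-2, 3)
print(group_lrs)  # roughly 1e-4, 1e-3 and 1e-2
```

Note that layers which are still frozen are not updated, whatever rate is assigned to them.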

The accuracy of the model has improved to 61% (error_rate of 0.39).

The last parameter to adjust is the number of epochs. You don't want to specify too many epochs, otherwise the model will overfit (that is, do well on the training dataset but perform poorly on the validation dataset).

Using 6 epochs, the accuracy improved to 66%.
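One common way to spot overfitting is to compare training and validation loss epoch by epoch: once the training loss keeps falling while the validation loss starts rising, extra epochs are hurting. The sketch below shows that check in plain Python. The loss values are invented purely for illustration and are not results from this model.

```python
def best_epoch(valid_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(valid_losses)), key=valid_losses.__getitem__)

# Hypothetical per-epoch losses, invented for illustration only.
train_losses = [1.90, 1.40, 1.05, 0.80, 0.62, 0.48, 0.35, 0.26]
valid_losses = [1.70, 1.35, 1.15, 1.02, 0.98, 1.01, 1.08, 1.15]

stop_at = best_epoch(valid_losses)

# Later epochs where training loss improved but validation loss got worse.
overfitting_epochs = [
    epoch
    for epoch in range(stop_at + 1, len(valid_losses))
    if train_losses[epoch] < train_losses[stop_at]
    and valid_losses[epoch] > valid_losses[stop_at]
]
print(stop_at, overfitting_epochs)
```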

Interpreting the model

fastai provides tools for visualizing the accuracy of the model:

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9)

The plot below shows the nine images with the largest loss. The text above each panel shows prediction/actual/loss/probability. It is encouraging that the probability values are low for all predictions. The image in the top right can be explained by the effect of shadows. It's a picture of Oasis, but because of the high sun there is a shadow cast onto the top of the hull. This shadow has a blue hue, and the algorithm understandably thought this was a Celebrity ship (Millennium), which often has a blue lick of paint on the hull as it is our brand's color.
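To show where the loss and probability in each panel title come from, here is a minimal sketch in plain Python. It assumes a cross-entropy loss and, as I understand the plot, a probability that is the one assigned to the actual class. The image names and predicted probabilities are hypothetical.

```python
import math

def cross_entropy(probabilities, actual_index):
    """Loss for one image: negative log of the probability given to the true class."""
    return -math.log(probabilities[actual_index])

classes = ["edge", "oasis", "quantum"]  # a subset of the classes, for brevity

# Hypothetical predictions: (image name, predicted probabilities, actual class index).
predictions = [
    ("img_a.jpg", [0.10, 0.85, 0.05], 1),
    ("img_b.jpg", [0.70, 0.20, 0.10], 1),
    ("img_c.jpg", [0.05, 0.05, 0.90], 0),
]

rows = []
for name, probs, actual in predictions:
    predicted = probs.index(max(probs))
    loss = cross_entropy(probs, actual)
    rows.append((loss, name, classes[predicted], classes[actual], probs[actual]))

# Largest loss first, mirroring the ordering of a top-losses plot.
top_losses = sorted(rows, reverse=True)
for loss, name, predicted, actual, prob in top_losses:
    print(f"{name}: {predicted}/{actual}/{loss:.2f}/{prob:.2f}")
```

A confident wrong prediction gives the true class a tiny probability, which is exactly what produces a large loss.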

Another useful tool is the confusion matrix, which shows the counts of actual versus predicted ship class:

interp.plot_confusion_matrix(figsize=(12,12), dpi=60)

It is important to focus on the cells off the diagonal, as these are the incorrectly classified images. The confusion between quantum, radiance and freedom is understandable given the similar colors of the Royal Caribbean ships.
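For readers who want to see how such a matrix is built, here is a minimal sketch in plain Python. The actual and predicted labels are hypothetical, and only three of the seven classes are used to keep the output small.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

labels = ["freedom", "quantum", "radiance"]

# Hypothetical validation results, invented for illustration only.
actual = ["freedom", "freedom", "quantum", "quantum", "radiance", "radiance"]
predicted = ["freedom", "quantum", "quantum", "radiance", "radiance", "radiance"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>9}: {row}")

# The diagonal holds correct predictions; everything else is a misclassification.
correct = sum(matrix[i][i] for i in range(len(labels)))
misclassified = len(actual) - correct
print(correct, misclassified)
```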