RESULTS

Run The Models

Our goal in this section was to run the authors' Pascal VOC model on our system, then train our own Pascal model and evaluate it on the same data set to see how the two compare.

Here are the results from their model: ctdet_pascal_dla_384

This shows the classes the network is able to detect, and how well it detected each of them on the test images. Overall the model did quite well, with potted plants being the hardest class to identify.
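Per-class detection scores like these are built on intersection-over-union (IoU) matching between predicted and ground-truth boxes. The following is a minimal sketch of the IoU computation, not the authors' evaluation code:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Pascal VOC counts a detection as correct when IoU >= 0.5.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

A predicted box that only clips the corner of the true object, as in the example above, falls well short of the 0.5 threshold and would count as a miss.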

After this was complete we attempted to train our own version of that network using one of the scripts in the experiments directory. During training, however, our computer's GPU quickly ran out of memory. To mitigate this we reduced the batch size several times, but it appears that the GTX 960 is simply not a powerful enough card to handle this sort of intensive training. For reference, these are each of the Pascal models the authors made, including GPU count, timing, etc.
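The out-of-memory failures make sense because activation memory during training grows roughly linearly with batch size. A rough back-of-envelope sketch (the per-image constant here is an illustrative placeholder, not a measured value from CenterNet):

```python
def approx_activation_memory_gb(batch_size, per_image_mb=450):
    """Very rough estimate: activation memory scales linearly with batch size.
    per_image_mb is an illustrative assumption, not a measured CenterNet value."""
    return batch_size * per_image_mb / 1024

# A GTX 960 ships with 2-4 GB of VRAM, so even modest batches can exceed it.
for bs in (32, 8, 2):
    print(bs, round(approx_activation_memory_gb(bs), 2), "GB")
```

Under this toy estimate, only very small batch sizes fit on a 2 GB card, which matches our experience of repeatedly lowering the batch size without success.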

Using CenterNet

The main way to test CenterNet's capabilities was to use the following command:

python demo.py ctdet --demo /path/to/image/or/folder/or/video --load_model ../models/ctdet_coco_dla_2x.pth

This performs object detection on the chosen image, folder, or video.

Here is one of their example images being tested on.

Notice how it locates elements within the photo and attempts to classify them, reporting the network's confidence that each label is correct. In this case it did well on the dogs and the people, but it was not able to correctly classify the service dog's harness. It appears to have seen text on a relatively flat surface and labeled it a book, though with only 30% confidence.
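The percentage shown next to each label is the network's confidence score, and detectors like this typically only draw boxes above some threshold. A sketch of that filtering step, with made-up detections (this is not CenterNet's actual code):

```python
def filter_detections(detections, threshold=0.3):
    """Keep only detections whose confidence score meets the threshold."""
    return [d for d in detections if d["score"] >= threshold]

# Hypothetical detections mirroring the image above.
detections = [
    {"label": "dog", "score": 0.92},
    {"label": "person", "score": 0.88},
    {"label": "book", "score": 0.30},   # the low-confidence 'book' just clears the bar
    {"label": "chair", "score": 0.12},  # discarded before drawing
]
print(filter_detections(detections))
```

Raising the threshold would have suppressed the dubious "book" label at the cost of also hiding genuinely uncertain but correct detections.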

Here are a few tests from our own images.

  1. Max on top of the Reichstag in Berlin. Notice how it also detects one of the statues in the background as a person.
  2. Our friend Spencer's corgi. Here the network is able to successfully discern the cup and the dog even though she is behind the cup.
  3. We wanted to try a noisier image here. This is an image of Wall Street from Google Images. The network successfully identified most of the people in the image, though it mistook a man's shoulder for a handbag and a lamp for a traffic light. Given this more distant view, it is understandable that it would have some difficulty. This image also showed some of the items the network could not identify, such as flags and signs.
  4. Curious about CenterNet's ability to detect traffic lights (it misidentified one in the previous photo), we found a photo of a Shanghai street on Google Images. The network did a great job detecting cars despite the heavy overlap between many of them, though the traffic light in the northwest quadrant of the photo went undetected. We suspect this happened because the network was trained on American-style traffic lights rather than Chinese ones, which potentially offers some insight into how the network learned what a traffic light looks like.
  5. To test this hypothesis we ran an image from an American town and found that the network had no problem identifying the lights here. Again the network proved very capable of detecting individual objects in a noisy environment.

Now let's try out the body pose estimator. This is done with a slightly different command:

python demo.py multi_pose --demo /path/to/image/or/folder/or/video/or/webcam --load_model ../models/multi_pose_dla_3x.pth

Below is the same image of Max, but this time CenterNet is also attempting to figure out how his body is oriented. After that we wanted to try a more complex image, with multiple subjects in poses that are harder to model, and we could think of no better test than a photo of a fencing bout.
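The pose overlay is produced by predicting a set of keypoints per person (nose, shoulders, elbows, and so on, in the 17-point COCO convention) and connecting fixed pairs of them into a skeleton. A minimal sketch of that drawing logic, with made-up coordinates (not CenterNet's rendering code):

```python
# COCO keypoint indices: 0 nose, 5/6 shoulders, 7/8 elbows, 9/10 wrists,
# 11/12 hips, 13/14 knees, 15/16 ankles. A subset of the limb edges:
SKELETON = [(5, 7), (7, 9), (6, 8), (8, 10), (5, 6),
            (11, 13), (13, 15), (12, 14), (14, 16), (11, 12)]

def limb_segments(keypoints):
    """Turn one person's keypoints [(x, y), ...] into line segments to draw."""
    return [(keypoints[a], keypoints[b]) for a, b in SKELETON
            if a < len(keypoints) and b < len(keypoints)]

person = [(i * 10, i * 5) for i in range(17)]  # fake keypoints for one person
print(len(limb_segments(person)))  # → 10, one segment per skeleton edge
```

For a multi-person image like the fencing bout, the same step simply runs once per detected person, which is why overlapping fencers each get their own skeleton.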

Conclusions

CenterNet is quite good at object detection, though in noisier images the network can misidentify items. Its pose-estimation system, as demonstrated above, was very impressive.

Given the demanding system requirements for training our own models, CenterNet is not something the average person will be able to train without renting a server from Amazon Web Services, Microsoft Azure, or the like. We were able to test the Pascal model, though, and it performed quite admirably.

This project taught us a lot about setting up and utilizing a machine learning environment. With so many package dependencies and finicky incompatibilities, we learned the hard way that every step of environment setup must be taken with care. Though difficult, the process was rewarding and left us much more comfortable with the Anaconda toolkit, GPU integration, and testing and managing very large data sets.