We trained the model on 4000 images and used another 4000 images from the same dataset to test its accuracy. Here are some sample output images from the model. The yellow boxes are drawn from the bounding boxes predicted by the model.
The left column shows results from our own Faster R-CNN model, and the right column shows results from a model trained using transfer learning with a pretrained AlexNet.
The AlexNet-based model tends to produce better-fitting bounding boxes around objects and detects more of the objects present, but it also makes more false-positive predictions.
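Box quality and false positives are usually made concrete with intersection over union (IoU). The following is a minimal sketch, not our actual evaluation code. It assumes boxes given as (x1, y1, x2, y2) pixel corners and the PASCAL VOC convention of a 0.5 IoU threshold. Detections are labelled as true or false positives by greedy matching in score order. The helper names `iou` and `label_detections` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corners, x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_detections(dets: List[Tuple[float, Box]], gts: List[Box],
                     thresh: float = 0.5) -> List[bool]:
    """Mark each (score, box) detection as a true (True) or false (False) positive.

    Hypothetical helper: detections are processed from highest to lowest score,
    and each ground-truth box can be claimed only once, so duplicates are
    counted as false positives (assumed VOC-style matching).
    """
    matched = [False] * len(gts)
    labels = [False] * len(dets)
    for i in sorted(range(len(dets)), key=lambda k: dets[k][0], reverse=True):
        # find the ground-truth box this detection overlaps most
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            overlap = iou(dets[i][1], gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= thresh and not matched[best_j]:
            matched[best_j] = True
            labels[i] = True
    return labels


if __name__ == "__main__":
    ground_truth = [(0, 0, 10, 10)]
    detections = [(0.9, (1, 1, 10, 10)), (0.8, (0, 0, 10, 10)), (0.3, (20, 20, 30, 30))]
    print(label_detections(detections, ground_truth))  # [True, False, False]
```

In this example, the second detection matches the object exactly. It is still counted as a false positive because the higher-scoring first detection (IoU 0.81) already claimed that object. This shows one way extra detections turn into false positives.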
VGG-16 has more convolutional layers than AlexNet, and training it on a reasonable sample size takes more computation than our computer can handle, so we did not include results for it.
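The sketch below gives a rough sense of the cost gap by counting multiply-accumulate operations (MACs) in the convolutional layers only. It rests on three assumptions: a 224x224 RGB input, the single-stream AlexNet layout used by torchvision rather than the original two-GPU grouped version, and hand-written layer tables (`ALEXNET`, `VGG16`) that are illustrative and not taken from our training code. Faster R-CNN usually runs on larger images, so the absolute numbers would be higher, but the ratio is indicative.

```python
"""Rough per-image MAC count for the conv layers of AlexNet and VGG-16.

Assumptions (not from our project code): 224x224 RGB input, torchvision's
single-stream AlexNet layout, and only convolutional multiply-accumulates.
"""


def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


def count_conv_macs(layers, size: int = 224, channels: int = 3) -> int:
    """Sum MACs over ("conv", c_out, k, stride, pad) and ("pool", k, stride) layers."""
    total = 0
    for layer in layers:
        if layer[0] == "conv":
            _, c_out, k, stride, pad = layer
            size = conv_out(size, k, stride, pad)
            # every output value needs c_in * k * k multiply-accumulates
            total += size * size * c_out * channels * k * k
            channels = c_out
        else:  # pooling shrinks the feature map but adds no MACs here
            _, k, stride = layer
            size = conv_out(size, k, stride)
    return total


ALEXNET = [
    ("conv", 64, 11, 4, 2), ("pool", 3, 2),
    ("conv", 192, 5, 1, 2), ("pool", 3, 2),
    ("conv", 384, 3, 1, 1),
    ("conv", 256, 3, 1, 1),
    ("conv", 256, 3, 1, 1), ("pool", 3, 2),
]

# VGG-16: 13 conv layers with 3x3 kernels in five blocks, each block ending in 2x2 max pooling
VGG16 = []
for c_out, n_convs in [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]:
    VGG16 += [("conv", c_out, 3, 1, 1)] * n_convs
    VGG16.append(("pool", 2, 2))

if __name__ == "__main__":
    alex = count_conv_macs(ALEXNET)
    vgg = count_conv_macs(VGG16)
    print(f"AlexNet conv MACs: {alex / 1e9:.2f} G")
    print(f"VGG-16 conv MACs:  {vgg / 1e9:.2f} G")
    print(f"ratio: {vgg / alex:.1f}x")
```

Under these assumptions, the VGG-16 convolutional layers need roughly 15.3 billion MACs per image, compared with about 0.66 billion for AlexNet. That is a factor of about 23, before counting the backward pass during training.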
Sample detections. Left: our Faster R-CNN model. Right: Faster R-CNN using transfer learning with AlexNet.
Here are the precision-recall (PR) curves for the two models. On our computer, testing one model on the 4000 test images took about 2 days (roughly 43 seconds per image). The average precision (AP) is 0.12 for our model and 0.32 for the model trained with the pretrained AlexNet. For comparison, in the original Faster R-CNN paper, Sun's group reported an AP of 0.66 for their model, which was trained on far more data on a GPU cluster.
PR curve for our model
PR curve for the AlexNet-based model
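Average precision summarizes a PR curve as a single number. The sketch below assumes the PASCAL VOC all-point interpolated definition: precision is first made non-increasing, then integrated over recall. It takes detections already labelled as true or false positives, plus the number of ground-truth objects. The function name `average_precision` is hypothetical, and this is not necessarily the exact evaluation code behind the 0.12 and 0.32 figures.

```python
from typing import List, Tuple


def average_precision(scored: List[Tuple[float, bool]], num_gt: int):
    """Return (precisions, recalls, ap) for detections given as (score, is_true_positive).

    Assumes VOC-style all-point interpolation; num_gt is the number of
    ground-truth objects in the test set and must be positive.
    """
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    precisions: List[float] = []
    recalls: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Pad the curve, then make precision non-increasing from right to left
    mprec = [0.0] + precisions + [0.0]
    mrec = [0.0] + recalls + [1.0]
    for i in range(len(mprec) - 2, -1, -1):
        mprec[i] = max(mprec[i], mprec[i + 1])

    # Area under the interpolated curve: each recall step times its precision
    ap = sum((mrec[i] - mrec[i - 1]) * mprec[i] for i in range(1, len(mrec)))
    return precisions, recalls, ap


if __name__ == "__main__":
    detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, False)]
    _, _, ap = average_precision(detections, num_gt=2)
    print(f"AP = {ap:.3f}")  # AP = 0.833
```

In this example, four detections labelled TP, FP, TP, FP (in descending score order) against two ground-truth objects give an AP of 5/6, about 0.83. Each added false positive lowers the precision at later ranks, and that lowers the area.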