DISCUSSION

What We Learned

Through this project, we encountered several problems we did not expect. We underestimated the computational workload, since we did not have access to a GPU cluster or a compute server. The time and space requirements for training the neural network exceeded what our experimental machine could comfortably handle, and tuning the network's parameters was very time-consuming.

Therefore, we tried several approaches to solve these problems:

    1. We resized the images to a smaller scale so that more input data would fit in memory. However, after we reduced the images to 227 by 227, the RPN could no longer find any bounding boxes to pass to the subsequent training stage. Since the resizing caused too much information loss, we did not continue with this approach.
    2. Since we could not fit all training samples into a single training run due to memory limits, we split the training into multiple small runs. We trained the network on the first 300 images, saved it, used the saved network to continue training on the next 300 images, saved it again, and repeated this process until all the training images were used. We came up with this idea ourselves, but it turned out to be similar in spirit to transfer learning. The open question is whether the network retains the earlier knowledge after several rounds of new training. We then applied transfer learning with a pretrained AlexNet and compared the results; again, we do not know how much information is retained through the new training. Although AlexNet showed better accuracy than our model, we suspect this happened by chance because the training set is small. We need more experiments to determine which model performs better.
    3. Given how much vehicles vary in appearance, we would need far more training data to avoid overfitting. We used early stopping and dropout to reduce overfitting.
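The chunked training procedure in point 2 can be sketched as follows. This is a minimal illustration using NumPy and a toy softmax (linear) classifier rather than our actual network; the chunk size of 300 mirrors our setup, but the synthetic data, the model, and the checkpoint filename are stand-ins.

```python
import numpy as np

def train_chunk(W, X, y, lr=0.1, epochs=20):
    """One round of softmax-regression gradient descent on a single chunk."""
    n_classes = W.shape[1]
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        onehot = np.eye(n_classes)[y]
        grad = X.T @ (probs - onehot) / len(X)        # cross-entropy gradient
        W -= lr * grad
    return W

rng = np.random.default_rng(0)
n_features, n_classes, chunk_size = 20, 3, 300

# Toy dataset standing in for the real images: 900 samples, 3 vehicle classes.
X_all = rng.normal(size=(900, n_features))
true_W = rng.normal(size=(n_features, n_classes))
y_all = (X_all @ true_W).argmax(axis=1)

# Train in chunks of 300, saving the weights between rounds just as we
# saved the network to disk between batches of 300 images.
W = np.zeros((n_features, n_classes))
for start in range(0, len(X_all), chunk_size):
    X_chunk = X_all[start:start + chunk_size]
    y_chunk = y_all[start:start + chunk_size]
    W = train_chunk(W, X_chunk, y_chunk)
    np.save("checkpoint.npy", W)        # persist the model...
    W = np.load("checkpoint.npy")       # ...and resume from the saved copy

accuracy = ((X_all @ W).argmax(axis=1) == y_all).mean()
print(f"accuracy after chunked training: {accuracy:.2f}")
```

Because each chunk is drawn from the same distribution, the weights keep improving across rounds here; whether a deep network forgets earlier chunks is exactly the question left open above.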
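The early stopping mentioned in point 3 can be sketched as a simple patience counter on the validation loss. This is an illustrative version with an assumed patience of 3 epochs, not the exact criterion we used:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index at which training should stop.

    Stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs; returns the last epoch
    if that never happens.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0   # new best: reset the patience counter
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1

# Validation loss improves, then plateaus and worsens -> stop early
# instead of continuing to overfit the small training set.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.61]
print(early_stopping(losses))  # 6: three stale epochs after the best (index 3)
```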

Future Work

In the future, there are several areas in which we could improve the experiment:

  • We could find a good conventional computer vision technique to segment the image while keeping the important information, so that a lower-resolution input with less information could be used to speed up training.
  • Try a less complicated CNN architecture, for two reasons:
      1. It reduces the probability of overfitting on a small training set.
      2. Since we only have 3 categories to classify, a simpler neural network might work better. AlexNet was built to classify 1000 different object classes, which may be excessive for our purpose.
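As a rough illustration of the first bullet, block-average pooling is one simple way to lower input resolution while preserving coarse structure. This NumPy sketch assumes a single-channel image and a stand-in downscale factor; a real pipeline might instead segment the image first or use a proper resampling filter:

```python
import numpy as np

def downsample(img, factor):
    """Block-average pooling: shrink an image by `factor` along each axis,
    replacing each block with its mean intensity (a crude low-pass filter
    that keeps coarse structure better than naive subsampling)."""
    h = img.shape[0] // factor * factor   # crop to a multiple of `factor`
    w = img.shape[1] // factor * factor
    img = img[:h, :w]
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# A 908x908 stand-in "image" shrunk 4x -> 227x227, the input size we tried.
img = np.random.default_rng(1).random((908, 908))
small = downsample(img, 4)
print(small.shape)  # (227, 227)
```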