Below is a list of ideas that I think are interesting. Some of them may be suitable for academic course projects, while others may be insufficient on their own; you can build on such ideas. If you have any queries regarding these ideas, please don't hesitate to contact me. If you do try any of them, kindly update me, and I'll post the findings here (with due credit, of course). If there is already existing work on any of the ideas listed below, kindly let me know as well, and I'll add a link to the relevant content.
Although deep networks have become extremely powerful at object recognition, they perform poorly in the presence of adversarial attacks. In [1], the authors propose a method to train deep networks to be robust to adversarial attacks by enforcing BPFC regularization. In [2], the authors show that deep networks trained on the ImageNet-1k database are biased towards local texture, and hence their accuracy drops when they are tested on edge maps (where no local texture is present).
Since adversarial attacks/examples mainly modify local properties of images, it would be interesting to see whether training a deep network to be robust to adversarial attacks also lowers its bias towards local texture. One way to verify this would be to test the performance of adversarially robust networks on edge maps of images from the ImageNet-1k database or any similar dataset, as sketched below.
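A minimal sketch of this evaluation, assuming a hypothetical checkpoint `robust_resnet50.pth` of an adversarially trained model, a placeholder `imagenet/val` directory in ImageFolder layout, and OpenCV's Canny detector for the edge maps:

```python
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

def to_edge_map(pil_img):
    """Convert a PIL image to a 3-channel Canny edge map."""
    gray = cv2.cvtColor(np.array(pil_img), cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, 100, 200)      # common default thresholds
    return np.stack([edges] * 3, axis=-1)  # replicate to 3 channels

edge_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.Lambda(lambda img: torch.from_numpy(to_edge_map(img))
                              .permute(2, 0, 1).float() / 255.0),
])

# Hypothetical checkpoint of an adversarially trained (e.g. BPFC-regularized) model.
model = models.resnet50()
model.load_state_dict(torch.load("robust_resnet50.pth", map_location="cpu"))
model.eval()

loader = DataLoader(ImageFolder("imagenet/val", transform=edge_transform),
                    batch_size=64)

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"Top-1 accuracy on edge maps: {correct / total:.4f}")
```

Comparing this number against a standard (non-robust) ResNet-50 evaluated the same way would indicate whether robust training reduces the texture bias.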
[1] Sravanti Addepalli et al. "Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes", CVPR 2020.
[2] Robert Geirhos et al. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness", ICLR 2019.
In [1], the authors show that deep networks trained on the ImageNet-1k database are biased towards local texture rather than global shape. One of the experiments conducted in the paper obtains ResNet-50 [2] predictions on the edge map of an image, and the paper shows that the accuracy of ResNet-50 drops on edge maps of images.
It would be interesting to see whether ResNet features (the features tapped just before the global pooling operation) contain information about global shape. One way to check this is to freeze the weights of all earlier layers of ResNet-50, train only the final softmax layer on edge maps of images from the ImageNet-1k database, and then test this new model on edge maps of held-out images. A sketch of this linear probe follows.
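A minimal sketch of the linear probe, assuming an edge-map transform like `edge_transform` from the earlier sketch and a placeholder `imagenet/train` directory:

```python
import torch
import torch.nn as nn
import torchvision.models as models
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

# Start from ImageNet-pretrained weights so the frozen features are the ones under study.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# Freeze everything except a freshly initialized final classification layer.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 1000)  # new trainable softmax layer

optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# edge_transform: the Canny edge-map transform from the earlier sketch.
train_loader = DataLoader(ImageFolder("imagenet/train", transform=edge_transform),
                          batch_size=256, shuffle=True)

# eval() keeps BatchNorm running statistics frozen along with the features;
# gradients still flow to the new fc layer.
model.eval()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

If this probe reaches reasonable accuracy on edge maps, the frozen features do carry global shape information, and the original failure lies mainly in the classifier layer.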
[1] Robert Geirhos et al. "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness", ICLR 2019.
[2] Kaiming He et al. "Deep Residual Learning for Image Recognition", CVPR 2016.
A video generative model proposed in [1] decomposes video into content and motion. The content latent vector remains the same across all frames, while the motion latent vector is generated by a recurrent network. A decoder then generates each frame from the content and motion latent vectors.
It would be interesting to check how well this decomposition works. One way is to change the content latent vector mid-generation: ideally, the motion should remain the same while the person changes. A sketch of this probe is given below.
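A sketch of the probe, with tiny stand-in modules in place of a trained MoCoGAN model; the module names, latent sizes, and image size are all assumptions, and in practice `motion_rnn` and `frame_decoder` would be the trained motion RNN and frame decoder:

```python
import torch
import torch.nn as nn

# Stand-ins for a trained MoCoGAN-style model (shapes are assumptions).
motion_rnn = nn.GRUCell(input_size=10, hidden_size=16)  # per-frame motion sampler
frame_decoder = nn.Sequential(                          # (content ++ motion) -> image
    nn.Linear(50 + 16, 64 * 64 * 3),
    nn.Tanh(),
)

num_frames, swap_at = 32, 16
z_content_a = torch.randn(1, 50)  # identity A
z_content_b = torch.randn(1, 50)  # identity B
h = torch.zeros(1, 16)
frames = []

with torch.no_grad():
    for t in range(num_frames):
        h = motion_rnn(torch.randn(1, 10), h)  # motion latent evolves recurrently
        # Swap the content code mid-generation; the motion trajectory continues.
        z_content = z_content_a if t < swap_at else z_content_b
        frame = frame_decoder(torch.cat([z_content, h], dim=1)).view(3, 64, 64)
        frames.append(frame)

# If the decomposition is clean, frames[swap_at:] should show a different
# person performing the same continuing motion.
```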
[1] Sergey Tulyakov et al. "MoCoGAN: Decomposing Motion and Content for Video Generation", CVPR 2018.
Quantitatively evaluating GAN models is known to be challenging. Here, I propose a simple idea to evaluate them, analogous to how we would evaluate linear regression. Given a trained GAN model and an image from the test set, backpropagate gradients (keeping the generator weights fixed) to find the latent input that generates an image close to the test image. The error between the generated image and the test image, averaged over all images in the test set, may serve as a quantitative measure for evaluating GANs. Various error/similarity measures such as MSE, SSIM, VGG-feature MSE, or VGG cosine similarity can be experimented with.
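A minimal sketch of this inversion-based measure with MSE as the error, assuming a trained generator `G` that maps a latent of size `latent_dim` to an image; the optimizer settings are arbitrary:

```python
import torch

def inversion_error(G, test_image, latent_dim=100, steps=500, lr=0.05):
    """Optimize the latent input of a frozen generator G to reconstruct
    test_image; return the final MSE (lower = better coverage)."""
    G.eval()
    for p in G.parameters():
        p.requires_grad_(False)  # keep generator weights fixed

    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(G(z), test_image)
        loss.backward()          # gradients flow to z, not to G
        opt.step()
    return loss.item()

# Average inversion_error over the test set; MSE can be swapped for SSIM,
# VGG-feature MSE, or VGG cosine similarity.
```

One caveat worth noting: the optimization is non-convex, so restarting from several random latents and keeping the best reconstruction gives a more reliable estimate.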
Build a face detection and recognition system that can be applied to any new directory of images. Build a GUI such that whenever the model encounters an unknown face, it asks the user to enter the person's name; the face recognition model should dynamically learn to classify new faces. The model builds and stores an index of the people appearing in the photos, so that when a query for a particular person is issued, it can retrieve all the images containing that person.
This idea has already been implemented in Google Photos. Nonetheless, we may not be willing to upload all our private photos to Google Photos; in that case, having an offline model trained specifically on the people in our friend circle may be useful, even though it may not perform as well as Google Photos.
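As a starting point for such an offline tool, here is a minimal command-line sketch of the indexing loop using the face_recognition library; the `input()` prompt stands in for the GUI dialog, and the file names and distance tolerance are assumptions:

```python
import os
import pickle
import face_recognition

INDEX_FILE = "face_index.pkl"  # persisted mapping: name -> list of 128-d encodings

def load_index():
    if os.path.exists(INDEX_FILE):
        with open(INDEX_FILE, "rb") as f:
            return pickle.load(f)
    return {}

def match(index, encoding, tolerance=0.6):
    """Return the indexed name closest to `encoding`, or None if no match."""
    best_name, best_dist = None, tolerance
    for name, encs in index.items():
        dist = face_recognition.face_distance(encs, encoding).min()
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

def index_directory(photo_dir):
    index = load_index()
    images = {}  # name -> set of image paths containing that person
    for fname in os.listdir(photo_dir):
        if not fname.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        path = os.path.join(photo_dir, fname)
        image = face_recognition.load_image_file(path)
        for encoding in face_recognition.face_encodings(image):
            name = match(index, encoding)
            if name is None:
                # Unknown face: in the GUI this would be a dialog box.
                name = input(f"Unknown face in {fname}. Who is this? ")
            index.setdefault(name, []).append(encoding)  # learn dynamically
            images.setdefault(name, set()).add(path)
    with open(INDEX_FILE, "wb") as f:
        pickle.dump(index, f)
    return images

# Query example: print(index_directory("photos/").get("Alice", set()))
```

Storing every encoding per person (rather than a single average) keeps the nearest-neighbour matching robust to pose and lighting changes, at the cost of a larger index.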