Projects

  • Joint Fine-Tuning in Deep Neural Networks for Facial Expression Recognition and Face Reconstruction (Duration: 4 Months)

Abstract: Facial expression recognition (FER) is a long-standing problem in computer vision, with applications in advertising, augmented reality, human-computer interaction, and human response analysis, to name a few. The problem resembles action recognition; however, unlike actions, facial expressions are subtle and fine-grained, so a different approach is needed. Here we tried several transfer learning approaches, using different loss functions and fine-tuning techniques, to obtain performance that generalizes across expression, age, and gender recognition. Another objective is to use the learned deep embedding of the face for image reconstruction/inpainting.

  • Spatial distance dependent Chinese restaurant processes for image segmentation (Duration: 4 Months)

Abstract: The distance dependent Chinese restaurant process (ddCRP) was recently introduced to accommodate random partitions of non-exchangeable data. The ddCRP clusters data in a biased way: each data point is more likely to be clustered with other data that are near it in an external sense. This project examines the ddCRP in a spatial setting with the goal of natural image segmentation. Here we explore the biases of the spatial ddCRP model with a view to producing more human-like segmentations.
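The clustering bias described above can be illustrated with a minimal sketch of the ddCRP prior. It is not the project's implementation. It assumes the standard formulation in which each point links to another point with probability proportional to a decay function of their distance, or to itself with probability proportional to a concentration parameter, and clusters are the connected components of the resulting link graph. The names `sample_links`, `clusters_from_links`, `alpha`, and `decay` are illustrative choices, as is the window decay used in the usage lines.

```python
import math
import random


def sample_links(coords, alpha, decay, rng):
    """Draw one link per point from a ddCRP-style prior.

    Point i links to j != i with weight decay(distance(i, j)),
    and to itself with weight alpha (assumed > 0).
    """
    n = len(coords)
    links = []
    for i in range(n):
        weights = [
            alpha if i == j else decay(math.dist(coords[i], coords[j]))
            for j in range(n)
        ]
        links.append(rng.choices(range(n), weights=weights, k=1)[0])
    return links


def clusters_from_links(links):
    """Label the connected components of the link graph with union-find."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    return [find(i) for i in range(len(links))]


# Usage: pixels of a 3x3 grid; a window decay only allows links
# between pixels at most one step apart (an assumption for illustration).
coords = [(r, c) for r in range(3) for c in range(3)]
window = lambda d: 1.0 if d <= 1.0 else 0.0
links = sample_links(coords, alpha=1.0, decay=window, rng=random.Random(0))
labels = clusters_from_links(links)
```

With a window decay, every link joins spatially adjacent pixels, so each sampled cluster is a connected image region, which is the property that makes this prior attractive for segmentation.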

  • Content-Based Visual Question Answering (Duration: 4 Months)

Abstract: Given an image and a natural language question about the image, the task is to provide an accurate natural language answer. Mirroring real-world scenarios, such as helping the visually impaired, both the questions and answers are content based. As a result, a system that succeeds at VQA typically needs a more detailed understanding of the image and more complex reasoning than a system producing generic image captions. Here we concentrate only on content-based VQA; we do not consider logical or reasoning questions.

  • Pedestrian Detection using Aggregate Channel Features (ACF) (Duration: 12 Months)

Abstract: Pedestrian detection is a key problem in computer vision, with several applications that have the potential to positively impact the quality of life. In recent years, the number of approaches to detecting pedestrians in monocular images and video has grown steadily. Aggregate Channel Features (ACF) combined with AdaBoost is one of the most popular techniques for detecting pedestrians with high accuracy. The reference can be found here.

  • Gesture-Based Robot Motion (Duration: 12 Months)

Abstract: Vision-based human-computer interaction (HCI) has always been a challenging task for researchers. In this project, I am developing a mechanism to control a robot using hand gestures. The objective is to segment the hand gesture from a complex image and classify the segmented gesture into one of five predefined classes (move left, move right, move forward, move backward, and stop). The classification result is then sent to the robot over a wireless link, and the robot moves according to the received signal.
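The last two steps of this pipeline (choosing one of the five classes and passing the result to the robot) could be sketched as follows. This is only an illustration under stated assumptions: the `Command` enum, the argmax rule in `classify`, and the one-byte message format in `encode`/`decode` are hypothetical and are not taken from the project, which does not specify its classifier or wireless protocol.

```python
from enum import Enum


class Command(Enum):
    """The five predefined gesture classes (numbering is an assumption)."""
    LEFT = 0
    RIGHT = 1
    FORWARD = 2
    BACKWARD = 3
    STOP = 4


def classify(scores):
    """Map per-class classifier scores to the highest-scoring command."""
    if len(scores) != len(Command):
        raise ValueError("expected one score per gesture class")
    best = max(range(len(scores)), key=lambda k: scores[k])
    return Command(best)


def encode(command):
    """Pack a command into a single byte for transmission (hypothetical format)."""
    return bytes([command.value])


def decode(payload):
    """Robot side: recover the command from the received byte."""
    return Command(payload[0])


# Usage: scores as they might come from a gesture classifier.
command = classify([0.05, 0.10, 0.70, 0.05, 0.10])
received = decode(encode(command))
```

Keeping the message to a single class index means the robot only needs a small lookup from command to motor action, and an unknown byte raises an error instead of triggering an unintended move.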