Course Code: 23AID301 Computer Vision (2-0-2-3), S5 B.Tech. AS & AI
Faculty Information: Dr. Don S
This showcase presents the final course projects developed in the Computer Vision module, demonstrating the application of AI to understand, describe, and interpret visual information. Students explored core techniques such as image enhancement, feature extraction, segmentation, and audio-visual processing. Building on these fundamentals, the projects progressed into advanced tasks, including scene detection for understanding context, scene captioning and caption generation for producing natural language descriptions of visual content, and event proposal for identifying and localizing key actions within videos. Together, these works highlight how modern AI systems transform raw visual data into meaningful stories and insights.
Using AI Tools for Creating Video Presentations: Why and How
Student groups use AI tools to create video presentations because these tools simplify content creation, automatically generate visuals aligned with computer vision concepts, and save time while keeping the presentation accurate and engaging. Groups typically supply key topics or scripts to an AI platform, which then transforms them into structured visuals, animations, and narrated sequences for a cohesive presentation.
Week | Activity | Expected Output
1 | Project Orientation + Topic Finalization | Final project topic selected per group. Supervisor assigns reading material.
2 | Literature Survey & Tool Familiarization | Submit 2-page literature survey. Explore tools/libraries (OpenCV, PyTorch, etc.)
3 | Dataset Selection & Preprocessing | Download dataset, annotate/clean data, document the structure
4 | Model/Method Exploration | Implement baseline model/approach from literature (even if small-scale)
5 | Basic Functionality Implementation | Each group shows progress: feature extraction, scene labeler, etc.
6 | Midway Review 1 (Mini Presentation), 3 marks | Present architecture, initial results, difficulties
7 | Component Tuning & Evaluation | Try tuning hyperparameters, test on validation set
8 | Advanced Model Integration / Optimization | Integrate improved methods (e.g., attention in captioning, ResNet for scenes)
9 | Result Analysis & Visualization | Show result samples, evaluation metrics, errors
10 | Midway Review 2 (Demo Focus) | Working demo of partial pipeline (e.g., event proposal from a 30s video)
11 | Refinement & Documentation | Refactor code, create diagrams, update report
12 | Final Demo Preparation, 5 marks | Finalize slides, GitHub repo, test demo end-to-end
13 | Project Presentation & Viva, 2 marks | 10-min group presentation + Viva + Report submission
Presentation Topics
Paper 1: Driver Assistance Scene Context System: Real-Time Multi-Label Scene Understanding for Vehicles
Paper 2: Image Captioning using Xception CNN and Recurrent Neural Networks (RNN)
Paper 3: Automatic Image Caption Generator using Deep Learning
Paper 4: AVI-Cap: Attributable, Verifiable, and Interactive Caption Generation
Paper 5: Coral Bleaching Classification using Deep Learning and Computer Vision
Paper 6: Transformer Based Spatiotemporal Modeling for Shoplifting Activity Recognition
Paper 7: Snap & Cook: Image-to-Recipe Generation
Paper 8: Enhancement of Lung CT Images for Improved Diagnostic Clarity
Paper 9: VisionAid – Image Captioning System for Visually Impaired
Paper 10: Image Captioning Using Deep Learning
Paper 11: Vision-Based Drone Navigation Using Computer Vision
Paper 12: Real-Time Interactive Facial Exercise Coach
Paper 13: Self-Driving Car: Obstacle Detection and Navigation Using Computer Vision
Paper 14: Polygon Zone based Object Detection
Paper 15: Image Caption Generation using an Attentive CNN-LSTM Architecture on the Flickr8k Dataset
Paper 16: Explainable Image Captioning System (EICS)