Machines That Understand Video – 1st Computer Vision Virtual Showcase (CVVS-25)

Course Code: 23AID301 Computer Vision (2-0-2-3), S5 B.Tech. AS & AI

Faculty Information: Dr. Don S

This showcase presents the final course projects developed in the Computer Vision module, demonstrating the application of AI to understand, describe, and interpret visual information. Students explored core techniques such as image enhancement, feature extraction, segmentation, and audio-visual processing. Building on these fundamentals, the projects progressed into advanced tasks, including scene detection for understanding context, scene captioning and caption generation for producing natural language descriptions of visual content, and event proposal for identifying and localizing key actions within videos. Together, these works highlight how modern AI systems transform raw visual data into meaningful stories and insights.

Using AI Tools for Creating Video Presentations: Why and How

Student groups use AI tools to create video presentations because these tools simplify content creation, automatically generate visuals aligned with computer vision concepts, and save time while helping keep the content accurate and engaging. Typically, a group supplies key topics or a script, and the tool turns that input into structured visuals, animations, and narrated sequences that form a cohesive presentation.

| Week | Activity | Expected Output |
| --- | --- | --- |
| 1 | Project Orientation + Topic Finalization | Final project topic selected per group. Supervisor assigns reading material. |
| 2 | Literature Survey & Tool Familiarization | Submit 2-page literature survey. Explore tools/libraries (OpenCV, PyTorch, etc.). |
| 3 | Dataset Selection & Preprocessing | Download dataset, annotate/clean data, document the structure. |
| 4 | Model/Method Exploration | Implement baseline model/approach from literature (even if small-scale). |
| 5 | Basic Functionality Implementation | Each group shows progress: feature extraction, scene labeler, etc. |
| 6 | Midway Review 1 (Mini Presentation), 3 marks | Present architecture, initial results, difficulties. |
| 7 | Component Tuning & Evaluation | Tune hyperparameters, test on validation set. |
| 8 | Advanced Model Integration / Optimization | Integrate improved methods (e.g., attention in captioning, ResNet for scenes). |
| 9 | Result Analysis & Visualization | Show result samples, evaluation metrics, errors. |
| 10 | Midway Review 2 (Demo Focus) | Working demo of partial pipeline (e.g., event proposal from a 30 s video). |
| 11 | Refinement & Documentation | Refactor code, create diagrams, update report. |
| 12 | Final Demo Preparation, 5 marks | Finalize slides, GitHub repo, test demo end-to-end. |
| 13 | Project Presentation & Viva, 2 marks | 10-min group presentation + Viva + Report submission. |
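The schedule asks for evaluation metrics in Week 9 and an event-proposal demo on a short video in Week 10. As an illustration only, the sketch below shows one common way to score temporal event proposals: temporal intersection-over-union (IoU) between predicted and annotated segments, followed by recall at an IoU threshold. The function names, the 0.5 threshold, and the sample segments are assumptions made for this example; they are not taken from any of the student projects.

```python
def temporal_iou(pred, truth):
    """Return the IoU of two (start, end) intervals given in seconds."""
    p_start, p_end = pred
    t_start, t_end = truth
    # Overlap is zero when the intervals do not intersect.
    intersection = max(0.0, min(p_end, t_end) - max(p_start, t_start))
    union = (p_end - p_start) + (t_end - t_start) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_iou(proposals, ground_truth, threshold=0.5):
    """Fraction of annotated events matched by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for gt in ground_truth
        if any(temporal_iou(p, gt) >= threshold for p in proposals)
    )
    return hits / len(ground_truth)


# Hypothetical segments (in seconds) for a 30-second clip.
proposals = [(2.0, 8.0), (12.0, 20.0), (25.0, 29.0)]
ground_truth = [(3.0, 9.0), (14.0, 18.0), (21.0, 24.0)]

recall = recall_at_iou(proposals, ground_truth, threshold=0.5)
print(f"Recall@0.5 = {recall:.2f}")
```

With these made-up segments, two of the three annotated events are covered by a proposal at IoU 0.5 or higher. A stricter threshold lowers recall, so results are usually reported at several thresholds.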

Presentation Topics

Paper 1: Driver Assistance Scene Context System: Real-Time Multi-Label Scene Understanding for Vehicles

Paper 2: Image Captioning using Xception CNN and Recurrent Neural Networks (RNN)

Paper 3: Automatic Image Caption Generator using Deep Learning

Paper 4: AVI-Cap: Attributable, Verifiable, and Interactive Caption Generation

Paper 5: Coral Bleaching Classification using Deep Learning and Computer Vision

Paper 6: Transformer Based Spatiotemporal Modeling for Shoplifting Activity Recognition

Paper 7: Snap & Cook: Image-to-Recipe Generation

Paper 8: Enhancement of Lung CT Images for Improved Diagnostic Clarity

Paper 9: VisionAid – Image Captioning System for Visually Impaired

Paper 10: Image Captioning Using Deep Learning

Paper 11: Vision-Based Drone Navigation Using Computer Vision

Paper 12: Real-Time Interactive Facial Exercise Coach

Paper 13: Self-Driving Car: Obstacle Detection and Navigation Using Computer Vision

Paper 14: Polygon Zone based Object Detection

Paper 15: Image Caption Generation using an Attentive CNN-LSTM Architecture on the Flickr8k Dataset

Paper 16: Explainable Image Captioning System (EICS)
