This Video Processing System documentation provides a comprehensive guide to developing and deploying a robust system for extracting, analysing, and storing video features. It covers implementation details, setup instructions, and architectural insights that are crucial for understanding and maintaining the system.
Problem Statement
This project aims to leverage deep learning and machine learning techniques for video processing using Python.
Objective
Develop a comprehensive video processing system that can efficiently extract and store features from video files in a database. The system should be deployed on Streamlit Share, making the extracted features accessible through an interactive web application.
Dataset Description
For video processing and testing, I used the UCF101 Action Recognition Dataset, which contains 13,320 videos from 101 action categories. For simplicity, a random sample of 1,000 videos is used.
The following datasets were considered for the indoor/outdoor classification task. For the PoC development, I used 60,000 labelled images for model training.
Places Dataset:
· Description: A scene-centric database with 10 million labelled images covering 400 scene categories, including indoor and outdoor scenes.
· Website: Places Dataset
SUN Database:
· Description: Large-scale database containing 131,067 labelled images across 908 scene categories, including indoor and outdoor scenes.
· Website: SUN Database
Scene Understanding (SUN) RGB-D Dataset:
· Description: Contains RGB-D images annotated with scene types; the dataset focuses primarily on indoor scenes.
· Website: SUN RGB-D Dataset
System Requirements
To ensure the video processing system runs smoothly, the following software packages and versions are required:
Python: Version 3.8 or later.
Pandas: Version 2.0.3 for data manipulation.
TensorFlow: Version 2.15.0, a powerful library for machine learning and deep learning.
OpenCV: Version 4.8.0.76 for video and image processing.
NumPy: Version 1.25.2, essential for numerical computing.
Matplotlib: Version 3.7.1 for plotting and visualization.
Flask: Version 2.2.5 to create a lightweight web application.
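These dependencies can be pinned in a requirements.txt file. A minimal sketch based on the versions listed above (Streamlit is added here for the web application, although its version is not pinned in the list):

pandas==2.0.3
tensorflow==2.15.0
opencv-python==4.8.0.76
numpy==1.25.2
matplotlib==3.7.1
flask==2.2.5
streamlit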
For optimal performance, particularly when dealing with deep learning models and large video files, the following hardware is recommended:
GPU: Google Cloud GPU (e.g., NVIDIA Tesla K80, T4, P100, V100) for accelerated computation.
CPU: A multi-core processor for general processing tasks.
RAM: At least 16 GB of RAM to handle large video files and datasets.
Storage: SSD with sufficient space (minimum 512 GB) to store video files and processed data.
Testing Environment:
Operating System: Windows, configured for testing with GPU support.
Tools:
Databricks: For scalable data processing and analysis.
Jupyter Notebook: For interactive development and testing of code.
This setup ensures that the video processing system is robust, scalable, and capable of handling the computational demands of video and deep learning tasks.
Video Processing
Flask API to retrieve video information
· In this task, we utilized the OpenCV module to extract essential video information such as resolution, frame rate, and duration using built-in properties like cv2.CAP_PROP_FRAME_COUNT, cv2.CAP_PROP_FPS, cv2.CAP_PROP_FRAME_WIDTH, and cv2.CAP_PROP_FRAME_HEIGHT (a sketch follows this list).
· For the detection and classification tasks, we set a threshold of 1 FPS. For instance, if a video is 4 seconds long with a total of 100 frames at 25 FPS, we extract and analyse only 4 frames (1 frame per second) for human detection and gender classification.
· Additionally, for indoor/outdoor classification, we base our analysis solely on the first frame of the video.
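A minimal sketch of the metadata extraction and 1 FPS sampling described above (function and variable names are illustrative, not the repository's actual API):

import cv2

def get_video_info(video_path):
    # Extract basic metadata from a video file using OpenCV.
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    cap.release()
    duration = frames / fps if fps else 0.0  # seconds
    return {"fps": fps, "frames": frames,
            "resolution": f"{width}x{height}", "duration": duration}

def sample_frames_1fps(video_path):
    # Yield roughly one frame per second for the detection models.
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25  # fall back if FPS is unreadable
    step = max(1, int(round(fps)))
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # the 1 FPS threshold described above
            yield frame
        idx += 1
    cap.release()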
For this task, I referenced a GitHub project that uses deep learning for gender identification and person detection from facial images. Specifically, I employed the models developed by Tal Hassner and Gil Levi, as implemented in the repository at https://github.com/smahesh29/Gender-and-Age-Detection. This approach uses OpenCV's deep learning (DNN) module to reliably detect humans and classify gender.
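A hedged sketch of the gender-classification step with OpenCV's DNN module, assuming the Caffe model files distributed with that repository (the file names and mean values follow the repository's conventions and may differ):

import cv2

# Model files from the referenced repository, placed in the weights folder
GENDER_PROTO = "weights/gender_deploy.prototxt"
GENDER_MODEL = "weights/gender_net.caffemodel"
GENDER_LABELS = ["Male", "Female"]
MODEL_MEAN = (78.4263377603, 87.7689143744, 114.895847746)

gender_net = cv2.dnn.readNet(GENDER_MODEL, GENDER_PROTO)

def classify_gender(face_img):
    # face_img is a cropped BGR face region taken from a sampled frame
    blob = cv2.dnn.blobFromImage(face_img, 1.0, (227, 227), MODEL_MEAN, swapRB=False)
    gender_net.setInput(blob)
    preds = gender_net.forward()
    return GENDER_LABELS[preds[0].argmax()]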
For the Indoor/Outdoor classification task, I utilized the MobileNetV2 pre-trained model available in Keras, leveraging a subset of the SUN dataset comprising 60,000 images labeled with indoor and outdoor categories. Initially, I applied transfer learning by adding a custom dense layer for binary classification. Additionally, I experimented with the Xception model; however, the results were less satisfactory compared to MobileNetV2, which also exhibited faster inference times.
Dataset Splitting and Training
The dataset was divided into training, testing, and validation sets using a 70-15-15 split ratio.
[Figure: training curves depicting the model's performance during training]
Model Training Details
Model Architecture: MobileNetV2 with additional dense layers for classification.
Activation Function: Sigmoid activation on the output layer for binary classification.
Optimizer: The Adam optimizer was employed to optimize the model parameters.
Evaluation Metric: Accuracy metric was used to evaluate the model's performance during training.
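A minimal sketch of the transfer-learning setup described above, using standard Keras APIs (the input shape and head size are illustrative assumptions):

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Frozen MobileNetV2 backbone with a small binary-classification head
base = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: train only the new layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),   # additional dense layer (size illustrative)
    layers.Dense(1, activation="sigmoid"),  # indoor vs. outdoor
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])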
Improving Accuracy
To enhance accuracy, the model can benefit from:
Increasing the number of epochs during training.
Utilizing the entire dataset for training.
Implementing data augmentation techniques to enrich the dataset and improve generalization.
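For example, augmentation can be added with standard Keras preprocessing layers (a sketch; the parameter values are illustrative):

from tensorflow.keras import layers, models

augmentation = models.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
])
# Prepend to the classifier so augmentation runs only during training, e.g.
# model = models.Sequential([augmentation, base, ...])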
Overall, the MobileNetV2 model proved effective for indoor/outdoor classification, providing both reliable performance and efficient inference times.
The output for the 1,000 processed video files has been saved to a CSV file containing information such as title, duration, FPS, resolution, frame count, presence of humans, indoor/outdoor classification, etc.
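A sketch of how the per-video results might be assembled and written out with pandas (the column names mirror the fields above; detect_human and classify_indoor_outdoor are hypothetical stand-ins for the detection models, and get_video_info is the helper sketched earlier):

import pandas as pd

records = []
for path in video_paths:  # the 1,000 sampled video files
    info = get_video_info(path)
    records.append({
        "title": path,
        "duration_sec": info["duration"],
        "fps": info["fps"],
        "resolution": info["resolution"],
        "frames": info["frames"],
        "human_present": detect_human(path),        # hypothetical helper
        "location": classify_indoor_outdoor(path),  # hypothetical helper
    })

pd.DataFrame(records).to_csv("video_features.csv", index=False)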
Data Storage
For this project, we will use a combination of AWS S3 and AWS Redshift:
AWS S3 Bucket: For storing video files.
AWS Redshift: For storing metadata and extracted features.
By using AWS S3 and Redshift, we leverage the strengths of both platforms, ensuring scalable, cost-effective, and high-performance storage and retrieval of video files and their associated metadata.
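For illustration, uploading a video to S3 with boto3 might look like this (the bucket and key names are placeholders, and AWS credentials are assumed to be configured through the standard mechanisms):

import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="videos/v_ApplyEyeMakeup_g01_c01.avi",  # local file (UCF101 example)
    Bucket="my-video-processing-bucket",             # placeholder bucket name
    Key="raw-videos/v_ApplyEyeMakeup_g01_c01.avi",
)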
Database Schema
Table Structure
Data Types and Constraints
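The original table-structure and constraints tables are not reproduced here. A plausible Redshift DDL sketch for the fields described in this document (column names and types are assumptions, not the project's exact schema):

import psycopg2  # Redshift speaks the PostgreSQL wire protocol

DDL = """
CREATE TABLE IF NOT EXISTS video_features (
    video_id      INTEGER IDENTITY(1,1),
    title         VARCHAR(256) NOT NULL,
    duration_sec  REAL,
    fps           REAL,
    resolution    VARCHAR(32),
    frame_count   INTEGER,
    human_present BOOLEAN,
    gender        VARCHAR(16),
    location      VARCHAR(16),   -- 'indoor' or 'outdoor'
    s3_path       VARCHAR(1024)  -- location of the source video in S3
);
"""

conn = psycopg2.connect(host="my-cluster.region.redshift.amazonaws.com",  # placeholder
                        port=5439, dbname="dev", user="awsuser", password="...")
with conn, conn.cursor() as cur:
    cur.execute(DDL)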
All video processing, model training, and testing code is available in the GitHub repository:
https://github.com/kishan9999/video-processing-using-deep-learning.git
1. Clone the repository:
git clone https://github.com/kishan9999/video-processing-using-deep-learning.git
2. Install the required dependencies:
pip install -r requirements.txt
3. Download all model weights and place them in the weights folder.
4. For video processing, use single_inference.py:
python single_inference.py --video_path <path_to_video>
Code Quality Standards
To ensure the maintainability, readability, and reliability of the video processing system, the following code quality standards are adhered to:
· PEP 8 Compliance: The code follows the PEP 8 style guide for Python, ensuring consistency in coding style.
· Version Control: The project is maintained in a GitHub repository, allowing version control and collaborative development.
· Error Handling: Comprehensive error handling is implemented to manage exceptions and provide informative error messages.
System Architecture
The system architecture of the video processing system is designed to be modular, scalable, and efficient. It consists of the following key components:
1. Video Processing Module:
Responsible for processing video files, extracting keyframes, metadata, detecting humans, identifying gender, and classifying indoor/outdoor scenes.
Uses libraries like OpenCV for video processing and TensorFlow for deep learning tasks.
2. Data Storage Module:
Stores the extracted features and metadata in an AWS Redshift database.
Video files are stored in an AWS S3 bucket.
3. Web Application:
Developed using Streamlit to provide an interactive interface for users.
Allows users to filter and search videos based on activity type, gender, and location.
Retrieves and displays video data from AWS Redshift (a sketch follows this list).
4. Deployment:
The web application is deployed on Streamlit Share, ensuring accessibility and ease of use.
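As an illustration of the Web Application component, a minimal Streamlit sketch that filters videos from the features table (connection details, table, and column names are placeholders consistent with the schema sketch above; an activity filter would follow the same pattern):

import pandas as pd
import psycopg2
import streamlit as st

st.title("Video Feature Explorer")

# Sidebar filters matching the documented search criteria
gender = st.sidebar.selectbox("Gender", ["Any", "Male", "Female"])
location = st.sidebar.selectbox("Location", ["Any", "indoor", "outdoor"])

query = ("SELECT title, duration_sec, fps, resolution, gender, location "
         "FROM video_features WHERE 1=1")
params = []
if gender != "Any":
    query += " AND gender = %s"
    params.append(gender)
if location != "Any":
    query += " AND location = %s"
    params.append(location)

conn = psycopg2.connect(host="my-cluster.region.redshift.amazonaws.com",  # placeholder
                        port=5439, dbname="dev", user="awsuser", password="...")
with conn, conn.cursor() as cur:
    cur.execute(query, params)
    rows = cur.fetchall()

df = pd.DataFrame(rows, columns=["title", "duration_sec", "fps",
                                 "resolution", "gender", "location"])
st.dataframe(df)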
Deployment
· To deploy your video processing project on Streamlit Share, begin by ensuring your code is hosted on GitHub and dependencies are listed in a requirements.txt file.
· Log in to Streamlit Share, connect your GitHub repository, and specify the path to your app.py file.
· Configure any necessary environment variables and deploy your application.
· Once deployed, use the provided URL to access and monitor your application.
· For updates, simply push changes to GitHub, and Streamlit Share will automatically redeploy your application with the latest code, ensuring seamless maintenance and accessibility for users.
Challenges and Solutions
Throughout this project, we encountered several challenges that required careful consideration and exploration:
1. Choosing the Right Model for Human Detection and Gender Classification: Finding the optimal model for human detection and gender classification involved extensive research across various open-source libraries. Ultimately, the OpenCV deep learning approach proved to be the most suitable choice due to its robustness and effectiveness.
2. Dataset Selection for Indoor/Outdoor Classification: Selecting the appropriate dataset for indoor/outdoor classification posed challenges, considering factors such as dataset scope and compatibility with model training. The SUN dataset emerged as the best fit for its comprehensive coverage of indoor and outdoor scenarios, facilitating integration with model training.
3. Selecting the Deep Learning Model: Evaluating different deep learning models such as Xception, VGG16, YOLOv5, and UNet posed another challenge. After considering factors like response time and accuracy, MobileNetV2 emerged as the optimal choice due to its balance between performance and efficiency.
Addressing these challenges required thorough evaluation and experimentation to ensure the chosen solutions aligned with project objectives and performance requirements.
Conclusion
Summary
This project serves as a proof of concept for full-scale video processing software development. It provides a framework and a technological overview of how video processing can be implemented.
Future Work
· For production use, the accuracy can be further improved, and monitoring can be set up using MLflow and Grafana.
· Data processing can be achieved through Databricks solutions.
Contact Information
· Prepared By: Kishan Joshi
· Email: ksnjsi@gmail.com
· Contact: +918866238429