Introduction:
Earlier technologies relied on sensor-based detectors to detect fire and smoke. Traditional smoke sensors are activated only when smoke or fire particles reach them and parameters such as heat and gas are sensed, so a delay occurs between the outbreak of a fire and the moment its particles arrive at the sensor. Moreover, such sensors cannot exploit important visual features like shape and color to correctly identify smoke and fire. Vision-based technologies were developed to address these limitations of sensor-based fire and smoke detection systems.
Vision-based detection of fire and smoke follows two approaches: image-based and video-based. In automated video surveillance, smoke detection receives considerable attention because smoke can be characterized by its pattern, shape, motion, color, and texture. As computational resources have grown over time, researchers have adopted deep models, domain adaptation strategies, and transfer learning approaches to enhance the detection capability of models while lowering the false alarm rate.
This research work focuses on a single deep 3D model architecture with an attention mechanism. The proposed model will learn the spatial and temporal features of fire and smoke from video frames. This will enable the model to detect fire and smoke at an early stage and to reduce the false alarm rate, so that affected indoor and outdoor areas can be protected from after-fire effects through early warning.
Dataset:
Standard, publicly available datasets consisting of challenging low-quality and high-quality video sequences will be used for model training and testing. Deep model performance improves with larger amounts of data, which enhances the model's ability to learn deep features from a wider range of scenarios. The datasets collected from the literature for training, validation, and testing are listed below.
The publicly available MIVIA fire detection standard dataset consists of 31 real-world videos, divided into two parts: 14 videos with fire scenarios and 17 videos without fire scenarios. The dataset focuses on outdoor environments, featuring fire as well as other scenery such as bright sunshine, smoke-colored objects, hillsides with smoke, and apartments with fire-colored furniture. Bilkent is a low-resolution video set that extends the MIVIA dataset; it contains similar fire videos but different, extended smoke videos, for a total of 25 smoke, 14 fire, and 1 neutral video. KMU is a publicly available dataset consisting of 38 videos in total, of which 22 are fire videos, 6 are smoke videos, and 10 are neutral videos containing fire- and smoke-like objects.
The Yuan dataset contains smoke images, 3 smoke videos, and 3 neutral videos; from this dataset we will use the videos for model training. FIRESENSE [49] provides 11 non-fire and 16 fire videos for fire detection, while 13 non-smoke and 9 smoke videos are used for smoke detection. A dataset available online on Kaggle comprises 16 videos in total, of which 7 are fire videos, 7 are smoke videos, and 2 are neutral CCTV recordings of different areas such as a house, bank, office, and hotel. Ultimate-chase [45][28] is a collection of 12 fire videos recorded at different places. Since most of the dataset videos come from outdoor environments, to keep the dataset balanced we have collected 10 additional fire videos from YouTube, along with 2 neutral videos containing fire-like objects to make the dataset more challenging.
1. https://mivia.unisa.it/datasets/video-analysis-datasets/fire-detection-dataset/
2. http://signal.ee.bilkent.edu.tr/VisiFire/Demo/
3. https://cvpr.kmu.ac.kr/Dataset/Dataset.htm
4. http://staff.ustc.edu.cn/~yfn/vsd.html
5. https://www.kaggle.com/datasets/ashutosh69/fire-and-smoke-dataset
Methodology:
In the related work section, we reviewed research trends in fire and smoke detection and found that earlier work was primarily based on 2D models, which are not efficient at extracting temporal features from video. Therefore, this research focuses on a 3D CNN-based model with an attention mechanism to extract three-dimensional spatial and temporal features.
An attention mechanism determines which information in an input is most important for performing a task, thereby enhancing performance. The visual attention mechanism is vital in image formation, scene categorization, target identification, and tracking, and it enhances the capabilities of a CNN model. Combining low-level and high-level features lets the model focus on multiple salient points [53]. Fig. 1 presents the flow of steps that will be followed to design the classifier; each step is explained below.
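Before walking through those steps, the following is a minimal sketch of the kind of attention block intended here: a squeeze-and-excitation style channel attention over 3D feature maps, written in PyTorch. The class name, reduction ratio, and layer sizes are illustrative assumptions, not the final design.

import torch
import torch.nn as nn

class ChannelAttention3D(nn.Module):
    # Re-weights the channels of a (N, C, T, H, W) feature map so the
    # network attends more strongly to informative fire/smoke cues.
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)  # squeeze: one summary value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel importance weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c = x.shape[:2]
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1, 1)
        return x * w  # excite: emphasize the important channels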
Input videos:
The model we will propose targets vision-based fire and smoke detection in both outdoor and indoor contexts, so we have collected the numerous videos described in Section 3.1 from many platforms to obtain better results and cover all conceivable smoke and fire elements. Frames will be generated from the videos, and the video streams will be passed to the model as input.
Preprocessing:
Frames will be extracted from each input video. Each frame will go through a preprocessing stage, including resizing and normalization, to ensure that the frame size is uniform and the pixel intensity level is on a standardized scale, along with any other necessary preprocessing procedures that help the model run quickly and effectively.
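A minimal sketch of this step using OpenCV is given below; the clip length (16 frames) and frame size (112 x 112) are assumed values for illustration only, not final design choices.

import cv2
import numpy as np

def load_clip(video_path: str, num_frames: int = 16, size: int = 112) -> np.ndarray:
    # Read a video, resize every frame to a uniform size, and normalize
    # pixel intensities to the [0, 1] scale.
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:  # end of the video stream
            break
        frame = cv2.resize(frame, (size, size))
        frames.append(frame.astype(np.float32) / 255.0)
    cap.release()
    return np.stack(frames)  # shape (T, H, W, C): one fixed-length clip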
3D classifier:
A deep 3D CNN with an attention mechanism will be designed. The dataset will be split into 80% for training and 20% for testing, and 10% of the training set will be used for validation. The batch size will be determined in advance based on the model's specifications. Batches of preprocessed video frames will then be passed to the 3D attention-based convolutional network for spatio-temporal feature extraction and multiclass classification. The deep 3D model will be trained, validated, and tested on the video datasets mentioned in Section 3.1. After classification, each test video is assigned a predicted class, and the overall results will be assessed.
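A minimal sketch of such a classifier in PyTorch is shown below. It reuses the ChannelAttention3D block sketched in the methodology section; the depth, channel widths, and three output classes (fire, smoke, neutral) are assumptions for illustration, not the final architecture.

import torch
import torch.nn as nn

class Fire3DNet(nn.Module):
    def __init__(self, num_classes: int = 3):  # fire / smoke / neutral (assumed)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),  # spatio-temporal filters
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # downsample space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            ChannelAttention3D(64),  # attention block sketched earlier
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: a batch of preprocessed clips shaped (N, C=3, T, H, W)
        return self.classifier(self.features(x).flatten(1))

A clip produced by the preprocessing sketch would first be transposed to channel-first order (e.g., np.transpose(clip, (3, 0, 1, 2))) and batched before being fed to the network.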
Evaluation Measures:
The evaluation measures listed below will be used to evaluate the model; a sketch of how they can be computed follows the list.
1. Accuracy
2. Precision
3. Recall/Detection rate
4. False alarm rate
5. F1-score
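A minimal sketch of these measures, computed from per-class (one-vs-rest) confusion-matrix counts, is given below; tp, fp, tn, and fn denote the true/false positives and negatives for a single class such as fire.

def evaluate(tp: int, fp: int, tn: int, fn: int) -> dict:
    # Standard measures derived from one-vs-rest confusion-matrix counts.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called the detection rate
    false_alarm_rate = fp / (fp + tn)  # negatives wrongly flagged as positive
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "false_alarm_rate": false_alarm_rate, "f1": f1}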
Results:
Implementation pending
Discussion:
An unexpected eruption of fire has terrible effects on human lives, lands, industries, forests, and animals. Smoke arises before the fire and can be seen from a far distance; hence smoke detection is very beneficial for the early detection of fire, which can help save many lives and avert massive tragedy and loss. Traditional sensor-based detectors are limited in their ability to detect smoke and fire from a distance, as they require close proximity to the fire to trigger a warning. This puts individuals at risk when monitoring a fire incident from a remote location, since the sensor detects smoke and fire only once the particles reach it, by which time the fire may already have caused great damage. Because of its speed, vision-based detection is an effective way to identify fire and smoke early. As a result, this research will focus on creating a robust 3D architecture for the early detection of smoke and fire in complex scenarios. The video dataset includes outdoor and indoor environments such as forests, field lands, houses, industries, offices, and grounds. This research will propose a model that combines a 3D CNN with an attention mechanism. The frames extracted from the video dataset will pass through a preprocessing step, after which the pre-processed video frames will be fed to the 3D model. The main purpose of integrating the attention mechanism is to focus on the Region of Interest (RoI), which will help the model learn spatial and temporal features from the frames. The proposed model will perform multiclass classification for fire and smoke. The major goal of this study is to minimize false alarms and identify fire and smoke at their early stage.
MS(CS) Research Student
Rimsha Shoukat
Working email: Fa21-rcs-014@cuilahore.edu.pk
Personal email: rimshashoukat.121@gmail.com