Akhil Kusuma
Master of Professional Studies in Data Science
(August 2019 - May 2021)
Instructor: Dr. Ergun Simsek
In recent years, self-driving cars, automatic parking systems, and lane detection have become available across the automobile industry. It is an era in which cars can drive and park without a driver operating the vehicle manually. Even so, drivers spend a significant amount of time searching for a parking spot, often pay for more parking than they use, and sometimes end up paying fines for parking in no-parking zones.
In larger US cities such as New York and Los Angeles, drivers spend about 107 hours and 85 hours per year, respectively, looking for a parking spot in lots or along the street, which in turn increases air pollution in major cities. Searching for parking wastes travel time, so for the driver's convenience public parking lots should inform drivers about the availability and location of parking spaces. However, maintaining such a system manually requires a lot of human resources and 24/7 monitoring. We therefore need an automated, unattended parking detection system that counts the parking spaces, identifies the exact parking area (even while the driver is passing the parking spots), and monitors changes in space status over time.
The main objective of the proposed system is to find parallel parking spaces as well as regular parking spots, which is a major challenge for both drivers and autonomous vehicles today.
As the number of people on the road increases, drivers become more aggressive. Some of the major problems that drivers face in the real world today are:
Circling the parking lot to find a space: Research shows that about 30% of traffic is made up of people looking for a place to park, which results in congestion and traffic accidents.
Lack of timely information: Drivers don't have real-time parking information and are unsure where to go and park.
Greater demand for parking: During the holiday season, people travel to other cities or go shopping, and the lack of information on parking spaces becomes a significant problem.
Source: “Greatest Challenges Parking Facilities Managers Face,” All Traffic Solutions.
A counter-based system relies on sensors at the entrance and exit of the parking lot. Such systems only report the total number of vacant spaces; they do not guide the driver to the exact location of a free space, and they cannot be applied to on-street parking bays or residential parking spaces.
Wired sensor-based and wireless magnetic-sensor-based systems rely on ultrasonic, infrared, or wireless magnetic sensors installed in each parking space.
The main disadvantage of both systems is that they require the installation of costly sensors as well as transceivers.
Reference: http://ceur-ws.org/Vol-2087/paper5.pdf
Funck et al. use an algorithm that compares a reference image with the input images and applies PCA (principal component analysis) to calculate the vehicle-to-parking-space pixel area.
Funck, S., Mohler, N., and Oertel, W. (2004). Determining car-park occupancy from single images. In IEEE Intelligent Vehicles Symposium, 2004, pages 325–328.
Tsai et al. train a Bayesian classifier to verify vehicle detections using corners, edges, and wavelet features.
Tsai, L. W., Hsieh, J. W., and Fan, K. C. (2007). Vehicle detection using normalized color and edge map. IEEE Transactions on Image Processing, 16(3):850–864.
True adopts a combination of vehicle feature point detection and color histogram classification.
True, N. (2007). Vacant parking space detection in static images. University of California, San Diego, 17
COINS (Car Park Occupancy Information System) integrates advanced image processing techniques, including seeding, boundary search, object detection, and edge detection, for reliable parking space detection.
Bong, D., Ting, K., and Lai, K. (2008). Integrated approach in the design of car park occupancy information system (coins). IAENG International Journal of Computer Science, 35(1):7–14.
Amato et al. (2016) develop a decentralized solution for visual parking space occupancy detection using deep CNN and smart cameras.
Amato, G., Carrara, F., Falchi, F., Gennaro, C., and Vairo, C. (2016). Car parking occupancy detection using smart camera networks and deep learning. In 2016 IEEE Symposium on Computers and Communication (ISCC), pages 1212–1217
Data: The source of data for this project is a real-time video of a parking lot that has perpendicular or parallel parking scenarios. The data is generated from the video, frame by frame, using OpenCV (a computer vision library).
The dataset used for detecting cars, or any other object in a parking space, is COCO (Common Objects in Context).
Implementation: Each generated frame is fed to a CNN model that detects objects such as cars. Mask R-CNN is used to detect whether parking spaces are vacant, building on object detection resources such as the COCO dataset or YOLO models.
Implement Mask R-CNN to detect objects such as cars in the input video stream, which is fed to the model frame by frame.
Detect the bounding box of each parking spot and classify it as available or not available.
Use the OpenCV library to convert the video into frames.
Use TensorFlow to build the deep CNN architecture, and use a pre-trained model so that the parking space detection accuracy of the two models can be compared.
Use a Python messaging service/API to send a text message to the driver when a space is available.
COCO dataset description:
Created by Microsoft.
COCO is a large-scale object detection, segmentation, and captioning dataset.
It has 330K images, of which more than 200K are labeled.
1.5 million object instances.
80 object categories.
Image Reference: https://cocodataset.org/#home
Phase 2 is about the actual implementation of the parking detection model.
The objective of Phase 2 is to detect bounding boxes and masks in the parking lot images using deep learning models for object detection and object segmentation.
Let us understand both terms in detail.
Object detection aims to precisely estimate the class and exact location of each object in an image, going beyond classifying the image as a whole. It is widely used in the real world, with use cases such as face detection, pedestrian detection, and skeleton detection. Object detection provides valuable information for the semantic understanding of images and videos and is useful in many applications, including image classification, human behavior analysis, face recognition, and autonomous driving (for example, by Tesla).
Results of object detection on the COCO dataset for all categories.
Image segmentation is a very important topic in image processing and helps in understanding the content of an image. It enables many applications, such as image compression, scene understanding, and locating objects in satellite images. With the advent of deep learning, more CNN-based models, including the Faster R-CNN family, are used for image segmentation alongside traditional segmentation models.
Results of object segmentation on the COCO dataset.
Mask R-CNN is an object detection and instance segmentation model built on top of Faster R-CNN. Faster R-CNN is a region-based network made up of convolutional layers; it returns a bounding box for each detected object together with its class label and a confidence (detection) score.
The architecture of Faster R-CNN works in two stages:
Stage 1: It consists of two networks: the backbone (ResNet, VGG, etc.) and the region proposal network. These networks run once per image to produce a set of region proposals, which are regions of the feature map that are likely to contain objects.
Stage 2: The network predicts the bounding box and object class for each of the regions proposed in Stage 1.
Faster R-CNN predicts object class and bounding boxes. Mask R-CNN is an extension of Faster R-CNN with an additional branch for predicting segmentation masks on each Region of Interest (RoI).
Instance segmentation is a challenging task because it requires detecting all objects in an image and also segmenting each instance, and these are usually regarded as two independent processes. Such a multi-task scheme can exhibit systematic errors in some instances. To address this, Mask R-CNN adds a branch that runs in parallel with the existing Faster R-CNN branches for classification and bounding box regression. The objective of Mask R-CNN is to predict segmentation masks in a pixel-to-pixel manner; the additional segmentation branch encodes an m×m mask for each RoI to preserve the object's spatial layout.
For more information on MaskRCNN: https://arxiv.org/pdf/1807.05511.pdf
To clone the Mask RCNN model to the working project: https://github.com/matterport/Mask_RCNN.git
The custom dataset class needs three functions:
load_data(): Adds our specific classes and images to the data loader class.
load_mask(): Loads the instance masks for the given image. This function converts the different mask formats to one format, a bitmap of shape [height, width, instances].
image_reference(): Returns a link to the image on the COCO website.
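The bitmap format described above can be illustrated with a small sketch. It assumes NumPy and rectangular annotations given as (y1, x1, y2, x2); the helper name boxes_to_masks and the annotation format are hypothetical and are not taken from the project code. A real load function would read the project's own annotation files instead.

```python
import numpy as np


def boxes_to_masks(height, width, boxes, class_ids):
    """Convert rectangular annotations into one boolean bitmap per instance.

    boxes: list of (y1, x1, y2, x2) pixel rectangles (hypothetical format).
    Returns a [height, width, instances] mask array and a matching array of
    class IDs, which is the pair a mask-loading function is expected to return.
    """
    masks = np.zeros((height, width, len(boxes)), dtype=bool)
    for i, (y1, x1, y2, x2) in enumerate(boxes):
        # Mark every pixel inside the rectangle as belonging to instance i.
        masks[y1:y2, x1:x2, i] = True
    return masks, np.asarray(class_ids, dtype=np.int32)


# Two overlapping instances on a tiny 4x6 image.
masks, ids = boxes_to_masks(4, 6, [(0, 0, 2, 3), (1, 2, 4, 6)], [1, 1])
print(masks.shape)
```

Each instance gets its own channel, so overlapping objects do not overwrite each other.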
First, the model must be defined as an instance of the MaskRCNN class. This class requires a configuration object as a parameter. The configuration object defines how the model is used during training or inference. In this case, the configuration only specifies the number of images per batch, which is one, and the number of classes to predict. A powerful GPU is recommended for training the Mask R-CNN model.
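As a rough sketch of this configuration and of the weight loading and head training steps in this section, the code below uses the Config and MaskRCNN classes from the matterport Mask_RCNN repository linked above. The names ParkingConfig, make_config, and train_heads are hypothetical, as are the dataset objects, paths, and class count, which the caller must supply. The imports are placed inside the functions so the file can be read and loaded without the mrcnn package installed.

```python
# Output layers whose shapes depend on the number of classes; they are skipped
# when loading COCO weights so the model can be retrained on a new class set.
HEAD_LAYERS = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]


def make_config(num_classes):
    """Build a configuration with one image per batch (hypothetical helper)."""
    from mrcnn.config import Config  # requires the matterport Mask_RCNN package

    class ParkingConfig(Config):  # hypothetical name
        NAME = "parking"
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1          # batch size = GPU_COUNT * IMAGES_PER_GPU = 1
        NUM_CLASSES = num_classes   # background + object classes

    return ParkingConfig()


def train_heads(config, dataset_train, dataset_val, coco_weights_path,
                model_dir, epochs=20):
    """Load COCO weights without the head layers, then train only the heads."""
    from mrcnn import model as modellib

    model = modellib.MaskRCNN(mode="training", config=config, model_dir=model_dir)
    model.load_weights(coco_weights_path, by_name=True, exclude=HEAD_LAYERS)
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=epochs,
                layers="heads")
    return model
```

A call such as train_heads(make_config(1 + 3), train_set, val_set, "mask_rcnn_coco.h5", "logs") would follow the steps described here, with every argument value being an assumption of this sketch.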
Prepare the data for training Mask R-CNN using the customDatasetBuilder class, which creates TensorFlow Records.
1. Initialize the Mask R-CNN model for “training” using the Config instance that we created.
2. Load the pre-trained weights for Mask R-CNN from the COCO dataset, excluding the last few layers.
We exclude the last few layers of the ResNet101-based model when loading the weights, because those layers must match the number of classes in the new dataset. For a custom dataset, load the pre-trained COCO weights; to start from a model trained on ImageNet, load the ImageNet weights instead; to continue an earlier run, load the most recent trained weights saved in the model path.
3. Train the heads and all layers with a higher learning rate to speed up learning.
We can speed up learning for the head layers by increasing the learning rate.
We can also increase the number of epochs to anywhere from 100 to 500 and observe the difference in object detection accuracy. I used only 20 epochs because I trained the model on a GPU hosted in Google Colab Pro.
4. Save the trained weights for the custom dataset and run inference/predictions for object detection and image segmentation.
5. When an image is passed to the detect() function, the model returns one dictionary per processed image. The dictionary has keys for the bounding boxes (regions of interest, ROIs), masks, class IDs, and confidence scores. The keys of note are as follows:
'rois': bounding boxes (regions of interest, ROIs) for the detected objects.
'masks': masks for the detected objects.
'class_ids': class integers for the detected objects.
'scores': confidence score for each detected object.
We can draw each box detected in the image by first getting the dictionary for the first image (e.g. results[0]) and then retrieving the list of bounding boxes (e.g. ['rois']).
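The result dictionary can be filtered down to the vehicle classes of interest. The sketch below is illustrative only: it assumes the class numbering used by the matterport COCO sample (3 = car, 8 = truck, 9 = boat), boxes ordered as (y1, x1, y2, x2), and a hypothetical helper name and score cutoff. Plain lists stand in for the arrays that detect() returns.

```python
# Assumed COCO class IDs in the matterport sample ordering (index 0 is background).
VEHICLE_CLASS_IDS = {3: "car", 8: "truck", 9: "boat"}


def vehicle_detections(result, min_score=0.5):
    """Return (label, box, score) for each confident vehicle detection.

    result: one entry of the list returned by detect(), e.g. results[0].
    min_score: hypothetical cutoff, not a value taken from the project.
    """
    detections = []
    for box, class_id, score in zip(result["rois"], result["class_ids"],
                                    result["scores"]):
        if int(class_id) in VEHICLE_CLASS_IDS and float(score) >= min_score:
            y1, x1, y2, x2 = (int(v) for v in box)
            detections.append(
                (VEHICLE_CLASS_IDS[int(class_id)], (y1, x1, y2, x2), float(score))
            )
    return detections


# Stand-in for results[0]: a car, a person, and a low-confidence truck.
example = {
    "rois": [[10, 20, 50, 80], [0, 0, 5, 5], [30, 40, 90, 120]],
    "class_ids": [3, 1, 8],
    "scores": [0.98, 0.99, 0.40],
}
print(vehicle_detections(example))
```

Only the car survives the filter here, because the person is not a vehicle class and the truck falls below the cutoff.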
Detecting cars, trucks, and boats with bounding boxes as those are the only objects that can be seen inside the parking space.
Create an image mask using the weights trained on the Mask RCNN model.
PHASE-3
The goal of Phase 3 is to feed real-time video of the parking lot to the model and detect the cars and other objects in the parking spaces. To achieve this, we use OpenCV to process the raw input video, run the model on the individual frames, and let the model report a parking space as empty once the vehicle has left it.
Note: We initially assume that the parking spaces in the video are occupied. For each space, the model displays a bounding box with the detection score of the car, truck, or boat. When the detection score drops to a threshold (0.2), the system waits for 4 seconds to confirm that the car, truck, or boat is really no longer in the parking space and then shows the space as available.
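The wait-and-confirm rule in the note can be sketched as a small state machine. This is one possible reading of the rule, not the project's actual code: the class name SpotMonitor is hypothetical, timestamps are in seconds, and a frame with no detection at all is assumed to be reported as a score of 0.0.

```python
class SpotMonitor:
    """Track one parking space and report it free after a sustained low score."""

    def __init__(self, score_threshold=0.2, hold_seconds=4.0):
        self.score_threshold = score_threshold
        self.hold_seconds = hold_seconds
        self._low_since = None  # time at which the score first dropped

    def update(self, score, timestamp):
        """Feed the latest detection score; return True if the space is available."""
        if score > self.score_threshold:
            # A vehicle is still detected, so reset the waiting period.
            self._low_since = None
            return False
        if self._low_since is None:
            self._low_since = timestamp
        # Available only once the score has stayed low for the full hold time.
        return timestamp - self._low_since >= self.hold_seconds


monitor = SpotMonitor()
readings = [(0.0, 0.95), (1.0, 0.15), (3.0, 0.10), (5.0, 0.05)]  # (time, score)
states = [monitor.update(score, t) for t, score in readings]
print(states)
```

Resetting the timer whenever the score recovers keeps a briefly occluded car from being reported as a free space.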
Because we have to capture a live video stream of the parking lot, we process the video with the OpenCV VideoCapture class.
To do that, we create a VideoCapture object with the input file path as its argument.
The properties of the video that a VideoCapture object can provide are:
Frames per second
Video Codec /Video Format
Video Size
Total Frame Count
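These properties can be read with the VideoCapture get() method and the CAP_PROP_* constants. The sketch below assumes the opencv-python package; the function names and the example file name are hypothetical. The codec is returned by OpenCV as a packed FourCC integer, which the helper decodes into four characters.

```python
def fourcc_to_str(code):
    """Decode OpenCV's packed FourCC integer into a 4-character codec name."""
    code = int(code)
    return "".join(chr((code >> (8 * i)) & 0xFF) for i in range(4))


def video_properties(path):
    """Read the basic properties of a video file (requires opencv-python)."""
    import cv2  # imported here so fourcc_to_str works without OpenCV installed

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {path}")
    try:
        return {
            "fps": cap.get(cv2.CAP_PROP_FPS),
            "codec": fourcc_to_str(cap.get(cv2.CAP_PROP_FOURCC)),
            "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            "height": int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
            "frame_count": int(cap.get(cv2.CAP_PROP_FRAME_COUNT)),
        }
    finally:
        cap.release()


# Example call (hypothetical file name):
# print(video_properties("parking_lot.mp4"))
```

For a live camera stream the frame count is typically not meaningful, so that value should be treated with care.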
Compare each frame with other frames of the video to detect changes. Take the absolute difference of two frames and convert the result to a grayscale image.
Use OpenCV's Gaussian blur to reduce noise in each frame. To detect objects in every frame, we feed RGB images to the Mask R-CNN model. The accuracy of the detections can be measured using Intersection over Union (IoU).
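The frame preprocessing described above might look like the following sketch, assuming the opencv-python package and frames in OpenCV's default BGR channel order. The function names and the 5x5 blur kernel are illustrative choices, not values from the project.

```python
def changed_pixels(prev_frame, frame, blur_kernel=(5, 5)):
    """Return a blurred grayscale image of the change between two BGR frames."""
    import cv2  # requires opencv-python

    diff = cv2.absdiff(prev_frame, frame)           # per-pixel absolute difference
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)   # collapse to one channel
    return cv2.GaussianBlur(gray, blur_kernel, 0)   # smooth out sensor noise


def to_model_input(frame):
    """Convert an OpenCV BGR frame to the RGB order expected by the detector."""
    import cv2

    return cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```

Frames read by OpenCV are in BGR order, so the conversion to RGB is needed before the image is passed to the detection model.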
Intersection over Union is a metric used to measure the accuracy of an object detector on a dataset such as COCO. The Mask R-CNN model outputs predicted bounding boxes, which can be used to compute the IoU.
In other words, to apply Intersection over Union to evaluate an object detector we need:
Ground truth bounding boxes (these can be drawn on the image frames using OpenCV)
Predicted bounding boxes
The IoU is given by
IoU = (Area of Overlap) / (Area of Union)
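As a worked example of the formula, the sketch below computes the IoU of two axis-aligned boxes. It assumes boxes are given as (y1, x1, y2, x2), and the function name iou is an illustrative choice.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (y1, x1, y2, x2)."""
    ay1, ax1, ay2, ax2 = box_a
    by1, bx1, by2, bx2 = box_b

    # Height and width of the overlapping rectangle (zero if the boxes are disjoint).
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    overlap = inter_h * inter_w

    area_a = (ay2 - ay1) * (ax2 - ax1)
    area_b = (by2 - by1) * (bx2 - bx1)
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0


# Two 2x2 boxes sharing a 1x1 corner: overlap = 1, union = 4 + 4 - 1 = 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

An IoU of 1.0 means the predicted box matches the ground truth exactly, and 0.0 means the boxes do not overlap at all.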
A limitation of this project is the viewing angle of the video. With a top-down (satellite) view, we would be able to apply OpenCV image transformations to find the parking lines and other parking components.
We can use Canny edge detection to detect edges in the image frames so that only the masked region is considered as the parking region.
Use the OpenCV Hough Line Transform on the edge images to detect white or yellow lines.
We have used an object detection model to detect objects in the image. An alternative is to create a CNN model that is trained on images of empty parking spaces and detects empty spaces directly.
The backbone model that we have used is ResNet. ResNet, VGG, and Inception models are heavy models with many hidden CNN layers, so it would be good to produce lighter models.
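The edge and line detection idea mentioned above could be sketched as follows, assuming the opencv-python and NumPy packages. The function name and all threshold values are illustrative guesses that would need tuning on real parking lot footage. The sketch also does not filter by line color.

```python
def detect_parking_lines(frame, canny_low=50, canny_high=150):
    """Return line segments (x1, y1, x2, y2) found in a BGR frame."""
    import cv2  # requires opencv-python
    import numpy as np

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)        # reduce noise before edges
    edges = cv2.Canny(blurred, canny_low, canny_high)  # binary edge image

    # Probabilistic Hough transform: 1-pixel distance and 1-degree angle resolution.
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=10)
    if lines is None:
        return []
    return [tuple(int(v) for v in line[0]) for line in lines]
```

Restricting the search to white or yellow markings would need an additional color mask before the edge step.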
We plan to use TensorFlow and Python, with the OpenCV module for the computer vision task of extracting frames from the input video. The code will be updated on GitHub phase by phase.
*The complete code is posted on GitHub, and I have used Google Drive as the working repository for the project.
Mining, Data. “The Survey Says: Car Parking.” The National, The National, 18 Jan. 2017, www.thenationalnews.com/opinion/the-survey-says-car-parking-1.69243?videoId=5754807360001.
“Common Objects in Context.” COCO, cocodataset.org/#home.
Cai, Bill Yang, et al. “Deep Learning Based Video System for Accurate and Real-Time Parking Measurement.” arXiv, 20 Feb. 2019.
Zhang, Lin, et al. “Vision-Based Parking-Slot Detection: A DCNN-Based Approach and a Large-Scale Benchmark Dataset.” IEEE Transactions on Image Processing, 11 Nov. 2018.
Ibrahim, Hossam El-Din, Car Parking Problem in Urban Areas, Causes and Solutions (November 25, 2017). 1st International Conference on Towards a Better Quality of Life, 2017, Available at SSRN: https://ssrn.com/abstract=3163473 or http://dx.doi.org/10.2139/ssrn.3163473
Geitgey, Adam. “Snagging Parking Spaces with Mask R-CNN and Python.” Medium, Medium, 21 Jan. 2019, medium.com/@ageitgey/snagging-parking-spaces-with-mask-r-cnn-and-python-955f2231c400.
Khandelwal, Renu. “Object Detection Using Mask R-CNN on a Custom Dataset.” Medium, Towards Data Science, 9 Feb. 2020, towardsdatascience.com/object-detection-using-mask-r-cnn-on-a-custom-dataset-4f79ab692f6d.
Brownlee, Jason. “How to Use Mask R-CNN in Keras for Object Detection in Photographs.” Machine Learning Mastery, 1 Sept. 2020, machinelearningmastery.com/how-to-perform-object-detection-in-photographs-with-mask-r-cnn-in-keras/.
Acharya, Debaditya, et al. “Real-Time Image-Based Parking Occupancy Detection Using Deep Learning.” Ceur-Ws, ceur-ws.org/Vol-2087/paper5.pdf.
Cao, Yuanzhouhan, et al. “Exploiting Depth From Single Monocular Images for Object Detection and Semantic Segmentation.” IEEE Transactions on Image Processing, vol. 26, no. 2, Feb. 2017, pp. 836–46. DOI.org (Crossref), doi:10.1109/TIP.2016.2621673.
“Mask R-CNN for Object Detection and Segmentation.” GitHub, github.com/matterport/Mask_RCNN.
“Image Processing in OpenCV.” OpenCV, opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_table_of_contents_imgproc/py_table_of_contents_imgproc.html.
“Greatest Challenges Parking Facilities Managers Face.” All Traffic Solutions, 9 Aug. 2019, www.alltrafficsolutions.com/blog/greatest-challenges-parking-facilities-managers/.