VisionAir
A privacy-preserving Android application that uses an image of your surroundings to predict the PM2.5 levels around you
Tensorflow Blog | Poster | Video | Marconi Society Blog | Business Wire
Abstract: Acknowledging the adverse effects of deteriorating air quality, we propose VisionAir: a privacy-preserving Android application that provides an air quality monitoring system, allowing users to stay aware of toxic air and take protective measures against it. It does so by analysing the air around the user from an image of the surrounding area using a deep learning model we trained on a diverse dataset of High Dynamic Range (HDR) images. The model estimates the Air Quality Index of the image given as input. To ensure complete user privacy, VisionAir is optimised for standalone on-device training and never shares any private user data, since all processing happens on the device. The model employs Federated Learning to build a powerful global model without compromising the privacy of the contributing client devices. This allows us to improve our model by updating its weights regularly as more users join VisionAir.
Dataset: To enable further research in this field, we open-source our dataset of nearly 4k HDR and non-HDR images. The dataset was collected across more than 80 locations in Delhi, the national capital, covering multiple pollution levels.
To collect the dataset, we built an Android application that captures an image every 5 minutes from 5 AM to 7 PM, and we placed multiple devices at multiple locations for periods ranging from two days to two weeks. Samples from our dataset are shown below.
Find the dataset here: https://vision-air.github.io/feature.html
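For illustration, the capture schedule described above can be sketched as follows. This is a minimal Python sketch, not the original Android collection app, and the capture_image helper is a hypothetical placeholder for the device camera call.

```python
# Minimal sketch of the dataset-collection schedule: one image every
# 5 minutes, between 5 AM and 7 PM (illustrative Python, not the
# original Android app).
import datetime
import time

CAPTURE_INTERVAL_MINUTES = 5   # one image every 5 minutes
START_HOUR, END_HOUR = 5, 19   # 5 AM to 7 PM

def capture_image():
    """Hypothetical placeholder for the device camera call."""
    print(f"Captured image at {datetime.datetime.now().isoformat()}")

while True:
    now = datetime.datetime.now()
    if START_HOUR <= now.hour < END_HOUR:
        capture_image()
    time.sleep(CAPTURE_INTERVAL_MINUTES * 60)
```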
Scene Generalisation: Previous works that inspired us (such as those by Pudasaini et al. and Liu et al.) used images from a camera fixed at a specific location for years, which made their models very scene-specific. Our model, however, is able to generalise well to new scenes. To achieve this, we collected data at 80+ locations within the Delhi NCR, covering multiple pollution levels. Our dataset consists of scenes including roads, flyovers, high rises, small buildings, parks, clear skies, etc.
Device Generalisation: Similar to scene generalisation, previous works struggled with device generalisation as well, since their datasets were collected from very similar, if not identical, devices. Device generalisation in this context means that if you take two different cameras, capture the exact same image with both at the exact same time, and run each image through the model, the output values should be the same.
This was a mammoth task to solve, especially since we were targeting mobile phone users. The first major constraint was that different mobile phones use different camera sensors and lenses, and almost every mobile phone applies its own proprietary post-processing algorithm to an image before showing it to us. This is an issue because our model relies on the intensity of the light captured by the camera lens: different hardware and lenses capture different amounts of light, and the post-processing functions that mobile phone manufacturers employ manipulate the original values in order to make images aesthetically pleasing.
Another constraint was that the images stored by phones and other low-cost camera devices have a very low dynamic range. This means the images do not represent the actual amount of light in the scene, and the recorded values change depending on how light or dark the object you are pointing at is (you can test this by opening your camera and pointing it at a very dark object or a dimly lit part of the room: the camera will try to compensate for the low light by adjusting the exposure, and it will do the opposite for a brightly lit room).
To solve this, we made use of High Dynamic Range (HDR) images. HDR images are constructed by taking multiple images of the same scene at multiple exposure values (e.g. -2, -1, 0, 1, 2) and then combining them. This solved both of the problems mentioned above, and our model started to give fairly similar results regardless of the device used.
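As a rough illustration of this step, the sketch below merges an exposure stack into an HDR radiance map using OpenCV's Debevec merge. The filenames and exposure times are assumptions made for the example, not values from the VisionAir pipeline.

```python
# Build an HDR radiance map from several captures of the same scene taken
# at different exposures, using OpenCV (illustrative values throughout).
import cv2
import numpy as np

# Same scene captured at several exposure compensations (e.g. -2..+2 EV)
files = ["ev_m2.jpg", "ev_m1.jpg", "ev_0.jpg", "ev_p1.jpg", "ev_p2.jpg"]
exposures = [cv2.imread(f) for f in files]
times = np.array([1/500, 1/250, 1/125, 1/60, 1/30], dtype=np.float32)

# Align the shots to compensate for small hand movement between captures
cv2.createAlignMTB().process(exposures, exposures)

# Merge into a radiance map that reflects the actual light levels instead
# of a single auto-exposure decision made by the phone
hdr = cv2.createMergeDebevec().process(exposures, times)
cv2.imwrite("scene_hdr.hdr", hdr)  # float radiance map in Radiance format
```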
Another step we took was using over 15 devices to collect the dataset.
In the graphs below, we show how our model performed when trained and tested on HDR and non-HDR images. The two phones used for testing were from different manufacturers (an Asus in blue and a Samsung Galaxy in orange); more phones were used for the training images.
On Device Inference:
To preserve the privacy of the user, the whole prediction process happens only on the device. Using OpenCV for Java and TensorFlow for Java, the input features are extracted and the model is run on the device itself. The images captured by the user never leave the phone.
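The sketch below shows the shape of that pipeline, written in Python for readability; the app itself implements it in Java with OpenCV for Java and TensorFlow for Java. The feature statistics and the model filename are hypothetical placeholders, since the exact features are not listed here.

```python
# Illustrative on-device prediction flow: extract features from the HDR
# capture and run the trained model locally (placeholder features/model).
import cv2
import numpy as np
import tensorflow as tf

def extract_features(image_path):
    """Placeholder per-image statistics, standing in for the real features."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return np.array([[gray.mean(), gray.std(), img.max()]], dtype=np.float32)

model = tf.keras.models.load_model("visionair_model.h5")  # hypothetical file
features = extract_features("capture_hdr.jpg")
print("Estimated AQI:", float(model.predict(features)[0, 0]))
```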
Federated Learning:
To improve the performance of the model without compromising the anonymity of the user, we implement Federated Learning, which updates the global model by averaging the contributions from all of the client devices. Since more users are willing to share data when anonymity is maintained, this allows more data to be collected, which in turn enhances the generalisation capacity of the model.
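A minimal sketch of the averaging step is shown below, assuming a simple FedAvg-style weighted average over per-client weight lists; the actual exchange between VisionAir clients and the server is not shown.

```python
# Federated averaging sketch: combine client model weights into a new
# global model without any raw images leaving the devices.
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of per-client weight lists (FedAvg-style)."""
    total = sum(client_sizes)
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Two hypothetical clients, each contributing weights for two layers
client_a = [np.ones((3, 3)), np.zeros(3)]
client_b = [np.full((3, 3), 3.0), np.ones(3)]
global_weights = federated_average([client_a, client_b], client_sizes=[100, 300])
print(global_weights[0][0, 0])  # 2.5 = 1 * 0.25 + 3 * 0.75
```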
Find our tutorial on Federated Learning here: https://vision-air.github.io/federated.html