We use three existing datasets comprising over 14,000 images taken across India, Nepal, and Bangladesh. Each image is paired with one of seven air quality labels, ranging from “Good” to “Severe”.
Example Images
A common limitation of previous research is that many photos were taken at the same location and at the same time, with only a slight change in angle. If such near-duplicate photos land in both the training and evaluation sets, they can leak information between the splits and inflate evaluation results. Our solution is to combine the three data sources from previous research, fully deduplicate the images, and ensure their AQI scales are comparable. We summarize our data sources below.
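To make the deduplication idea concrete, here is a minimal sketch of near-duplicate filtering with a simple average hash. This is an illustrative technique, not necessarily the procedure used for this dataset. It assumes each image has already been decoded and downsampled to a small grayscale grid; the names `average_hash`, `hamming`, `drop_near_duplicates`, and the `max_distance` threshold are hypothetical.

```python
from typing import List, Sequence

# Assumption: each image is a small grayscale thumbnail (values 0-255),
# produced beforehand by whatever image library is in use.
Grid = Sequence[Sequence[int]]


def average_hash(grid: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(grids: List[Grid], max_distance: int = 2) -> List[int]:
    """Greedily keep an image only if its hash is far from every kept hash.

    Returns the indices of the images that are kept.
    """
    kept: List[int] = []
    kept_hashes: List[int] = []
    for i, grid in enumerate(grids):
        h = average_hash(grid)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(i)
            kept_hashes.append(h)
    return kept


# Two shots of the same scene with slight pixel noise, plus a different scene.
scene = [[10, 10, 200, 200]] * 4
scene_shifted = [[12, 11, 198, 201]] * 4
other = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]
print(drop_near_duplicates([scene, scene_shifted, other]))  # [0, 2]
```

The greedy pass compares each image against every kept image, so it scales quadratically. For a corpus of roughly 14,000 images that is still tractable, but the threshold would need tuning on real photos, since a slight change in angle can shift more bits than the toy example suggests.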