Many services today rely on crowdsourcing to gather large amounts of data from users, and then use that data to provide a service or feature back to those users. However, the quality of these services can degrade significantly if the data is poisoned, and our paper shows that these crowdsourcing services do not employ proper input validation against data poisoning attacks. Below, we present proof-of-concept experiments demonstrating improper input validation vulnerabilities in popular mobile crowdsourcing services.
Link-to-code: https://drive.google.com/drive/folders/128E7qDr1J3ml3fHYXWQgg_qj3G8mpDT5?usp=sharing
In this video, we inject a fake, automatically generated post into "Neighbors by Ring".
In the service's app, users can post notifications about crime or safety issues in their neighborhood.
We show that it is possible to inject posts about fictitious events and, additionally, that post generation can be automated.
Our text generation method uses a GPT-2 model fine-tuned, for each category, on genuine posts we collected through the app.
In the categories "Crime" and "Safety", our method achieves successful text-only injections at rates of 88% and 76%, respectively.
Our multi-modal strategy (text plus a relevant image) achieves an overall success rate of 80% across all topics.
Transit is a popular crowdsourcing app that helps its users plan trips and supports them during travel by predicting and showing upcoming subways or buses. Using a non-rooted Genymotion emulator as the adversary, we automatically injected a fake bus that then appeared on a real device, showing that it is possible to fool the Transit service by poisoning its crowdsourced data.
In this attack, we show that these location-based services perform no semantic validation of input data. To demonstrate this, we inject three fake points of interest (POIs) through the service's app that make no sense semantically.
The three points are as follows:
1. In the Atlantic Ocean (-12.0, -12.0)
2. In Antarctica (-75.0, 1.0)
3. On Mount Everest (27.9881, 86.9250)
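To make "semantic validation" concrete, here is a minimal sketch of the kind of check that would flag the three points above. It is not the services' code and not part of our attack tooling: the `Zone` class, the `IMPLAUSIBLE_ZONES` list, and the `rejection_reason` function are hypothetical, and the bounding boxes are crude illustrative assumptions. A real service would more likely consult land/water polygons and elevation data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Zone:
    """A latitude/longitude bounding box (hypothetical helper)."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


# Assumed, deliberately coarse zones where an ordinary POI is implausible.
IMPLAUSIBLE_ZONES = [
    Zone("Antarctica", -90.0, -60.0, -180.0, 180.0),
    Zone("open South Atlantic (toy patch)", -14.0, -10.0, -14.0, -10.0),
    Zone("Mount Everest summit area", 27.9, 28.1, 86.8, 87.0),
]


def rejection_reason(lat: float, lon: float) -> Optional[str]:
    """Return why a POI looks implausible, or None if it passes."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return "coordinates out of range"
    for zone in IMPLAUSIBLE_ZONES:
        if zone.contains(lat, lon):
            return f"inside implausible zone: {zone.name}"
    return None


# The three injected points, plus one ordinary city location for contrast.
for label, lat, lon in [
    ("Atlantic Ocean", -12.0, -12.0),
    ("Antarctica", -75.0, 1.0),
    ("Mount Everest", 27.9881, 86.9250),
    ("New York City", 40.7128, -74.0060),
]:
    print(label, "->", rejection_reason(lat, lon))
```

Even a filter this coarse would reject all three injected points while accepting the city location, which suggests the missing validation is not technically demanding.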
Strava allows its users to report a number of physical activities, such as running, cycling, and swimming. Using a fake account and the post API, we were able to post a running, cycling, or swimming activity covering 50,000 km in either 30 seconds or 8,784 hours (1 year).
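The arithmetic behind these two activities shows why a plausibility check needs to look at more than speed. The sketch below is illustrative only: the `Activity` class, the `is_plausible` function, and all thresholds are assumptions of ours, not values used by Strava.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
MAX_SPEED_KMH = {"running": 45.0, "cycling": 150.0, "swimming": 10.0}
MAX_DURATION_HOURS = 48.0


@dataclass
class Activity:
    kind: str
    distance_km: float
    duration_hours: float


def implied_speed_kmh(activity: Activity) -> float:
    """Average speed implied by the reported distance and duration."""
    return activity.distance_km / activity.duration_hours


def is_plausible(activity: Activity) -> bool:
    """Reject non-positive values, overlong activities, and excessive speeds."""
    if activity.distance_km <= 0 or activity.duration_hours <= 0:
        return False
    if activity.duration_hours > MAX_DURATION_HOURS:
        return False
    return implied_speed_kmh(activity) <= MAX_SPEED_KMH[activity.kind]


# The two injected variants: 50,000 km in 30 seconds and in 8,784 hours.
fast = Activity("running", 50_000, 30 / 3600)
slow = Activity("running", 50_000, 8784)

for name, activity in [("30 seconds", fast), ("8,784 hours", slow)]:
    speed = implied_speed_kmh(activity)
    print(f"{name}: {speed:,.1f} km/h, plausible={is_plausible(activity)}")
```

The 30-second variant implies roughly 6,000,000 km/h, which any speed cap would catch. The 8,784-hour variant implies only about 5.7 km/h, an ordinary walking pace, so it is caught only by the duration limit.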