How safe is your website against a crawler?

THE CODE

The final project code is available here: https://github.com/HaritDivyesh/websites-vs-webcrawler

THE PROJECT

If your website was just visited by our crawler, don't worry! It's just a grad school project and we mean no harm. If you would rather not be crawled, please contact me at dharit@umass.edu, and we will take your website off our list.

The main goal of this project is to see whether, and how, top websites block web crawlers. An additional goal is to compare these observations with those from a headless browser driven by a tool such as Selenium.
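
To make the comparison concrete, here is a minimal sketch of how one observation per site might be labeled. It is not taken from the project code: the `Observation` type, the `looks_blocked` and `compare` helpers, and the lists of status codes and phrases are all hypothetical, and the heuristics are only assumptions about what a block page tends to look like.

```python
from dataclasses import dataclass

# Hypothetical heuristics (assumptions, not the project's actual rules):
# status codes and page phrases that often accompany bot blocking.
BLOCK_STATUS_CODES = {401, 403, 429, 503}
BLOCK_PHRASES = ("captcha", "access denied", "unusual traffic", "are you a robot")


@dataclass
class Observation:
    """What one client (crawler or headless browser) saw for one URL."""
    url: str
    status: int
    body: str


def looks_blocked(obs: Observation) -> bool:
    """Return True if the response resembles a block page (heuristic only)."""
    if obs.status in BLOCK_STATUS_CODES:
        return True
    text = obs.body.lower()
    return any(phrase in text for phrase in BLOCK_PHRASES)


def compare(crawler: Observation, browser: Observation) -> str:
    """Label a site by comparing the crawler's view with the browser's view."""
    crawler_blocked = looks_blocked(crawler)
    browser_blocked = looks_blocked(browser)
    if crawler_blocked and not browser_blocked:
        return "blocks crawler only"
    if crawler_blocked and browser_blocked:
        return "blocks both"
    if browser_blocked:
        return "blocks browser only"
    return "no block observed"
```

For example, `compare(Observation(url, 403, ""), Observation(url, 200, "<html>Welcome</html>"))` would label the site as blocking the crawler only. A real study would likely need more signals than these, such as redirects, empty bodies, or timeouts.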

The project goals are described in more detail here: https://goo.gl/vKnpzX

PREVIOUS UPDATES

Good news! We made some progress. You can take a look at the current project status here: https://goo.gl/oaeFgs