The final project code is available here: https://github.com/HaritDivyesh/websites-vs-webcrawler
If our crawler has just visited your website, don't worry! This is a grad school project and we mean no harm. If you would rather not be crawled, please contact me at dharit@umass.edu and we will take your website off our list.
The main goal of this project is to see whether, and how, top websites block web crawlers. An additional goal is to compare these observations with those from a headless browser driven by an automation tool such as Selenium.
The project goals are described in more detail here: https://goo.gl/vKnpzX
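To make the comparison above concrete, here is a minimal sketch of how one fetch result might be labeled. It is not the project's actual method: the function names (`classify_response`, `differs`), the status-code set, and the challenge phrases are all assumptions chosen for illustration. In particular, a 503 can be a genuine server error rather than a block, so these labels are only a first guess.

```python
from typing import Optional

# Hypothetical heuristics, not taken from the project: status codes and
# page phrases that often (but not always) accompany bot blocking.
BLOCK_STATUS_CODES = {403, 429, 503}
CHALLENGE_PHRASES = ("captcha", "access denied", "unusual traffic")


def classify_response(status_code: Optional[int], body: str = "") -> str:
    """Label one fetch result as 'ok', 'blocked', 'challenge', or 'error'."""
    if status_code is None:
        # No HTTP response at all (timeout, connection reset, DNS failure).
        return "error"
    if status_code in BLOCK_STATUS_CODES:
        return "blocked"
    text = body.lower()
    if any(phrase in text for phrase in CHALLENGE_PHRASES):
        # The server answered, but with a challenge page instead of content.
        return "challenge"
    if 200 <= status_code < 300:
        return "ok"
    return "error"


def differs(crawler_label: str, browser_label: str) -> bool:
    """True when the plain crawler and the browser-driven fetch were treated differently."""
    return crawler_label != browser_label
```

A site where the plain crawler is labeled `blocked` but the Selenium-driven fetch is labeled `ok` would be a candidate for "blocks crawlers but not browsers"; the labels themselves would still need manual spot checks.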
Good news! We made some progress. You can take a look at the current project status here: https://goo.gl/oaeFgs