https://github.com/codeforamerica/asap_pdf
https://github.com/codeforamerica/asap_pdf/blob/main/python_components/crawler/crawler.py