How to scrape data in general
Post date: Apr 22, 2018 4:37:52 PM
There are a few things you will need:
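- scrapy -- a crawling and parsing framework
- beautifulsoup -- a package for parsing data out of HTML
- selenium -- a framework that drives a real browser to mimic human browsing. This helps with pages that need interaction, such as checkboxes, and can sometimes get you past a CAPTCHA prompt, although it does not solve CAPTCHAs automatically.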
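The parsing step (the job beautifulsoup does in the list above) can be sketched with nothing but Python's standard library. This is a stand-in that uses `html.parser`, not BeautifulSoup's API, so that it runs without extra installs. The sample HTML, the `price` class name, and the `PriceParser` and `extract_prices` helpers are all hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would download this HTML first.
SAMPLE_HTML = """
<ul>
  <li class="listing"><span class="price">$350,000</span> 12 Oak St</li>
  <li class="listing"><span class="price">$425,000</span> 7 Elm Ave</li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collect the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False  # True while we are inside a price span
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "span" and dict(attrs).get("class") == "price":
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())


def extract_prices(html):
    """Return the list of price strings found in the given HTML."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


prices = extract_prices(SAMPLE_HTML)
print(prices)  # ['$350,000', '$425,000']
```

With beautifulsoup installed, the same extraction is usually much shorter, along the lines of `[s.get_text() for s in soup.select("span.price")]`, which is the main reason to reach for it over the standard library.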
Selenium examples:
A very good example is ChrisMuir's "Zillow Scraper for Python using Selenium":
https://github.com/ChrisMuir/Zillow
ScrapeHero (this one uses lxml rather than Selenium): https://www.scrapehero.com/how-to-scrape-real-estate-listings-on-zillow-com-using-python-and-lxml/
Using scrapy in a Jupyter notebook
https://www.jitsejan.nl/using-scrapy-in-jupyter-notebook.html
SelectorGadget is a good Chrome extension for finding the CSS selectors you need when scraping. Highly recommended.
www.selectorgadget.com/
Scraping Zillow home data
https://github.com/ual/
https://github.com/ChrisMuir/Zillow/blob/master/zillow_functions.py
Scraping Craigslist
https://github.com/ual/scraper2/blob/master/scraper2/scraper2.py
https://github.com/ual/