How to scrape data in general

Post date: Apr 22, 2018 4:37:52 PM

There are a few tools you will need:

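  1. Scrapy -- a crawling and parsing framework
  2. Beautiful Soup (the beautifulsoup4 package) -- a package to parse data out of HTML
  3. Selenium -- a framework that drives a real web browser to mimic human browsing. This helps with interactive steps such as checkboxes, although Selenium does not solve CAPTCHAs on its own.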
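To give a feel for what Beautiful Soup does, here is a minimal sketch that parses a small, made-up HTML snippet held in a string, so no network access is needed. The markup, class names, and the `parse_listings` helper are hypothetical; on a real site you would first download the page (for example with `requests.get(url).text`) and adapt the tag and class names to its markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page you have already downloaded.
HTML = """
<ul>
  <li class="listing"><a href="/home/1">2 bed condo</a><span class="price">$350,000</span></li>
  <li class="listing"><a href="/home/2">3 bed house</a><span class="price">$525,000</span></li>
</ul>
"""


def parse_listings(html):
    """Return one dict per <li class="listing"> element (hypothetical markup)."""
    soup = BeautifulSoup(html, "html.parser")  # Python's built-in parser, no lxml needed
    rows = []
    for item in soup.find_all("li", class_="listing"):
        link = item.find("a")
        price = item.find("span", class_="price")
        rows.append({
            "title": link.get_text(strip=True),
            "url": link["href"],
            "price": price.get_text(strip=True),
        })
    return rows


if __name__ == "__main__":
    for row in parse_listings(HTML):
        print(row)
```

The result is a plain list of dicts, which is easy to load into a pandas DataFrame or write out as CSV.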

Selenium examples:

A very good example is ChrisMuir's "Zillow Scraper for Python using Selenium":

https://github.com/ChrisMuir/Zillow

ScrapeHero (scraping Zillow listings with Python and lxml): https://www.scrapehero.com/how-to-scrape-real-estate-listings-on-zillow-com-using-python-and-lxml/

Using Scrapy in a Jupyter notebook

https://www.jitsejan.nl/using-scrapy-in-jupyter-notebook.html

SelectorGadget, a Chrome extension that helps you find the CSS selector for the elements you want to scrape. Highly recommended.

www.selectorgadget.com/
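SelectorGadget gives you a CSS selector for the elements you click, and that string can be passed directly to Beautiful Soup's `select()` (or to `response.css()` in Scrapy). The sketch below assumes SelectorGadget suggested `.result-title` for a page like the made-up snippet; the selector, the markup, and the `extract` helper are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; in practice this is the HTML you downloaded.
HTML = """
<div class="result"><a class="result-title" href="/a">Bike for sale</a></div>
<div class="result"><a class="result-title" href="/b">Desk, barely used</a></div>
"""

# Hypothetical selector, as you might copy it from SelectorGadget.
SELECTOR = ".result-title"


def extract(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]


if __name__ == "__main__":
    print(extract(HTML, SELECTOR))
```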

Scraping Zillow home data

https://github.com/ual/

https://github.com/ChrisMuir/Zillow/blob/master/zillow_functions.py

Scraping Craigslist

https://github.com/ual/scraper2/blob/master/scraper2/scraper2.py

https://github.com/ual/