Intro to Web Scraping with Python
(UPF, May 15-16, 2018)
{course materials attached below}
This introductory workshop will teach you how to use powerful open-source web scraping libraries to automate the data extraction process. You will learn how to inspect web pages for useful patterns and how to write a Python script that can scrape multiple pages at once, select the relevant information, and save the data in a structured format.
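As a rough illustration of that workflow, the sketch below selects titles from two small HTML pages and saves them as CSV. It relies only on Python's standard library (html.parser and csv), which is not necessarily what the workshop itself uses. The page contents, the "title" class, and the names TitleParser, extract_titles and scrape_to_csv are all assumptions made up for this example.

```python
import csv
import io
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside every <h2 class="title"> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside a matching <h2>.
        if self._in_title:
            self.titles[-1] += data


def extract_titles(html):
    """Return the stripped title strings found in one HTML document."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


# Hypothetical pages held in memory so the sketch runs without a network.
PAGES = {
    "page-1": '<html><body><h2 class="title">First post</h2><p>text</p></body></html>',
    "page-2": '<html><body><h2 class="title">Second post</h2>'
              '<h2 class="title">Third post</h2></body></html>',
}


def scrape_to_csv(pages, out):
    """Write one (page, title) row per title found in each page."""
    writer = csv.writer(out)
    writer.writerow(["page", "title"])
    for name, html in pages.items():
        for title in extract_titles(html):
            writer.writerow([name, title])


buffer = io.StringIO()
scrape_to_csv(PAGES, buffer)
csv_text = buffer.getvalue()
```

In a real script the pages would be downloaded rather than hard-coded, and the output would go to a file opened with newline="" instead of an in-memory buffer.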
Workshop format:
The workshop will be divided into two parts:
Web scraping and HTML
We’ll cover the basic HTML syntax that usually surrounds the data we are interested in, and explore the main advantages of the Python libraries used to scrape information from the web.
Examples
We will learn how to create a simple web crawler by working together on some general examples. This will be a hands-on experience. You will be provided with code that we will cover in great detail during the class. You will also have time to practice what you’ve learned by building your own web scraping scripts for a given assignment. We will also introduce some basic Python programming concepts.
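To give a feel for what a simple web crawler involves, here is a minimal sketch, not the code distributed in the workshop. It walks a small, made-up site breadth-first, collecting links from each page and visiting every page once. To stay runnable offline, the hypothetical fetch function reads from an in-memory dictionary (SITE) instead of making HTTP requests. The example.test URLs and the names LinkParser, fetch and crawl are assumptions for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical site: each URL maps to the HTML a server might return.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.test/b": '<a href="/c">C</a>',
    "http://example.test/c": "<p>no links here</p>",
}


def fetch(url):
    """Stand-in for a real HTTP request (assumption: pages live in SITE)."""
    return SITE.get(url, "")


def crawl(start_url, max_pages=10):
    """Visit pages breadth-first, each at most once, up to max_pages."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links against the page they were found on.
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


order = crawl("http://example.test/")
```

The seen set keeps the crawler from looping when pages link back to each other, and max_pages caps how far it goes. A crawler pointed at a live site would also need to handle request errors and pause between requests.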
Prerequisites:
Basic knowledge of Python is recommended to get the most out of this workshop, but it is not mandatory.