Python and Web Data Extraction

Description

This one-day workshop, offered to faculty and doctoral students at the Fox School of Business, Temple University, covered the basics of Python and web data extraction. It ran from 9:50 am to about 3:45 pm on Tuesday, May 31, 2016.

The tutorials show how to use Python to download firms' 10-K filings from the SEC's EDGAR database and how to extract data from those filings.

Setup Guide

To set up Python on your computer, please follow these instructions: Quick Guide to Installing and Setting Up Python 2.7

Python 2 versus Python 3:

The tutorials and scripts were originally written for Python 2.7. Python 2 and Python 3 differ in several ways that affect these scripts, so the scripts below will not run unmodified under Python 3.
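To give a sense of the kind of differences involved, here is a minimal sketch of three common ones: the print statement, integer division, and the reorganized urllib modules. It is an illustration only and is not taken from the workshop scripts. It is written so that it runs under both Python 2.7 and Python 3.

```python
import sys

# Python 2 allows the statement form `print "hello"`; Python 3 only accepts
# the function form. A single parenthesized argument works in both versions.
print("Running Python %d.%d" % sys.version_info[:2])

# In Python 2, 7 / 2 evaluates to 3 (integer division); in Python 3 it is 3.5.
# Using // for floor division and a float operand for true division gives
# the same result in both versions.
whole = 7 // 2
exact = 7 / 2.0

# Python 2's urllib2 was merged into urllib.request in Python 3.
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2
```

A script written in this style can be shared between both versions. Scripts that use the Python 2 print statement or import urllib2 directly need edits along these lines before they run under Python 3.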

I plan to update the scripts for Python 3 in the future. If you have already done so and are willing to share your scripts with me, I would greatly appreciate it.

Topic 1: Python Basics

Slides: 1 Python Basics

Tutorial: Tutorial 1 – First Python Script

Python Script: FirstPythonScript.py

Topic 2: Web Scraping

Slides: 2 Web Scraping

Tutorial: Tutorial 2 – Extracting Data from 10-K

Python Scripts (right click and save as):

CSV file: CompanyList.csv
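As a rough sketch of what extracting data from a filing can look like, the example below pulls a few labeled fields out of a header block with a regular expression. It is not the workshop's script. The sample text is made up, and the field labels are an assumption about how the header of an EDGAR text filing is laid out; check them against a real filing before relying on them.

```python
import re

# Made-up sample; the labels are assumed to mirror an EDGAR filing header.
SAMPLE_HEADER = """\
COMPANY CONFORMED NAME:        EXAMPLE CORP
CENTRAL INDEX KEY:             0000000000
CONFORMED PERIOD OF REPORT:    20151231
FILED AS OF DATE:              20160226
"""

# One "LABEL: value" pair per line: an upper-case label, a colon, then the value.
FIELD_PATTERN = re.compile(r"^\s*([A-Z][A-Z ]+):\s*(.+?)\s*$", re.MULTILINE)


def parse_header(text):
    """Return a dict that maps each header label to its value (as a string)."""
    return {label.strip(): value for label, value in FIELD_PATTERN.findall(text)}


info = parse_header(SAMPLE_HEADER)
print(info["COMPANY CONFORMED NAME"])
print(info["CONFORMED PERIOD OF REPORT"])
```

Values are kept as strings so that leading zeros in identifiers are preserved. Dates can be converted afterwards if needed.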

Topic 3: Introduction to Natural Language Processing

Slides: 3 Intro to Natural Language Processing

Tutorial: Tutorial 3 – Computing TF and TF-IDF

Python Scripts: 5tfidf.py
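For readers who want a quick picture of the computation, here is a minimal sketch of TF and TF-IDF. It is not the contents of 5tfidf.py. It assumes one common definition: term frequency is a term's count divided by the document's length, and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents that contain the term. The tutorial may use a different variant. The function names and the three sample documents are made up for illustration.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """TF: count of each term divided by the number of tokens in the document."""
    if not tokens:
        return {}
    total = float(len(tokens))
    return {term: n / total for term, n in Counter(tokens).items()}


def inverse_document_frequencies(docs):
    """IDF: log(N / df), where df is the number of documents containing the term."""
    n_docs = float(len(docs))
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(docs):
    """Return one {term: TF * IDF} dict per document."""
    idf = inverse_document_frequencies(docs)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(tokens).items()}
        for tokens in docs
    ]


# Made-up documents, already lower-cased and split into tokens.
docs = [
    "risk factors may affect revenue".split(),
    "revenue increased during the year".split(),
    "risk of litigation".split(),
]
scores = tf_idf(docs)
for term, score in sorted(scores[0].items(), key=lambda item: -item[1]):
    print("%-10s %.4f" % (term, score))
```

Under this definition, a term that appears in every document gets an IDF of zero, so terms concentrated in a few documents receive the highest scores.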