Python and Web Data Extraction
Description
This was a one-day workshop on the basics of Python and web data extraction, offered to faculty and doctoral students at the Fox School of Business, Temple University. It ran from 9:50 am to about 3:45 pm on Tuesday, May 31, 2016.
The tutorials show how to use Python to download firms' 10-K filings from the SEC's EDGAR system and how to extract data from them.
Setup Guide
To set up Python on your computer, please follow these instructions: Quick Guide to Installing and Setting Up Python 2.7
Python 2 versus Python 3:
The tutorials and scripts were originally written for Python 2.7. Python 2 and Python 3 differ in several ways (for example, print is a statement in Python 2 but a function in Python 3), so the scripts below are unlikely to run unmodified under Python 3.
I do plan to update the scripts for Python 3 in the future. If you have done that and are willing to share your scripts with me, it would be greatly appreciated.
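For readers on Python 3, a few of the best-known differences can be sketched as follows. This is an illustration only and is not taken from the workshop scripts; the function names describe_division and format_counts are made up for the example. The __future__ import lets the same file behave the same way on Python 2.7 and Python 3.

```python
# Illustrative only: names here are hypothetical, not from the workshop scripts.
from __future__ import division, print_function


def describe_division(a, b):
    """Return both kinds of division so the difference is explicit.

    In Python 2 without the __future__ import, 7 / 2 gives 3 for integers;
    in Python 3 it gives 3.5. The // operator floors on both versions.
    """
    return {"true": a / b, "floor": a // b}


def format_counts(counts):
    """Format a dict of word counts as sorted 'word=count' strings."""
    # .items() works on both versions; Python 2's .iteritems() no longer
    # exists in Python 3.
    return ["{0}={1}".format(word, n) for word, n in sorted(counts.items())]


if __name__ == "__main__":
    # print is a function here, so the parentheses are required.
    print("7 divided by 2:", describe_division(7, 2))
    print(format_counts({"risk": 3, "revenue": 1}))
```

Writing new code in this style may make a later move to Python 3 easier, although existing Python 2 scripts will usually need further changes (string and bytes handling in particular).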
Topic 1: Python Basics
Slides: 1 Python Basics
Tutorial: Tutorial 1 – First Python Script
Python Script: FirstPythonScript.py
Topic 2: Web Scraping
Slides: 2 Web Scraping
Tutorial: Tutorial 2 – Extracting Data from 10-K
Python Scripts (right click and save as):
CSV file: CompanyList.csv
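As a rough idea of what locating 10-K filings on EDGAR can involve, the sketch below filters lines from an EDGAR quarterly master index and builds a download URL for each 10-K. It is not the workshop's script and does not use CompanyList.csv. It assumes the pipe-delimited layout of EDGAR's master.idx files (CIK|Company Name|Form Type|Date Filed|Filename); the function name and the sample rows are invented for illustration, and no network request is made.

```python
# A minimal sketch, not the workshop's method. Sample rows are made up.
from __future__ import print_function

# Assumption: filing paths in the index are relative to this archive root.
EDGAR_ARCHIVES = "https://www.sec.gov/Archives/"


def parse_master_index(lines, form_type="10-K"):
    """Return a list of dicts for index rows whose form type matches exactly."""
    filings = []
    for line in lines:
        parts = line.strip().split("|")
        if len(parts) != 5:
            # Skip the descriptive header, separator, and blank lines.
            continue
        cik, company, form, date_filed, filename = parts
        if form != form_type:
            # Also skips the column-name row, whose form field is "Form Type".
            continue
        filings.append({
            "cik": cik,
            "company": company,
            "date_filed": date_filed,
            "url": EDGAR_ARCHIVES + filename,
        })
    return filings


if __name__ == "__main__":
    # Hypothetical index content, for demonstration only.
    sample = [
        "Description: Master Index of EDGAR Dissemination Feed",
        "CIK|Company Name|Form Type|Date Filed|Filename",
        "1234567|EXAMPLE CORP|10-K|2016-03-01|edgar/data/1234567/0000000000-16-000001.txt",
        "7654321|SAMPLE INC|8-K|2016-03-02|edgar/data/7654321/0000000000-16-000002.txt",
    ]
    for filing in parse_master_index(sample):
        print(filing["company"], filing["date_filed"], filing["url"])
```

Downloading the files themselves is left out here. The exact match on the form type means amended filings such as 10-K/A are skipped unless they are requested explicitly.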
Topic 3: Introduction to Natural Language Processing
Slides: 3 Intro to Natural Language Processing
Tutorial: Tutorial 3 – Computing TF and TF-IDF
Python Scripts: 5tfidf.py
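For orientation, term frequency (TF) and TF-IDF can be computed in a few lines of plain Python. The sketch below is not the contents of 5tfidf.py. It assumes one common definition, TF as a term's count divided by the document's length and IDF as log(N / df), where N is the number of documents and df is the number of documents containing the term; the workshop script may use a different weighting. All function names are made up for the example.

```python
# A minimal sketch under the stated assumptions; not the workshop's 5tfidf.py.
from __future__ import division, print_function

import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(tokens):
    """TF: each term's count divided by the number of tokens in the document."""
    total = len(tokens)
    return {term: count / total for term, count in Counter(tokens).items()}


def inverse_document_frequencies(tokenized_docs):
    """IDF: log(N / df), where df counts the documents containing the term."""
    n_docs = len(tokenized_docs)
    doc_counts = Counter()
    for tokens in tokenized_docs:
        doc_counts.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_counts.items()}


def tf_idf(docs):
    """Return one {term: tf * idf} dict per input document."""
    tokenized = [tokenize(doc) for doc in docs]
    idf = inverse_document_frequencies(tokenized)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(tokens).items()}
        for tokens in tokenized
    ]


if __name__ == "__main__":
    documents = ["risk factors risk", "revenue growth", "risk and revenue"]
    for scores in tf_idf(documents):
        print(sorted(scores.items(), key=lambda item: -item[1]))
```

Under this definition, a term that appears in every document gets an IDF of log(1) = 0, so its TF-IDF score is zero regardless of how often it occurs.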