Below you'll find the links to the documentation for the Python libraries that we'll be using throughout the course:
Matplotlib.pyplot (styles, tight-layout, OOP?)
Data sets can be found here: https://github.com/bobg207/Honors_Data_Analysis
Unit 2: NumPy (reason, creating, slicing, stats)
Unit 3: Pandas
Part 1: Intro
Part 2: Loading Data
Part 3: Plotting
Portland Weather Data - Due 10/3
Euro 2012 - Due 10/7
Unit 4: Statistical Analysis
Part 1: Descriptive Statistics
Part 2: Data Visualization and Analysis Presentation
Website Times - Due 10/23
Part 3: Correlation
Linear Regression Function - 10/29
- takes two pandas Series as parameters
- returns the slope and y-intercept of the line of best fit
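A minimal sketch of one way such a function might look, using the closed-form least-squares formulas. The name linear_regression and the sample values are hypothetical, and the sketch assumes both Series share the same index (pandas aligns on index labels before doing arithmetic).

```python
import pandas as pd

def linear_regression(x: pd.Series, y: pd.Series) -> tuple:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_mean = x.mean()
    y_mean = y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
    slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    # The best-fit line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return float(slope), float(intercept)

# Made-up data lying exactly on y = 2x + 1
x = pd.Series([1.0, 2.0, 3.0, 4.0])
y = pd.Series([3.0, 5.0, 7.0, 9.0])
slope, intercept = linear_regression(x, y)
print(slope, intercept)
```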
Data/Regression Line Plotting Function - 10/29
- takes three pandas Series as parameters
- plots the data and regression line
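A possible sketch of the plotting function. The assignment does not say which three Series are meant; this version assumes they are the x values, the observed y values, and the fitted y values from the regression line. The name plot_regression and the sample numbers are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_regression(x: pd.Series, y: pd.Series, y_fit: pd.Series):
    """Scatter the observed data and draw the regression line on one Axes."""
    fig, ax = plt.subplots()
    ax.scatter(x, y, label="data")
    # Sort by x so the line is drawn left to right
    order = x.to_numpy().argsort()
    ax.plot(x.iloc[order], y_fit.iloc[order], color="red", label="regression line")
    ax.set_xlabel(x.name or "x")
    ax.set_ylabel(y.name or "y")
    ax.legend()
    fig.tight_layout()
    return ax

# Made-up data; the slope 2 and intercept 1 are assumed, not computed here
x = pd.Series([1.0, 2.0, 3.0, 4.0], name="x")
y = pd.Series([3.1, 4.9, 7.2, 8.8], name="y")
y_fit = 2 * x + 1
ax = plot_regression(x, y, y_fit)
```

In a notebook the figure should render after the cell runs; in a script, call plt.show() afterwards.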
Auto_Data Revisited - 11/8
Part 4: Outliers, Residuals, and Root Mean Squared Error
Explanations of the following are found in the link above. Create a new
notebook containing your solutions. Place your functions in the first cell after the imports.
4a - Function for removing outliers
4b - Function for calculating RMSE and R^2
Part 4 Project - use the 2018_MLB_Hitting_Stats.txt and 2018_MLB_Pitching_Stats_II.txt files on GitHub
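A rough sketch for 4a and 4b. The RMSE and R^2 formulas are the standard ones. The outlier rule is an assumption (drop points whose residual is more than n_std standard deviations from the mean residual); the method explained in the lecture may differ. The function names and sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def rmse_and_r2(y: pd.Series, y_pred: pd.Series) -> tuple:
    """Return (RMSE, R^2) for observed values y and predicted values y_pred."""
    residuals = y - y_pred
    rmse = float(np.sqrt((residuals ** 2).mean()))
    ss_res = float((residuals ** 2).sum())       # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())  # total variation
    return rmse, 1.0 - ss_res / ss_tot

def remove_outliers(x: pd.Series, y: pd.Series, y_pred: pd.Series, n_std: float = 2.0):
    """Assumed rule: keep points whose residual is within n_std standard deviations."""
    residuals = y - y_pred
    keep = (residuals - residuals.mean()).abs() <= n_std * residuals.std()
    return x[keep], y[keep]

# Made-up check of RMSE and R^2
y_obs = pd.Series([3.0, 5.0, 7.0, 9.0])
y_hat = pd.Series([2.0, 6.0, 6.0, 10.0])
rmse, r2 = rmse_and_r2(y_obs, y_hat)

# Made-up data on y = 2x + 1 with one point pushed far off the line
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = pd.Series([3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 40.0])
x_clean, y_clean = remove_outliers(x, y, 2 * x + 1)
```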
Part 5: Higher-Order Regression Fits
5a - Update Least Squares function to handle Quadratic Regression
5b - Auto Data Revisited - Submit when complete
5c - Bi-Linear Regression (Moneyball Problem)
5d - 3-D representation of Moneyball data - Submit when complete
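For 5a, one way to extend a least-squares function to a quadratic fit is to build a matrix with columns x^2, x, 1 and solve for the coefficients. This sketch uses NumPy's np.vander and np.linalg.lstsq in place of hand-written formulas, which may not be how the course derives it. The name polynomial_regression and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def polynomial_regression(x: pd.Series, y: pd.Series, degree: int = 2) -> np.ndarray:
    """Return least-squares coefficients, highest power first."""
    # Columns of the design matrix are x**degree, ..., x**1, x**0
    design = np.vander(x.to_numpy(dtype=float), degree + 1)
    coeffs, *_ = np.linalg.lstsq(design, y.to_numpy(dtype=float), rcond=None)
    return coeffs

# Made-up data lying exactly on y = x^2 - 3x + 2
x = pd.Series([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2 - 3 * x + 2
coeffs = polynomial_regression(x, y, degree=2)
print(coeffs)
```

With degree=1 the same function reduces to the straight-line fit from Part 3.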
Part 6: K-Means Clustering
6a - Function to normalize data and function to find centroids
** might start with this data set:
{'a':[1, 1.5, 5, 8, 1, 9], 'b':[2, 1.8, 8, 8, 0.6, 11]}
6c - Customer Data - Submit when complete
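A sketch for 6a using the starter data set listed above. Two assumptions: "normalize" is taken to mean min-max scaling of each column to [0, 1] (z-scores are another common choice), and the centroids are found with the basic assign-then-average k-means loop starting from randomly chosen points. The function names, k=2, and the seed are hypothetical choices.

```python
import numpy as np
import pandas as pd

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Min-max scale every column to the range [0, 1]."""
    return (df - df.min()) / (df.max() - df.min())

def find_centroids(df: pd.DataFrame, k: int, n_iter: int = 100, seed: int = 0):
    """Return (centroids, labels) from a basic k-means loop."""
    points = df.to_numpy(dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of its points; keep it if the cluster is empty
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

data = pd.DataFrame({'a': [1, 1.5, 5, 8, 1, 9], 'b': [2, 1.8, 8, 8, 0.6, 11]})
scaled = normalize(data)
centroids, labels = find_centroids(scaled, k=2)
print(labels)
```

Which cluster gets label 0 and which gets label 1 depends on the random starting points, so compare group membership, not label values.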
Part 7: K-Nearest Neighbor and the Seattle Housing Data
7a - Create a function to separate a set of data into a training set and a test set. (2/24)
7b - Start on Problems 8a-d from the lecture. (2/26)
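A sketch of how the split function in 7a might be written. It shuffles the row positions and slices off a fraction for testing. The name split_train_test, the 25% test fraction, the seed, and the small DataFrame are all hypothetical; the real Seattle housing columns are not assumed here.

```python
import numpy as np
import pandas as pd

def split_train_test(df: pd.DataFrame, test_fraction: float = 0.25, seed: int = 0):
    """Return (train, test) DataFrames with rows assigned at random."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(df))           # random order of row positions
    n_test = int(round(len(df) * test_fraction))  # number of rows held out for testing
    test_idx = shuffled[:n_test]
    train_idx = shuffled[n_test:]
    return df.iloc[train_idx], df.iloc[test_idx]

# Made-up stand-in data with 8 rows
df = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "target": [10, 20, 30, 40, 50, 60, 70, 80],
})
train, test = split_train_test(df, test_fraction=0.25)
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, so results can be compared from one run to the next.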