SciPy 2020
https://www.scipy2020.scipy.org/tutorial-information
Talks
Tutorials
Learn Python through Data Processing in Pandas
https://github.com/chendaniely/scipy-2020-pandas
Deep Learning from Scratch with PyTorch
https://github.com/hugobowne/deep-learning-from-scratch-pytorch
Run it online here:
- https://colab.research.google.com/github/hugobowne/deep-learning-from-scratch-pytorch/blob/master/notebooks/1-Student-deep-learning-from-scratch-pytorch.ipynb
- https://mybinder.org/v2/gh/hugobowne/deep-learning-from-scratch-pytorch/f61063c3ec3aca124fd90b6af604e8e4c7313604?urlpath=lab
Overview...
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import tree
from sklearn.metrics import accuracy_score
# Figures inline and set visualization style
%matplotlib inline
sns.set()
Read in data
df = pd.read_csv('https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/master/data/train.csv')
# View first lines of training data
df.head(n=4)
Check out data types
df.info()
Check out summary statistics
df.describe()
Split our data into train and test sets
from sklearn.model_selection import train_test_split
df_train, df_test, y_train, y_test = train_test_split(
    df.drop('Survived', axis=1),
    df[['Survived']],
    test_size=0.33,
    random_state=42,
    stratify=df[['Survived']])
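To see what `stratify` buys you, here is a minimal sketch on a made-up toy frame (the `toy` data and its 30% survival rate are assumptions for illustration, not the Titanic data). With stratification, the share of positives should come out roughly equal in both halves of the split.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame, made up for illustration: 30 positives out of 100 rows
toy = pd.DataFrame({
    "feature": range(100),
    "Survived": [1] * 30 + [0] * 70,
})

X_train, X_test, y_train, y_test = train_test_split(
    toy.drop("Survived", axis=1),
    toy["Survived"],
    test_size=0.33,
    random_state=42,
    stratify=toy["Survived"],  # keep the class ratio the same in both parts
)

# Share of positives in each part; both should be close to 0.30
train_rate = y_train.mean()
test_rate = y_test.mean()
print(len(X_train), len(X_test))
print(train_rate, test_rate)
```

Without `stratify`, the two rates can drift apart by chance, which matters more on small or imbalanced data.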
Make bar plot of target variable
df_train['Survived'] = y_train
sns.countplot(x='Survived', data=df_train);
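Make a first, very naive baseline prediction that everybody died, and compute its accuracy
# Predict 0 (did not survive) for every passenger in the test set
df_test['Survived'] = 0
# Each difference is 1 for a survivor (misclassified) and 0 otherwise
pred_diff = y_test['Survived'] - df_test['Survived'].array
# This shortcut is valid only because every prediction is 0
accuracy = 1 - sum(pred_diff)/len(pred_diff)
print(accuracy)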
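The manual formula above relies on every prediction being 0. As a cross-check, here is a small sketch comparing it with `accuracy_score` (already imported at the top of the notebook) on made-up toy labels; the label values are an assumption for illustration only.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels, made up for illustration (1 = survived, 0 = died)
y_true = np.array([0, 1, 0, 0, 1, 0, 1, 0])

# Naive baseline: predict that nobody survived
y_baseline = np.zeros_like(y_true)

# Manual formula from the notes: 1 - (number of survivors / number of rows)
manual_accuracy = 1 - (y_true - y_baseline).sum() / len(y_true)

# Same quantity from scikit-learn's general-purpose metric
sk_accuracy = accuracy_score(y_true, y_baseline)

print(manual_accuracy, sk_accuracy)
```

Unlike the subtraction shortcut, `accuracy_score` stays correct once predictions contain both classes, so it is probably the safer choice for the models that follow.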
Build a decision tree
Data preparation and cleaning
df['Age'] = df.Age.fillna(df.Age.median())
df['Fare'] = df.Fare.fillna(df.Fare.median())
df.info()
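A minimal sketch of the median imputation used above, on a made-up toy column (the `toy` values are assumptions, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy column with two missing ages, made up for illustration
toy = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0, np.nan]})

# Series.median() skips NaN by default, so this is the median of 22, 26, 35
median_age = toy["Age"].median()

# Replace every missing value with that median
toy["Age"] = toy["Age"].fillna(median_age)

print(median_age)
print(toy["Age"].tolist())
```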
Convert Sex into a numerical feature
df = pd.get_dummies(df, columns=['Sex'], drop_first=True)
df = df[['Sex_male', 'Fare', 'Age','Pclass', 'SibSp','Survived']]
df.head()
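To make the effect of `drop_first=True` concrete, here is a small sketch on a made-up toy frame (values are assumptions for illustration). With two categories, one indicator column is enough, so `Sex_female` is dropped and only `Sex_male` remains.

```python
import pandas as pd

# Toy frame, made up for illustration
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22, 38, 26, 35],
})

# One-hot encode Sex; drop_first removes the first category (female),
# leaving a single indicator column for male
encoded = pd.get_dummies(toy, columns=["Sex"], drop_first=True)

# Cast to int so the result looks the same across pandas versions
# (the indicator dtype is bool in newer releases, uint8 in older ones)
encoded["Sex_male"] = encoded["Sex_male"].astype(int)

print(encoded)
```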
Train/test split
df_train, df_test, y_train, y_test = train_test_split(
    df.drop('Survived', axis=1),
    df[['Survived']],
    test_size=0.33,
    random_state=41,
    stratify=df[['Survived']])
Instantiate model and fit to data
clf = tree.DecisionTreeClassifier(max_depth=3)
clf.fit(df_train, y_train)
Make predictions
Y_pred = clf.predict(df_test)
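The notes stop at the predictions. A natural next step is to score them against the held-out labels. Below is a self-contained sketch of the fit, predict, score loop on synthetic data; the random features and the rule that generates the labels are assumptions for illustration, not the Titanic features.

```python
import numpy as np
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data, made up for illustration: the label depends only on
# the sign of the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=41, stratify=y)

# Same model settings as in the notes, with a fixed seed for repeatability
clf = tree.DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predict on held-out rows and compare with the true labels
y_pred = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(test_accuracy)
```

In the notebook itself, the equivalent call would presumably be `accuracy_score(y_test, Y_pred)`, to be compared against the everybody-died baseline.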