SciPy 2020

https://www.scipy2020.scipy.org/tutorial-information

Talks


Tutorials

Learn Python through Data Processing in Pandas

https://github.com/chendaniely/scipy-2020-pandas


Deep Learning from Scratch with PyTorch

https://github.com/hugobowne/deep-learning-from-scratch-pytorch

Run it online here:

Overview...

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import tree
from sklearn.metrics import accuracy_score

# Figures inline and set visualization style
%matplotlib inline
sns.set()

Read in data

df = pd.read_csv('https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/master/data/train.csv')

# View first lines of training data
df.head(n=4)

Check out data types

df.info()

Check out summary statistics

df.describe()

Split our data into train and test sets

from sklearn.model_selection import train_test_split
df_train, df_test, y_train, y_test = train_test_split(
    df.drop('Survived', axis=1), df[['Survived']], test_size=0.33, random_state=42, stratify=df[['Survived']])
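
A minimal sketch of what the stratify argument does, using a made-up toy frame (the column names feature and target are hypothetical, not from the tutorial data): both splits keep the class proportions of the full data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame, made up for illustration: 30% positives, 70% negatives
toy = pd.DataFrame({
    "feature": range(100),
    "target": [1] * 30 + [0] * 70,
})

X_tr, X_te, y_tr, y_te = train_test_split(
    toy.drop("target", axis=1),
    toy["target"],
    test_size=0.3,
    random_state=0,
    stratify=toy["target"],  # keep the class ratio in both splits
)

# Share of positives in each split
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(len(y_tr), len(y_te), train_rate, test_rate)
```

Without stratify, the share of positives in a small test set can drift away from the share in the full data, which makes accuracy numbers harder to compare.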

Make bar plot of target variable

df_train['Survived'] = y_train
sns.countplot(x='Survived', data=df_train);

Make a first, very naive baseline prediction that nobody survived, and compute its accuracy

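# Predict 0 (did not survive) for every passenger in the test set
df_test['Survived'] = 0
# accuracy_score (imported above) compares labels directly, so it stays
# correct even if the prediction is later changed to something other than all zeros
accuracy = accuracy_score(y_test['Survived'], df_test['Survived'])
print(accuracy)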
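
Why this baseline matters can be seen on a tiny example. The labels below are made up for illustration: for an all-zero prediction, accuracy is simply the share of zeros in the true labels, so any real model has to beat that number.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels, made up for illustration: 6 of 10 did not survive
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

# Naive prediction: nobody survived
y_naive = np.zeros_like(y_true)

baseline_accuracy = accuracy_score(y_true, y_naive)
share_of_zeros = 1 - y_true.mean()
print(baseline_accuracy, share_of_zeros)
```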

Build a decision tree

Data preparation and cleaning

df['Age'] = df.Age.fillna(df.Age.median())
df['Fare'] = df.Fare.fillna(df.Fare.median())
df.info()
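
A small sketch of the median fill on a made-up Age column (values are hypothetical): median() skips missing values, and fillna() writes that single number into every gap.

```python
import numpy as np
import pandas as pd

# Toy column, made up for illustration, with two missing ages
toy = pd.DataFrame({"Age": [22.0, np.nan, 30.0, 40.0, np.nan]})

# NaNs are skipped, so this is the median of 22, 30 and 40
age_median = toy["Age"].median()

toy["Age"] = toy["Age"].fillna(age_median)
print(age_median)
print(toy["Age"].tolist())
```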

Convert Sex into a numerical feature

df = pd.get_dummies(df, columns=['Sex'], drop_first=True)
df = df[['Sex_male', 'Fare', 'Age', 'Pclass', 'SibSp', 'Survived']]
df.head()
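
To see where the Sex_male column comes from, here is get_dummies on a made-up four-row frame (the values are hypothetical). With drop_first=True the first category in sorted order (female) is dropped, leaving one indicator column.

```python
import pandas as pd

# Toy frame, made up for illustration
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

# One indicator column per category, minus the first ('female')
encoded = pd.get_dummies(toy, columns=["Sex"], drop_first=True)

print(encoded.columns.tolist())
print(encoded["Sex_male"].astype(int).tolist())
```

Sex_male is 1 (or True, depending on the pandas version) for male passengers and 0 (False) for female passengers, so no information is lost by dropping the second column.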

Train/test split

df_train, df_test, y_train, y_test = train_test_split(
    df.drop('Survived', axis=1), df[['Survived']], test_size=0.33, random_state=41, stratify=df[['Survived']])

Instantiate model and fit to data

clf = tree.DecisionTreeClassifier(max_depth=3)
clf.fit(df_train, y_train)

Make predictions

Y_pred = clf.predict(df_test)
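
The natural next step is to score the predictions with accuracy_score, which is imported at the top of these notes. A self-contained sketch on synthetic data follows; the data and the names X, y, y_pred and test_accuracy are made up for illustration. With the variables from these notes the equivalent call would be accuracy_score(y_test, Y_pred).

```python
import numpy as np
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data, made up for illustration: the label depends only on feature 0
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0, stratify=y)

# Same model settings as in the notes, plus a fixed seed for repeatability
clf = tree.DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# Predict on held-out rows and compare with the true labels
y_pred = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(test_accuracy)
```

Comparing this number with the naive baseline shows how much the tree actually learned.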