Azure

azureml-opendatasets

pip install azureml-opendatasets

GFS

from azureml.opendatasets import NoaaGfsWeather

from datetime import datetime

from dateutil.relativedelta import relativedelta


end_date = datetime.today()

start_date = datetime.today() - relativedelta(days=1)

gfs = NoaaGfsWeather(start_date=datetime(2022, 6, 29, 0, 0), end_date=datetime(2022, 6, 30, 0, 0))

gfs_df = gfs.to_pandas_dataframe()

Yellow taxi

from azureml.opendatasets import NycTlcYellow

from datetime import datetime

from dateutil import parser


end_date = parser.parse('2018-06-06')

start_date = parser.parse('2018-05-01')

nyc_tlc = NycTlcYellow(start_date=start_date, end_date=end_date)

nyc_tlc_df = nyc_tlc.to_pandas_dataframe()


nyc_tlc_df.info()

Create a new Windows virtual machine

  • Go to https://portal.azure.com/learn.docs.microsoft.com

  • Click 'create a resource'.

  • Search 'Windows Server'

  • Click on 'Select a software plan' and choose '[smalldisk] Windows Server 2016 Datacenter'.

  • Click on 'Resource group' and select learn-f072946c-e335-4f7d-b8cf-b18a7f71b8b8.

  • In Virtual Machine Name enter 'test-vp-vm2'

  • Enter username and password

  • Click the 'Create and attach a new disk'.

  • Name it 'VideoCodecVM_DataDisk_0' and click 'ok'

  • Click 'Create new' under 'Virtual network'.

  • Under Address range put '172.16.0.0/16'

  • Under Subnet name put 'new' and make address '172.16.1.0/24'. Click the boxes next to them and Click ok.

  • Click the 'review and create button' at the bottom. Click 'create'.

Remote Desktop Protocol

  • On the portal go to resource and click 'Connect' in the top left.

  • Download the .rdp file.

  • Right click the file and click 'edit'. Edit the settings you desire such as Display, local resources to share e.g. drives (easy to move files), experience (network connectivity).

  • Share your C: drive. Click 'Save' on the 'General' tab.

  • Connect using your username and password

On the VM

  • Close server manager

  • Click on 'File Explorer' and click on 'This PC'

  • Search 'Computer Management' from the start button and click on 'Disk Management'. Click 'ok' to initialize the disk

  • Right click the Disk and click 'New Simple Volume'. Click on 'next' to format it.

  • Close the client.

Create a Windows data science virtual machine

Jupyter notebook

https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-samples-and-walkthroughs

https://docs.microsoft.com/en-us/azure/machine-learning/

Create a Linux (ubuntu) data science virtual machine

  • + Create a new resource and search Data Science Virtual Machine for Linux (Ubuntu)

  • Click 'Create'

  • Create a Resource Group

  • Name the VM

  • Use standard image and standard size.

  • Enter a username and password (all lowercase in username).

  • Choose DSv2 as the tier.

  • Create

  • Go To Resource and make a copy of the public IP address.

  • Copy the Public IP and go to https://VMIP:8000 or https://VMIP:8000/user/USERNAME/lab

  • When finished click 'Dissociate' then click 'Delete', click 'Delete'

https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro

Create a VM (not tested)

az vm create --name VMname --resource-group RGname --size Standard_D96as_v4 --generate-ssh-keys


Azure ML Studio

Create compute resources

Click on Compute

Compute instances -> CPU Standard_DS11_v2

Compute clusters -> CPU Standard_DS11_v2. 2. 2.
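The same cluster can also be created from the SDK (a minimal sketch using the AmlCompute API that appears later in these notes; the cluster name and node counts here are assumptions):

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# provision a small CPU cluster; adjust min/max nodes to your quota
provisioning_config = AmlCompute.provisioning_configuration(vm_size='Standard_DS11_v2',
                                                            min_nodes=0,
                                                            max_nodes=2)
cluster = ComputeTarget.create(ws, 'cpu-cluster', provisioning_config)
cluster.wait_for_completion(show_output=True)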

Create data

Click on Datasets

from web files:

Click on data -> Explore
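The dataset can also be created and registered from the SDK (a rough sketch; 'DATA_URL' is a placeholder for the web file, and the dataset name matches the Automated ML steps below):

from azureml.core import Workspace, Dataset

ws = Workspace.from_config()

# build a tabular dataset from a delimited file on the web and register it
dataset = Dataset.Tabular.from_delimited_files(path='DATA_URL')
dataset = dataset.register(workspace=ws, name='bike-rentals', create_new_version=True)

print(dataset.to_pandas_dataframe().head())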

Automated ML

+ New Automated ML Run

  • bike-rentals data

  • mslearn-bike-rental

  • rentals

  • bbb-cluster

  • Regression

  • metric - normalized root mean squared error

  • Block all algorithms but RandomForest & LightGBM

  • Exit criterion - training job time (hours): 0.25

  • Exit criterion - metric score threshold: 0.08

Review the best model

  • Select the algorithm (MaxAbsScaler, LightGBM)

  • Click on 'View all other metrics'

  • Metrics -> residuals, predicted_true

  • Explanations

Deploy

  • predict-rentals

  • Predict cycle rentals

  • ACI (Azure Container Instance)

  • Enable authentication

  • Endpoints -> predict-rentals -> Consume. Copy the REST endpoint and primary key

Predict

Notebooks

Create new file

  • Test-Bikes

  • Notebook

  • Overwrite if already exists

  • Collapse file explorer

Add the following

endpoint = 'YOUR_ENDPOINT' #Replace with your endpoint

key = 'YOUR_KEY' #Replace with your key


import json

import requests


#An array of features based on five-day weather forecast

x = [[1,1,2022,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446],

[2,1,2022,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539],

[3,1,2022,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309],

[4,1,2022,1,0,2,1,1,0.2,0.212122,0.590435,0.160296],

[5,1,2022,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869]]


#Convert the array to JSON format

input_json = json.dumps({"data": x})


#Set the content type and authentication for the request

headers = {"Content-Type":"application/json",

"Authorization":"Bearer " + key}


#Send the request

response = requests.post(endpoint, input_json, headers=headers)


#If we got a valid response, display the predictions

if response.status_code == 200:
    y = json.loads(response.json())
    print("Predictions:")
    for i in range(len(x)):
        print(" Day: {}. Predicted rentals: {}".format(i+1, max(0, round(y["result"][i]))))
else:
    print(response)

  • Save and checkpoint

  • Run cell

Clean up

Endpoints

  • Click model -> delete

Compute

  • Compute instances -> stop

  • Compute cluster -> computer name -> edit -> 0 nodes

  • Azure portal -> resource groups -> resource group name -> delete


Create inference cluster

  • Standard_DS11_v2. Dev-test

  • 2
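Roughly the SDK equivalent (a sketch; the cluster name is an assumption, and I believe cluster_purpose is how the SDK exposes the Dev-test option):

from azureml.core import Workspace
from azureml.core.compute import AksCompute, ComputeTarget

ws = Workspace.from_config()

# small AKS cluster for dev/test inference (name 'inference-cluster' is illustrative)
prov_config = AksCompute.provisioning_configuration(vm_size='Standard_DS11_v2',
                                                    agent_count=2,
                                                    cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST)
aks_target = ComputeTarget.create(ws, 'inference-cluster', prov_config)
aks_target.wait_for_completion(show_output=True)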


Designer

  • +

  • Change Draft name

  • Select compute target (choose cluster)

  • Sample datasets

  • Drag Automobile price data (Raw) dataset onto canvas

  • Right click it -> Visualize -> Dataset Output -> price column

  • Data Transformation -> Drag 'Select Columns in Dataset'. Connect boxes.

  • Click 'Select Columns in Dataset' and click 'Edit column'. 'By name'. Add All but remove 'normalized-losses'

  • Drag 'Clean Missing Data'. Connect boxes and click 'Edit column'. With rules then include 'bore', 'stroke' and 'horsepower'. Cleaning mode - Remove entire row

  • Drag 'Normalize Data'. Connect left dot from box above. Transformation 'MinMax'

  • Submit -> Create new named 'auto-price-training'.

  • Normalize Data -> Outputs + logs. Next to Transformed dataset click on visualize

  • Drag Split Data. Connect boxes. 0.7 fraction. Random seed 123.

  • Model Training -> Drag Train Model. Connect left to right. Label column to price

  • Machine Learning Algorithms -> Linear Regression. connect to Train Model

https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet

  • Model Scoring & Evaluation. Score Model. Connect remaining dots

  • Submit -> Select existing

  • Select Score Model. Outputs + Logs. Data outputs. Click visualize

  • Model Scoring & Evaluation -> Evaluate Model. Connect to top left dot

  • Submit -> Select existing

  • Click on 'Evaluate Model' then Output and logs then visualize

  • Click Create inference pipeline -> Real-time inference pipeline.

  • Rename to 'Predict Auto Price'.

  • Delete Automobile price data (Raw) and add Data Input and Output -> Enter Data Manually

  • Remove price from Select Columns in Dataset

  • Remove evaluate model

  • Insert Python Language -> Execute Python Script

  • Add text into manual entry

symboling,normalized-losses,make,fuel-type,aspiration,num-of-doors,body-style,drive-wheels,engine-location,wheel-base,length,width,height,curb-weight,engine-type,num-of-cylinders,engine-size,fuel-system,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg
3,NaN,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9,111,5000,21,27
3,NaN,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9,111,5000,21,27
1,NaN,alfa-romero,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9,154,5000,19,26

Add python script

import pandas as pd


def azureml_main(dataframe1 = None, dataframe2 = None):

    scored_results = dataframe1[['Scored Labels']]
    scored_results.rename(columns={'Scored Labels':'predicted_price'},
                          inplace=True)
    return scored_results

  • Click on Execute Python Script, then Result dataset, then visualize

  • Submit. New experiment named 'predict-auto-price'.

  • Click Deploy. Attach to the inference cluster and click deploy.

  • Click on Endpoints and open predict-auto-price. View the Consume tab. Copy the REST endpoint and key.

  • Open ML Studio in a second tab. Create a 'Test-Autos' notebook.

endpoint = 'YOUR_ENDPOINT' #Replace with your endpoint

key = 'YOUR_KEY' #Replace with your key


import urllib.request

import json

import os


# Prepare the input data

data = {

"Inputs": {

"WebServiceInput0":

[

{

'symboling': 3,

'normalized-losses': None,

'make': "alfa-romero",

'fuel-type': "gas",

'aspiration': "std",

'num-of-doors': "two",

'body-style': "convertible",

'drive-wheels': "rwd",

'engine-location': "front",

'wheel-base': 88.6,

'length': 168.8,

'width': 64.1,

'height': 48.8,

'curb-weight': 2548,

'engine-type': "dohc",

'num-of-cylinders': "four",

'engine-size': 130,

'fuel-system': "mpfi",

'bore': 3.47,

'stroke': 2.68,

'compression-ratio': 9,

'horsepower': 111,

'peak-rpm': 5000,

'city-mpg': 21,

'highway-mpg': 27,

},

],

},

"GlobalParameters": {

}

}

body = str.encode(json.dumps(data))

headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ key)}

req = urllib.request.Request(endpoint, body, headers)


try:
    response = urllib.request.urlopen(req)
    result = response.read()
    json_result = json.loads(result)
    y = json_result["Results"]["WebServiceOutput0"][0]["predicted_price"]
    print('Predicted price: {:.2f}'.format(y))

except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers to help debug the error
    print(error.info())
    print(json.loads(error.read().decode("utf8", 'ignore')))

SDK

https://ml.azure.com/

https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py

pip install azureml-sdk

You can download the config.json from the Azure portal (in the Machine Learning instance)
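It typically contains just the workspace identifiers, something like this (values are placeholders matching the example below):

{
    "subscription_id": "123456-abc-123...",
    "resource_group": "aml-resources",
    "workspace_name": "aml-workspace"
}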

Create resources as

from azureml.core import Workspace

ws = Workspace.create(name='aml-workspace',

subscription_id='123456-abc-123...',

resource_group='aml-resources',

create_resource_group=True,

location='eastus'

)

or

az ml workspace create -w 'aml-workspace' -g 'aml-resources'

Connect to the config as:

from azureml.core import Workspace


ws = Workspace.from_config()

See compute targets:

for compute_name in ws.compute_targets:
    compute = ws.compute_targets[compute_name]
    print(compute.name, ":", compute.type)

Install the az CLI on Linux:

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

Add the ml extension

az extension add -n azure-cli-ml

https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-setup-vscode-extension

Experiment run context

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-track-experiments

from azureml.core import Experiment


# create an experiment variable

experiment = Experiment(workspace = ws, name = "my-experiment")


# start the experiment

run = experiment.start_logging()


# experiment code goes here


# end the experiment

run.complete()

Log the number of observations in a file:

from azureml.core import Experiment

import pandas as pd


# Create an Azure ML experiment in your workspace

experiment = Experiment(workspace = ws, name = 'my-experiment')


# Start logging data from the experiment

run = experiment.start_logging()


# load the dataset and count the rows

data = pd.read_csv('data.csv')

row_count = (len(data))


# Log the row count

run.log('observations', row_count)


# Complete the experiment

run.complete()

Retrieve logs

from azureml.widgets import RunDetails


RunDetails(run).show()

or

import json


# Get logged metrics

metrics = run.get_metrics()

print(json.dumps(metrics, indent=2))

Add files to the upload path

run.upload_file(name='outputs/sample.csv', path_or_stream='./sample.csv')

Retrieve these files as:

import json


files = run.get_file_names()

print(json.dumps(files, indent=2))

Experiment script

from azureml.core import Run

import pandas as pd

import matplotlib.pyplot as plt

import os


# Get the experiment run context

run = Run.get_context()


# load the diabetes dataset

data = pd.read_csv('data.csv')


# Count the rows and log the result

row_count = (len(data))

run.log('observations', row_count)


# Save a sample of the data

os.makedirs('outputs', exist_ok=True)

data.sample(100).to_csv("outputs/sample.csv", index=False, header=True)


# Complete the run

run.complete()

Create a script run configuration. For example, you can have an experiment folder (experiment_folder below) which also contains the data:

from azureml.core import Experiment, ScriptRunConfig


# Create a script config

script_config = ScriptRunConfig(source_directory=experiment_folder,

script='experiment.py')


# submit the experiment

experiment = Experiment(workspace = ws, name = 'my-experiment')

run = experiment.submit(config=script_config)

run.wait_for_completion(show_output=True)

Estimator

Estimators encapsulate a run configuration and a script configuration in a single object.

A script to train a model

from azureml.core import Run

import pandas as pd

import numpy as np

import os

import joblib

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression


# Get the experiment run context

run = Run.get_context()


# Prepare the dataset

data = pd.read_csv('data.csv')

X, y = data[['Feature1','Feature2','Feature3']].values, data['Label'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)


# Train a logistic regression model

reg = 0.1

model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)


# calculate accuracy

y_hat = model.predict(X_test)

acc = np.average(y_hat == y_test)

run.log('Accuracy', np.float(acc))


# Save the trained model

os.makedirs('outputs', exist_ok=True)

joblib.dump(value=model, filename='outputs/model.pkl')


run.complete()

Use a generic Estimator class to define a run configuration for a training script like this:

from azureml.train.estimator import Estimator

from azureml.core import Experiment


# Create an estimator

estimator = Estimator(source_directory='experiment_folder',

entry_script='training_script.py',

compute_target='local',

conda_packages=['scikit-learn']

)


# Create and run an experiment

experiment = Experiment(workspace = ws, name = 'training_experiment')

run = experiment.submit(config=estimator)

Framework-specific estimators simplify configuration. For example, the SKLearn estimator includes its dependencies. (https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets)

from azureml.train.sklearn import SKLearn

from azureml.core import Experiment


# Create an estimator

estimator = SKLearn(source_directory='experiment_folder',

entry_script='training_script.py',

compute_target='local'

)


# Create and run an experiment

experiment = Experiment(workspace = ws, name = 'training_experiment')

run = experiment.submit(config=estimator)

Script parameters

Use parameters to set variables in the script. Read the argument reg:

from azureml.core import Run

import argparse

import pandas as pd

import numpy as np

import os

import joblib

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression


# Get the experiment run context

run = Run.get_context()


# Set regularization hyperparameter

parser = argparse.ArgumentParser()

parser.add_argument('--reg_rate', type=float, dest='reg', default=0.01)

args = parser.parse_args()

reg = args.reg


# Prepare the dataset

data = pd.read_csv('data.csv')

X, y = data[['Feature1','Feature2','Feature3']].values, data['Label'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)


# Train a logistic regression model

model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)


# calculate accuracy

y_hat = model.predict(X_test)

acc = np.average(y_hat == y_test)

run.log('Accuracy', np.float(acc))


# Save the trained model

os.makedirs('outputs', exist_ok=True)

joblib.dump(value=model, filename='outputs/model.pkl')


run.complete()

Use them in an estimator by passing a dictionary of script parameters:

from azureml.train.sklearn import SKLearn

from azureml.core import Experiment


# Create an estimator

estimator = SKLearn(source_directory='experiment_folder',

entry_script='training_script.py',

script_params = {'--reg_rate': 0.1},

compute_target='local'

)


# Create and run an experiment

experiment = Experiment(workspace = ws, name = 'training_experiment')

run = experiment.submit(config=estimator)

Register models

You can retrieve files using:

# "run" is a reference to a completed experiment run


# List the files generated by the experiment

for file in run.get_file_names():
    print(file)


# Download a named file

run.download_file(name='outputs/model.pkl', output_file_path='model.pkl')

To register a model from a local file you can do:

from azureml.core import Model


model = Model.register(workspace=ws,

model_name='classification_model',

model_path='model.pkl', # local path

description='A classification model',

tags={'dept': 'sales'},

model_framework=Model.Framework.SCIKITLEARN,

model_framework_version='0.20.3')

or with a remote experiment:

run.register_model( model_name='classification_model',

model_path='outputs/model.pkl', # run outputs path

description='A classification model',

tags={'dept': 'sales'},

model_framework=Model.Framework.SCIKITLEARN,

model_framework_version='0.20.3')

See registered models by doing:

from azureml.core import Model


for model in Model.list(ws):
    # Get model name and auto-generated version
    print(model.name, 'version:', model.version)

Azure ML

https://azure.microsoft.com/en-us/services/machine-learning/

Signed up for the free tier of Azure ML

1st experiment

https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup

  • Click 'create a resource'.

  • search Machine Learning and click Enter

  • Name the workspace 'docs-ws'

  • Choose the 'Free tier' subscription and create a resource group called 'docs-aml'

  • Choose 'basic' for the workspace edition

  • Click the 'review and create button' at the bottom. Click 'create'.

  • select 'Go to resource' button.


  • Sign in to https://ml.azure.com/ and choose the same subscription and resource group

  • Select Notebooks on the left, Open the Samples folder, Open the Python folder. Open the folder with a version number on it (current version of Python SDK (software development kit))

  • Select the "..." at the right of the tutorials folder and then select Clone. Select your folder to clone the tutorials folder there. Click 'clone'.

  • Select the tutorial-1st-experiment-sdk-train.ipynb file in your tutorials folder (User files -> rbell -> tutorials ->)

  • Click + New VM. Name it 'docs-am-vm'


The notebook is stored at https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-train

from azureml.core import Workspace

ws = Workspace.from_config()

# go to https://microsoft.com/devicelogin and sign in

create an experiment in your workspace

from azureml.core import Experiment

experiment = Experiment(workspace=ws, name="diabetes-experiment")

Read in data

from sklearn.datasets import load_diabetes

from sklearn.model_selection import train_test_split


X, y = load_diabetes(return_X_y = True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=66)

Build a script that trains ridge models in a loop through different hyperparameter alpha values

from sklearn.linear_model import Ridge

from sklearn.metrics import mean_squared_error

from sklearn.externals import joblib

import math


alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


for alpha in alphas:
    run = experiment.start_logging()
    run.log("alpha_value", alpha)
    model = Ridge(alpha=alpha)
    model.fit(X=X_train, y=y_train)
    y_pred = model.predict(X=X_test)
    rmse = math.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred))
    run.log("rmse", rmse)
    model_name = "model_alpha_" + str(alpha) + ".pkl"
    filename = "outputs/" + model_name
    joblib.dump(value=model, filename=filename)
    run.upload_file(name=model_name, path_or_stream=filename)
    run.complete()

After the training has completed, call the experiment variable to fetch a link to the experiment in the portal.

experiment

Click on the link to the report

Clicking on a run number link in the RUN NUMBER column takes you to the page for each individual run. You can see the pickle files in 'Outputs'.

Get the best run

minimum_rmse_runid = None

minimum_rmse = None


for run in experiment.get_runs():
    run_metrics = run.get_metrics()
    run_details = run.get_details()
    # each logged metric becomes a key in this returned dict
    run_rmse = run_metrics["rmse"]
    run_id = run_details["runId"]
    if minimum_rmse is None:
        minimum_rmse = run_rmse
        minimum_rmse_runid = run_id
    else:
        if run_rmse < minimum_rmse:
            minimum_rmse = run_rmse
            minimum_rmse_runid = run_id


print("Best run_id: " + minimum_rmse_runid)

print("Best run_id rmse: " + str(minimum_rmse))

see all the files available for download from this run

from azureml.core import Run

best_run = Run(experiment=experiment, run_id=minimum_rmse_runid)

print(best_run.get_file_names())

Download this model to the current directory

best_run.download_file(name="model_alpha_0.1.pkl")

To kill the VM go to 'Compute' on the left and stop the VM

Go to resources and delete the 'docs-ws' resource.
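The resource group can also be removed from the CLI (same command used in the AKS cleanup at the end of these notes; substitute your resource group name):

az group delete --name RG_NAME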

Train and Deploy a model

https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml

https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/img-classification-part1-training.ipynb

%matplotlib inline

import numpy as np

import matplotlib.pyplot as plt

import azureml.core
from azureml.core import Workspace

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)


# load workspace configuration from the config.json file in the current folder.

ws = Workspace.from_config()

print(ws.name, ws.location, ws.resource_group, sep='\t')


experiment_name = 'sklearn-mnist'


from azureml.core import Experiment

exp = Experiment(workspace=ws, name=experiment_name)


from azureml.core.compute import AmlCompute

from azureml.core.compute import ComputeTarget

import os


Create or Attach existing compute resource

# choose a name for your cluster

compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpu-cluster")

compute_min_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0)

compute_max_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4)


# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6

vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_D2_V2")



if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes,
                                                                max_nodes = compute_max_nodes)

    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)

    # can poll for a minimum number of nodes and for a specific timeout.
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())

Download MNIST

import urllib.request


data_folder = os.path.join(os.getcwd(), 'data')

os.makedirs(data_folder, exist_ok=True)


urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images.gz'))

urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels.gz'))

urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))

urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))

Display some numbers

# make sure utils.py is in the same directory as this code

from utils import load_data


# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.

X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0

X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0

y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)

y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)


# now let's show some randomly chosen images from the training set.
count = 0
sample_size = 30
plt.figure(figsize = (16, 6))
for i in np.random.permutation(X_train.shape[0])[:sample_size]:
    count = count + 1
    plt.subplot(1, sample_size, count)
    plt.axhline('')
    plt.axvline('')
    plt.text(x=10, y=-10, s=y_train[i], fontsize=18)
    plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)
plt.show()

Create a FileDataset (https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-create-register-datasets)

from azureml.core.dataset import Dataset


web_paths = [

'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',

'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',

'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',

'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'

]

dataset = Dataset.File.from_files(path = web_paths)

Use the register() method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script.

dataset = dataset.register(workspace = ws,

name = 'mnist dataset',

description='training and test dataset',

create_new_version=True)


# list the files referenced by dataset

dataset.to_path()

Train on a remote cluster

import os

script_folder = os.path.join(os.getcwd(), "sklearn-mnist")

os.makedirs(script_folder, exist_ok=True)

Create train.py

%%writefile $script_folder/train.py


import argparse

import os

import numpy as np

import glob


from sklearn.linear_model import LogisticRegression

from sklearn.externals import joblib


from azureml.core import Run

from utils import load_data


# let user feed in 2 parameters, the dataset to mount or download, and the regularization rate of the logistic regression model

parser = argparse.ArgumentParser()

parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')

parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')

args = parser.parse_args()


data_folder = args.data_folder

print('Data folder:', data_folder)


# load train and test set into numpy arrays

# note we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.

X_train = load_data(glob.glob(os.path.join(data_folder, '**/train-images-idx3-ubyte.gz'), recursive=True)[0], False) / 255.0

X_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-images-idx3-ubyte.gz'), recursive=True)[0], False) / 255.0

y_train = load_data(glob.glob(os.path.join(data_folder, '**/train-labels-idx1-ubyte.gz'), recursive=True)[0], True).reshape(-1)

y_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-labels-idx1-ubyte.gz'), recursive=True)[0], True).reshape(-1)


print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\n')


# get hold of the current run

run = Run.get_context()


print('Train a logistic regression model with regularization rate of', args.reg)

clf = LogisticRegression(C=1.0/args.reg, solver="liblinear", multi_class="auto", random_state=42)

clf.fit(X_train, y_train)


print('Predict the test set')

y_hat = clf.predict(X_test)


# calculate accuracy on the prediction

acc = np.average(y_hat == y_test)

print('Accuracy is', acc)


run.log('regularization rate', np.float(args.reg))

run.log('accuracy', np.float(acc))


os.makedirs('outputs', exist_ok=True)

# note file saved in the outputs folder is automatically uploaded into experiment record

joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')

Copy utils.py to the remote cluster

import shutil

shutil.copy('utils.py', script_folder)

Create an estimator which submits the run

from azureml.core.environment import Environment

from azureml.core.conda_dependencies import CondaDependencies


# to install required packages

env = Environment('my_env')

cd = CondaDependencies.create(pip_packages=['azureml-sdk','scikit-learn','azureml-dataprep[pandas,fuse]>=1.1.14'])


env.python.conda_dependencies = cd


from azureml.train.sklearn import SKLearn


script_params = {

# to mount files referenced by mnist dataset

'--data-folder': dataset.as_named_input('mnist').as_mount(),

'--regularization': 0.5

}


est = SKLearn(source_directory=script_folder,

script_params=script_params,

compute_target=compute_target,

environment_definition=env,

entry_script='train.py')

Submit the job to the cluster

run = exp.submit(config=est)

run

Here is what is happening:

  • Image creation: A Docker image is created matching the Python environment specified by the estimator. The image is built and stored in the ACR (Azure Container Registry) associated with your workspace. Image creation and uploading takes about 5 minutes.

  • Scaling

  • Running

  • Post-Processing: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.

Watch the progress of the run with a Jupyter widget

from azureml.widgets import RunDetails

RunDetails(run).show()

Get log results upon completion

# specify show_output to True for a verbose log

run.wait_for_completion(show_output=True)

Display run results

print(run.get_metrics())

The last step in the training script wrote the file outputs/sklearn_mnist_model.pkl in a directory named outputs in the VM of the cluster where the job is executed.

See files associated with that run

print(run.get_file_names())

Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model.

# register model

model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')

print(model.name, model.id, model.version, sep='\t')

https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-deploy-models-with-aml

Deploy the model as a web service in Azure Container Instances. A web service is an image, in this case a Docker image. It encapsulates the scoring logic and the model itself.

More info on deploying here: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where

https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/img-classification-part2-deploy.ipynb

Import packages

%matplotlib inline

import numpy as np

import matplotlib.pyplot as plt

import azureml.core


# display the core SDK version number

print("Azure ML SDK Version: ", azureml.core.VERSION)

Retrieve the model

from azureml.core import Workspace

from azureml.core.model import Model

import os

ws = Workspace.from_config()

model=Model(ws, 'sklearn_mnist')


model.download(target_dir=os.getcwd(), exist_ok=True)


# verify the downloaded model file

file_path = os.path.join(os.getcwd(), "sklearn_mnist_model.pkl")


os.stat(file_path)

Before deploying, make sure your model is working locally by:

  • Loading test data

  • Predicting test data

  • Examining the confusion matrix

Load the test data from the ./data/ directory created during the training tutorial.

from utils import load_data

import os


data_folder = os.path.join(os.getcwd(), 'data')

# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster

X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0

y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)

Feed the test dataset to the model to get predictions.

import pickle

#from sklearn.externals import joblib

import joblib


clf = joblib.load( os.path.join(os.getcwd(), 'sklearn_mnist_model.pkl'))

y_hat = clf.predict(X_test)

Generate a confusion matrix to see how many samples from the test set are classified correctly

from sklearn.metrics import confusion_matrix


conf_mx = confusion_matrix(y_test, y_hat)

print(conf_mx)

print('Overall accuracy:', np.average(y_hat == y_test))

Display the confusion matrix as a graph (The color in each grid represents the error rate)

# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized

row_sums = conf_mx.sum(axis=1, keepdims=True)

norm_conf_mx = conf_mx / row_sums

np.fill_diagonal(norm_conf_mx, 0)


fig = plt.figure(figsize=(8,5))

ax = fig.add_subplot(111)

cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)

ticks = np.arange(0, 10, 1)

ax.set_xticks(ticks)

ax.set_yticks(ticks)

ax.set_xticklabels(ticks)

ax.set_yticklabels(ticks)

fig.colorbar(cax)

plt.ylabel('true labels', fontsize=14)

plt.xlabel('predicted values', fontsize=14)

plt.savefig('conf.png')

plt.show()

Deploy as a web service

Deploy the model as a web service hosted in Container Instances

To build the correct environment for Container Instances, provide the following components:

  • A scoring script to show how to use the model.

  • An environment file to show what packages need to be installed.

  • A configuration file to build the container instance.

  • The model you trained previously.

Create the scoring script, called score.py. The web service call uses this script to show how to use the model.

You must include two required functions in the scoring script:

  • The init() function, which typically loads the model into a global object. This function is run only once when the Docker container is started.

  • The run(input_data) function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.

%%writefile score.py

import json

import numpy as np

import os

import pickle

from sklearn.externals import joblib

from sklearn.linear_model import LogisticRegression


from azureml.core.model import Model


def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path('sklearn_mnist')
    model = joblib.load(model_path)


def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    # you can return any data type as long as it is JSON-serializable
    return y_hat.tolist()

Create environment file

from azureml.core.conda_dependencies import CondaDependencies


myenv = CondaDependencies()

myenv.add_conda_package("scikit-learn")


# Write

with open("myenv.yml","w") as f:

f.write(myenv.serialize_to_string())


# Read to check

with open("myenv.yml","r") as f:

print(f.read())

Create a deployment configuration file and specify the number of CPUs and gigabytes of RAM needed for your ACI container

from azureml.core.webservice import AciWebservice


aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,

memory_gb=1,

tags={"data": "MNIST", "method" : "sklearn"},

description='Predict MNIST with sklearn')

Configure the image and deploy. The following code goes through these steps:

  1. Build an image using:

    • The scoring file (score.py)

    • The environment file (myenv.yml)

    • The model file

  2. Register that image under the workspace.

  3. Send the image to the ACI container.

  4. Start up a container in ACI using the image.

  5. Get the web service HTTP endpoint.

%%time

from azureml.core.webservice import Webservice

from azureml.core.model import InferenceConfig


inference_config = InferenceConfig(runtime= "python",

entry_script="score.py",

conda_file="myenv.yml")


service = Model.deploy(workspace=ws,

name='sklearn-mnist-svc',

models=[model],

inference_config=inference_config,

deployment_config=aciconfig)


service.wait_for_deployment(show_output=True)

Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.

print(service.scoring_uri)

Test deployed service

Send the data as a JSON array to the web service hosted in ACI. Use the SDK's run API to invoke the service. You can also make raw calls using any HTTP tool such as curl. Print the returned predictions and plot them along with the input images.

Red font and inverse image (white on black) is used to highlight the misclassified samples.

import json


# find 30 random samples from test set

n = 30

sample_indices = np.random.permutation(X_test.shape[0])[0:n]


test_samples = json.dumps({"data": X_test[sample_indices].tolist()})

test_samples = bytes(test_samples, encoding='utf8')


# predict using the deployed model

result = service.run(input_data=test_samples)


# compare actual value vs. the predicted values:

i = 0

plt.figure(figsize = (20, 1))


for s in sample_indices:
    plt.subplot(1, n, i + 1)
    plt.axhline('')
    plt.axvline('')
    # use different color for misclassified sample
    font_color = 'red' if y_test[s] != result[i] else 'black'
    clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys
    plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)
    plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)
    i = i + 1

plt.show()

You can also send raw HTTP request to test the web service.

import requests


# send a random row from the test set to score

random_index = np.random.randint(0, len(X_test)-1)

input_data = "{\"data\": [" + str(list(X_test[random_index])) + "]}"


headers = {'Content-Type':'application/json'}


# for AKS deployment you'd need the service key in the header as well

# api_key = service.get_key()

# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}


resp = requests.post(service.scoring_uri, input_data, headers=headers)


print("POST to url", service.scoring_uri)

#print("input data:", input_data)

print("label:", y_test[random_index])

print("prediction:", resp.text)

delete only the ACI deployment using this API call

service.delete()

AutoML on a regression problem

https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models

https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/regression-automated-ml.ipynb

Import the necessary packages

from azureml.opendatasets import NycTlcGreen

import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta

fetch one month at a time

green_taxi_df = pd.DataFrame([])

start = datetime.strptime("1/1/2015","%m/%d/%Y")

end = datetime.strptime("1/31/2015","%m/%d/%Y")


for sample_month in range(12):
    temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \
        .to_pandas_dataframe()
    green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))


green_taxi_df.head(10)

create various time-based features and use the apply() function on the dataframe to iteratively apply the build_time_features() function to each row in the taxi data.

def build_time_features(vector):
    pickup_datetime = vector[0]
    month_num = pickup_datetime.month
    day_of_month = pickup_datetime.day
    day_of_week = pickup_datetime.weekday()
    hour_of_day = pickup_datetime.hour
    return pd.Series((month_num, day_of_month, day_of_week, hour_of_day))


green_taxi_df[["month_num", "day_of_month","day_of_week", "hour_of_day"]] = green_taxi_df[["lpepPickupDatetime"]].apply(build_time_features, axis=1)

green_taxi_df.head(10)

Remove some of the columns that you won't need for training or additional feature building

columns_to_remove = ["lpepPickupDatetime", "lpepDropoffDatetime", "puLocationId", "doLocationId", "extra", "mtaTax",

"improvementSurcharge", "tollsAmount", "ehailFee", "tripType", "rateCodeID",

"storeAndFwdFlag", "paymentType", "fareAmount", "tipAmount"

]

for col in columns_to_remove:
    green_taxi_df.pop(col)

green_taxi_df.head(5)

See summary stats

green_taxi_df.describe()

There are several fields that have outliers or values that will reduce model accuracy. Filter the lat/long fields to be within the bounds of the Manhattan area, filter the `tripDistance` field to be greater than zero but less than 31 miles, and require totalAmount > 0 and passengerCount > 0.

final_df = green_taxi_df.query("pickupLatitude>=40.53 and pickupLatitude<=40.88")

final_df = final_df.query("pickupLongitude>=-74.09 and pickupLongitude<=-73.72")

final_df = final_df.query("tripDistance>=0.25 and tripDistance<31")

final_df = final_df.query("passengerCount>0 and totalAmount>0")


columns_to_remove_for_training = ["pickupLongitude", "pickupLatitude", "dropoffLongitude", "dropoffLatitude"]

for col in columns_to_remove_for_training:
    final_df.pop(col)

Call describe

final_df.describe()

Configure work space

from azureml.core.workspace import Workspace

ws = Workspace.from_config()

Split the data into train and test

from sklearn.model_selection import train_test_split


y_df = final_df.pop("totalAmount")

x_df = final_df


x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223)

Automatically train a model

  1. Define settings for the experiment run. Attach your training data to the configuration, and modify settings that control the training process.

  2. Submit the experiment for model tuning. After submitting the experiment, the process iterates through different machine learning algorithms and hyperparameter settings, adhering to your defined constraints. It chooses the best-fit model by optimizing an accuracy metric.

Training settings can be found here https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train

|Property|Value in this tutorial|Description|
|----|----|---|
|**iteration_timeout_minutes**|2|Time limit in minutes for each iteration. Reduce this value to decrease total runtime.|
|**iterations**|20|Number of iterations. In each iteration, a new machine learning model is trained with your data. This is the primary value that affects total run time.|
|**primary_metric**|spearman_correlation|Metric that you want to optimize. The best-fit model will be chosen based on this metric.|
|**preprocess**|True|By using **True**, the experiment can preprocess the input data (handling missing data, converting text to numeric, etc.)|
|**verbosity**|logging.INFO|Controls the level of logging.|
|**n_cross_validations**|5|Number of cross-validation splits to perform when validation data is not specified.|

import logging


automl_settings = {

"iteration_timeout_minutes": 2,

"iterations": 20,

"primary_metric": 'spearman_correlation',

"preprocess": True,

"verbosity": logging.INFO,

"n_cross_validations": 5

}

This is a regression task. See full inputs here: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings

from azureml.train.automl import AutoMLConfig


automl_config = AutoMLConfig(task='regression',

debug_log='automated_ml_errors.log',

X=x_train.values,

y=y_train.values.flatten(),

**automl_settings)

Train the regression model

Create an experiment object in your workspace. Pass the defined automl_config object to the experiment, and set the output to True to view progress during the run.

from azureml.core.experiment import Experiment

experiment = Experiment(ws, "taxi-experiment")

local_run = experiment.submit(automl_config, show_output=True)

The run output gives status on outliers and cardinality.

https://pkghosh.wordpress.com/2017/10/09/combating-high-cardinality-features-in-supervised-machine-learning/

Explore the output in a widget https://docs.microsoft.com/en-us/python/api/azureml-widgets/azureml.widgets?view=azure-ml-py

from azureml.widgets import RunDetails

RunDetails(local_run).show()

Retrieve the best model

best_run, fitted_model = local_run.get_output()

print(best_run)

print(fitted_model)

Use the best model to run predictions on the test data set to predict taxi fares.

y_predict = fitted_model.predict(x_test.values)

print(y_predict[:10])

Calculate the root mean squared error of the results

from sklearn.metrics import mean_squared_error

from math import sqrt


y_actual = y_test.values.flatten().tolist()

rmse = sqrt(mean_squared_error(y_actual, y_predict))

rmse

Calculate mean absolute percent error (MAPE) by using the full y_actual and y_predict data sets

sum_actuals = sum_errors = 0


for actual_val, predict_val in zip(y_actual, y_predict):
    abs_error = actual_val - predict_val
    if abs_error < 0:
        abs_error = abs_error * -1

    sum_errors = sum_errors + abs_error
    sum_actuals = sum_actuals + actual_val


mean_abs_percent_error = sum_errors / sum_actuals

print("Model MAPE:")

print(mean_abs_percent_error)

print()

print("Model Accuracy:")

print(1 - mean_abs_percent_error)

Azure Kubernetes Service (AKS)

Notes taken from https://zero-to-jupyterhub.readthedocs.io/en/latest/ and https://docs.dask.org/en/latest/setup/kubernetes-helm.html

Deploy and customize your own JupyterHub on a cloud.

Some definitions:

More detail into the lingo here - https://zero-to-jupyterhub.readthedocs.io/en/latest/reference/tools.html#tools

Step Zero: your Kubernetes cluster

During the process of setting up JupyterHub, you’ll be creating some files for configuration purposes. It may be helpful to create a folder for your JupyterHub deployment to keep track of these files.

Create a Kubernetes cluster either through the Azure portal website, or using the Azure command line tools

Test using the portal

  1. Sign into Azure

  2. Create a resource -> Containers -> Kubernetes Service

  3. Create a resource group -> name is 'k8s'

  4. Enter a name for the Kubernetes cluster 'myAKSCluster'

  5. Try Kubernetes version 1.14.8

  6. Select a VM node size (DS2 v2)

  7. Select a Node count of 1

  8. Click 'Next : Scale'

  9. Click 'Next : Authentication'

  10. Click 'Review + create'. Got 'Failed to create a service principal. You can use an existing service principal or try again later.' error https://docs.microsoft.com/en-us/azure/aks/troubleshooting#im-receiving-errors-that-my-service-principal-was-not-found-when-i-try-to-create-a-new-cluster-without-passing-in-an-existing-one

Test using the CLI

1. Sign into Azure

2. Create a directory, as you cannot create apps in the main directory

Go to Azure Active Directory -> Bottom right 'Create Directory' and name it 'DIRECTORYNAME'. It will create 'DIRECTORYNAME.onmicrosoft.com'.

Move the Free subscription to this directory by going to subscriptions -> Click on subscription -> Change directory. (can take up to an hour to change ownership).

3. Click on the cloud shell button '>_'

4. Select 'bash' and create a storage.

4. a) Install the CLI locally https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest

Run PowerShell as an administrator and run

Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi; Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'

4. b) Open a terminal locally and type 'az login' (had to change default browser to chrome). See subscriptions by doing az account list --refresh --output table

5. Choose a subscription as az account set --subscription "NAME"

6. Create a resource group as az group create --name=RG_NAME --location="East US" --output table

7. Switch to the Cloud Shell. Make a folder with the name of your cluster (no underscores in the folder or cluster name)

mkdir CLUSTER_NAME

cd CLUSTER_NAME

ssh-keygen -f ssh-key-CLUSTER_NAME (leave empty for no passphrase)

8. Create a AKS Cluster (on the cloud shell).

az aks create --name CLUSTER_NAME \

--resource-group RG_NAME \

--ssh-key-value ssh-key-CLUSTER_NAME.pub \

--node-count 2 \

--node-vm-size Standard_DS2_v2 \

--output table

This should take a few minutes. When complete it shows various information.

9. a) Install kubectl https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-windows

az aks install-cli

Add C:\Users\USER\.azure-kubectl to the path. Either set it in the environment variables or run

set PATH=%PATH%;C:\Users\USER\.azure-kubectl

10. Get credentials

az aks get-credentials \

--name CLUSTER_NAME \

--resource-group RG_NAME \

--output table

11. Check the cluster is working

kubectl get node

Should show two running nodes, their k8s version and a status of 'Ready'.

Step One: Setting up Helm

https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub/setup-helm.html

Helm charts work as a templating engine to populate multiple YAML files; Helm then runs kubectl apply. Helm has two parts: a client (helm) and a server (tiller). Tiller runs inside your Kubernetes cluster as a pod in the kube-system namespace. Tiller manages both the releases (installations) and revisions (versions) of charts deployed on the cluster. When you run helm commands, your local Helm client sends instructions to tiller in the cluster, which in turn makes the requested changes.

Tiller will be present in the kubernetes cluster and the helm client talks to it for deploying applications using helm charts.

1. Download and install helm locally https://helm.sh/docs/intro/install/

You may want to use https://chocolatey.org/install if using windows (https://helm.sh/docs/intro/install/#from-chocolatey-windows).

I think the rest is redundant with Helm 3...

2. Set up a ServiceAccount for use by tiller. (diverge from the docs here)

Create a YAML file called helm-rbac.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

Then run kubectl apply -f helm-rbac.yaml

3. Initialize helm and tiller:

helm init --service-account tiller --wait
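To check that tiller came up (a quick sanity check, assuming Helm 2):

helm version

kubectl get pods --namespace kube-system | grep tiller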

Foot notes

https://pangeo.io/setup_guides/cloud.html

https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads

https://docs.microsoft.com/en-us/partner-center/set-an-azure-spending-budget-for-your-customers

https://devopscube.com/install-configure-helm-kubernetes/

https://docs.microsoft.com/en-us/azure/aks/kubernetes-helm

https://v3.helm.sh/docs/intro/quickstart/

https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal

K8s with Dask on Windows

0a. Setup a subscription in the Azure portal.

0b. Install the Azure Comand Line on you local machine https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-windows?view=azure-cli-latest

0c. Install kubectl on your local machine az aks install-cli (https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster)

0d. Add C:\Users\131416\.azure-kubectl to PATH by going to Control-Panel->System->Advanced->Environment Variables and add new. Open a new power shell.

0e. Install Chocolatey https://chocolatey.org/install

0f. Install helm https://helm.sh/docs/intro/install/

1. On local machine open PowerShell as an administrator and type az login

2. See subscriptions by doing az account list --refresh --output table

3. Choose a subscription as az account set --subscription "NAME"

4.0. See resource groups as az group list --output table

4. Create a resource group by doing az group create --name=RESOURCEGROUPNAME --location="LOCATION" --output table

See a list of locations here e.g. East US

5a. Go on the Azure Portal and open the Cloud Shell.

5b. Create a directory for the name of your cluster. Go into the cluster and generate keys.

mkdir CLUSTERNAME

cd CLUSTERNAME

ssh-keygen -f ssh-key-CLUSTER_NAME (leave empty for no passphrase)

5c. Copy the public key to your local machine... TODO

6. Create an Azure Kubernetes Service on the Cloud Shell (takes a few minutes).

az aks create --name CLUSTER_NAME \

--resource-group RG_NAME \

--ssh-key-value ssh-key-CLUSTER_NAME.pub \

--node-count 2 \

--node-vm-size Standard_DS2_v2 \

--output table

7. Go back to local machine and get credentials of the AKS

az aks get-credentials \

--name CLUSTER_NAME \

--resource-group RG_NAME \

--output table

8. Check the cluster is working

kubectl get node

9. launch a Dask scheduler, several workers, and an optional Jupyter Notebook server on a Kubernetes

helm repo add dask https://helm.dask.org/

helm repo update

helm install dask/dask --generate-name

10. Check status by running

kubectl get pods

kubectl get services

11. After running kubectl get services you will see an EXTERNAL-IP.

To access Jupyter Lab: copy the EXTERNAL-IP value for the 'X-jupyter' NAME into a browser and use password 'dask'.

To access the Dashboard: copy the EXTERNAL-IP value for the 'X-scheduler' NAME into a browser.

12. Create a new notebook and run:

from dask.distributed import Client, config

client = Client()


import distributed

import dask.array as da


array = da.ones((1000, 1000, 1000), chunks=(100, 100, 10))

print(array.mean().compute())

20. Delete the cluster

az aks delete --resource-group RG_NAME --name CLUSTER_NAME --no-wait

21. Delete the resource group

az group delete --name RG_NAME