Azure
azureml-opendatasets
pip install azureml-opendatasets
GFS
from azureml.opendatasets import NoaaGfsWeather
from datetime import datetime
from dateutil.relativedelta import relativedelta
# e.g. for the most recent day of data:
# end_date = datetime.today()
# start_date = end_date - relativedelta(days=1)
start_date = datetime(2022, 6, 29, 0, 0)
end_date = datetime(2022, 6, 30, 0, 0)
gfs = NoaaGfsWeather(start_date=start_date, end_date=end_date)
gfs_df = gfs.to_pandas_dataframe()
Yellow taxi
from azureml.opendatasets import NycTlcYellow
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2018-06-06')
start_date = parser.parse('2018-05-01')
nyc_tlc = NycTlcYellow(start_date=start_date, end_date=end_date)
nyc_tlc_df = nyc_tlc.to_pandas_dataframe()
nyc_tlc_df.info()
Create a new Windows virtual machine
Click 'create a resource'.
Search 'Windows Server'
Click on 'Select a software plan' and choose '[smalldisk] Windows Server 2016 Datacenter'.
Click on 'Resource group' and select learn-f072946c-e335-4f7d-b8cf-b18a7f71b8b8.
In Virtual Machine Name enter 'test-vp-vm2'
Enter username and password
Click 'Create and attach a new disk'.
Name it 'VideoCodecVM_DataDisk_0' and click 'OK'
Click 'Create new' under 'Virtual network'.
Under Address range put '172.16.0.0/16'
Under Subnet name put 'new' and make the address '172.16.1.0/24'. Tick the boxes next to them and click 'OK'.
Click the 'Review + create' button at the bottom, then click 'Create'.
Remote Desktop Protocol
On the portal go to resource and click 'Connect' in the top left.
Download the .rdp file.
Right-click the file and click 'Edit'. Adjust the settings you want, such as Display, Local Resources to share (e.g. drives, which makes moving files easy), and Experience (network connectivity).
Share your C: drive. Click 'Save' on the 'General' tab.
Connect using your username and password
On the VM
Close server manager
Click on 'File Explorer' and click on 'This PC'
Search 'Computer Management' from the Start button and click on 'Disk Management'. Click 'OK' to initialize the disk.
Right-click the disk and click 'New Simple Volume'. Click 'Next' through the wizard to format it.
Close the client.
Create a Windows data science virtual machine
Click 'create a resource'.
Go to https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/, select 'Data Science Virtual Machine – Windows 2016', click 'Get it now', then click 'Create'.
Choose the resource group
Name the VM 'Win2016'
Choose DSv2 as the tier.
Enter a username and password.
Click 'Create new' under 'Virtual network'.
Under Address range put '172.16.0.0/16'
Under Subnet name put 'new' and make address '172.16.1.0/24'. Click the boxes next to them and Click ok.
Use the RDP to connect
Jupyter notebook
Click on the jupyter notebook item on the bottom
New notebook with python3
Open the notebook folder which has pre-installed notebooks.
Check out the IrisClassifierPyMLWebService notebook (https://github.com/Azure-Samples/Azure-MachineLearning-DataScience/blob/master/Data-Science-Virtual-Machine/Samples/Notebooks/IrisClassifierPyMLWebService.ipynb), although it is being updated. Here is an updated repo: https://github.com/Azure/MachineLearningNotebooks
https://docs.microsoft.com/en-us/azure/machine-learning/
Create a Linux (ubuntu) data science virtual machine
+ Create a new resource and search Data Science Virtual Machine for Linux (Ubuntu)
Click 'Create'
Create a Resource Group
Name the VM
Use standard image and standard size.
Enter a username and password (all lowercase in username).
Choose DSv2 as the tier.
Create
Go to the resource and make a copy of the public IP address.
Go to https://VMIP:8000 or https://VMIP:8000/user/USERNAME/lab
When finished, click 'Dissociate', then click 'Delete' and confirm with 'Delete'
Create a VM (not tested)
az vm create --name VMname --resource-group RGname --size D96as_v4 --generate-ssh-keys
Azure ML Studio
Create compute resources
Click on Compute
Compute instances -> CPU Standard_DS11_v2
Compute clusters -> CPU Standard_DS11_v2, minimum 2 nodes, maximum 2 nodes
Create data
Click on Datasets
from web files:
bike-rentals
Bicycle rental data
Use headers from the first file
Click on data -> Explore
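The same dataset can also be created with the SDK; a minimal sketch, assuming ws is your Workspace and that the exercise's data URL is https://aka.ms/bike-rentals (an assumption; use the URL given in the lab):
from azureml.core import Workspace, Dataset
ws = Workspace.from_config()
# read the web file(s) into a tabular dataset, then register it under the name used above
bike_data = Dataset.Tabular.from_delimited_files(path='https://aka.ms/bike-rentals')
bike_data = bike_data.register(workspace=ws, name='bike-rentals',
                               description='Bicycle rental data')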
Automated ML
+ New Automated ML Run
bike-rentals data
mslearn-bike-rental
rentals
bbb-cluster
Regression
metric - normalized root mean squared error
Block all algorithms but RandomForest & LightGBM
Exit criterion: training job time (hours) - 0.25
Exit criterion: metric score threshold - 0.08
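Roughly the same run can be configured from the SDK; a hedged sketch using azureml-train-automl parameter names of this era (bike_data is the registered dataset from the sketch above; treat as illustrative, not exact):
from azureml.train.automl import AutoMLConfig
automl_config = AutoMLConfig(task='regression',
                             primary_metric='normalized_root_mean_squared_error',
                             training_data=bike_data,          # a TabularDataset
                             label_column_name='rentals',
                             compute_target='bbb-cluster',
                             whitelist_models=['RandomForest', 'LightGBM'],
                             experiment_timeout_hours=0.25,
                             experiment_exit_score=0.08)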
Review the best model
Select the algorithm (MaxAbsScaler, LightGBM)
Click on 'View all other metrics'
Metrics -> residuals, predicted_true
Explanations
Deploy
predict-rentals
Predict cycle rentals
ACI (Azure Container Instance)
Enable authentication
Endpoints -> predict-rentals -> Consume. Copy the REST endpoint and primary key
Predict
Notebooks
Create new file
Test-Bikes
Notebook
Overwrite if already exists
Collapse file explorer
Add the following
endpoint = 'YOUR_ENDPOINT' # Replace with your endpoint
key = 'YOUR_KEY' # Replace with your key
import json
import requests
# An array of features based on five-day weather forecast
x = [[1,1,2022,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446],
     [2,1,2022,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539],
     [3,1,2022,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309],
     [4,1,2022,1,0,2,1,1,0.2,0.212122,0.590435,0.160296],
     [5,1,2022,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869]]
# Convert the array to JSON format
input_json = json.dumps({"data": x})
# Set the content type and authentication for the request
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + key}
# Send the request
response = requests.post(endpoint, input_json, headers=headers)
# If we got a valid response, display the predictions
if response.status_code == 200:
    y = json.loads(response.json())
    print("Predictions:")
    for i in range(len(x)):
        print(" Day: {}. Predicted rentals: {}".format(i+1, max(0, round(y["result"][i]))))
else:
    print(response)
Save and checkpoint
Run cell
Clean up
Endpoints
Click model -> delete
Compute
Compute instances -> stop
Compute clusters -> cluster name -> edit -> 0 nodes
Azure portal -> resource groups -> resource group name -> delete
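The same clean-up can be scripted; a minimal SDK sketch, assuming ws is your Workspace and the names used above:
from azureml.core import Workspace
ws = Workspace.from_config()
# delete the deployed web service (endpoint)
service = ws.webservices['predict-rentals']
service.delete()
# delete the training cluster
cluster = ws.compute_targets['bbb-cluster']
cluster.delete()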
Create inference cluster
Standard_DS11_v2. Dev-test
2
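If you prefer the SDK, a hedged sketch of provisioning the same AKS inference cluster (the name 'inference-cluster' is an assumption; ws is your Workspace):
from azureml.core.compute import AksCompute, ComputeTarget
# Dev-test cluster purpose relaxes the production node/core requirements
prov_config = AksCompute.provisioning_configuration(vm_size='Standard_DS11_v2',
                                                    agent_count=2,
                                                    cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST)
aks_target = ComputeTarget.create(ws, 'inference-cluster', prov_config)
aks_target.wait_for_completion(show_output=True)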
Designer
+
Change Draft name
Select compute target (choose cluster)
Sample datasets
Drag Automobile price data (Raw) dataset onto canvas
Right-click it -> Visualize -> Dataset output -> price column
Data Transformation -> Drag 'Select Columns in Dataset'. Connect boxes.
Click 'Select Columns in Dataset' and click 'Edit column'. 'By name'. Add All but remove 'normalized-losses'
Drag 'Clean Missing Data'. Connect boxes and click 'Edit column'. With rules then include 'bore', 'stroke' and 'horsepower'. Cleaning mode - Remove entire row
Drag 'Normalize Data'. Connect left dot from box above. Transformation 'MinMax'
Submit -> Create new named 'auto-price-training'.
Normalize Data -> Outputs + logs. Next to Transformed dataset click on visualize
Drag Split Data. Connect boxes. 0.7 fraction. Random seed 123.
Model Training -> Drag Train Model. Connect left to right. Label column to price
Machine Learning Algorithms -> Linear Regression. connect to Train Model
https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet
Model Scoring & Evaluation. Score Model. Connect remaining dots
Submit -> Select existing
Select Score Model. Outputs + Logs. Data outputs. Click visualize
Model Scoring & Evaluation -> Evaluate Model. Connect to top left dot
Submit -> Select existing
Click on 'Evaluate Model' then Output and logs then visualize
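The designer pipeline above mirrors a standard scikit-learn workflow. A minimal local sketch of the same steps (illustrative only, not the designer itself; assumes the automobile data is already in a pandas DataFrame df with missing values as NaN, and trains on numeric features only for brevity):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
df = df.drop(columns=['normalized-losses'])              # Select Columns in Dataset
df = df.dropna(subset=['bore', 'stroke', 'horsepower'])  # Clean Missing Data (remove entire row)
X, y = df.drop(columns=['price']), df['price']
num_cols = X.select_dtypes('number').columns
X[num_cols] = MinMaxScaler().fit_transform(X[num_cols])  # Normalize Data (MinMax)
X_train, X_test, y_train, y_test = train_test_split(X[num_cols], y, train_size=0.7, random_state=123)  # Split Data
model = LinearRegression().fit(X_train, y_train)         # Train Model (Linear Regression)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5  # Score Model + Evaluate Model
print('RMSE:', rmse)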
Click Create inference pipeline -> Real-time inference pipeline.
Rename to 'Predict Auto Price'.
Delete Automobile price data (Raw) and add Data Input and Output -> Enter Data Manually
Remove price from Select Columns in Dataset
Remove evaluate model
Insert Python Language -> Execute Python Script
Add text into manual entry
symboling,normalized-losses,make,fuel-type,aspiration,num-of-doors,body-style,drive-wheels,engine-location,wheel-base,length,width,height,curb-weight,engine-type,num-of-cylinders,engine-size,fuel-system,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg
3,NaN,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9,111,5000,21,27
3,NaN,alfa-romero,gas,std,two,convertible,rwd,front,88.6,168.8,64.1,48.8,2548,dohc,four,130,mpfi,3.47,2.68,9,111,5000,21,27
1,NaN,alfa-romero,gas,std,two,hatchback,rwd,front,94.5,171.2,65.5,52.4,2823,ohcv,six,152,mpfi,2.68,3.47,9,154,5000,19,26
Add python script
import pandas as pd
def azureml_main(dataframe1 = None, dataframe2 = None):
    # keep only the scored labels and rename the column
    scored_results = dataframe1[['Scored Labels']]
    scored_results.rename(columns={'Scored Labels': 'predicted_price'},
                          inplace=True)
    return scored_results
Click on 'Execute Python Script', then 'Result dataset', then visualize.
Submit -> New experiment named 'predict-auto-price'.
Click 'Deploy'. Attach to the inference cluster and click 'Deploy'.
Click on 'Endpoints' and open predict-auto-price. View the Consume tab. Copy the REST endpoint and key.
Open ML Studio in a second tab. Create a notebook named 'Test-Autos'.
endpoint = 'YOUR_ENDPOINT' #Replace with your endpoint
key = 'YOUR_KEY' #Replace with your key
import urllib.request
import json
import os
# Prepare the input data
data = {
    "Inputs": {
        "WebServiceInput0": [
            {
                'symboling': 3,
                'normalized-losses': None,
                'make': "alfa-romero",
                'fuel-type': "gas",
                'aspiration': "std",
                'num-of-doors': "two",
                'body-style': "convertible",
                'drive-wheels': "rwd",
                'engine-location': "front",
                'wheel-base': 88.6,
                'length': 168.8,
                'width': 64.1,
                'height': 48.8,
                'curb-weight': 2548,
                'engine-type': "dohc",
                'num-of-cylinders': "four",
                'engine-size': 130,
                'fuel-system': "mpfi",
                'bore': 3.47,
                'stroke': 2.68,
                'compression-ratio': 9,
                'horsepower': 111,
                'peak-rpm': 5000,
                'city-mpg': 21,
                'highway-mpg': 27,
            },
        ],
    },
    "GlobalParameters": {}
}
body = str.encode(json.dumps(data))
headers = {'Content-Type': 'application/json', 'Authorization': ('Bearer ' + key)}
req = urllib.request.Request(endpoint, body, headers)
try:
    response = urllib.request.urlopen(req)
    result = response.read()
    json_result = json.loads(result)
    y = json_result["Results"]["WebServiceOutput0"][0]["predicted_price"]
    print('Predicted price: {:.2f}'.format(y))
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers to help debug the error
    print(error.info())
    print(json.loads(error.read().decode("utf8", 'ignore')))
SDK
https://docs.microsoft.com/en-us/python/api/overview/azure/ml/?view=azure-ml-py
pip install azureml-sdk
You can download the config.json from the Azure portal (in the Machine Learning instance)
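The config.json has this shape (placeholder values):
{
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>"
}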
Create resources as
from azureml.core import Workspace
ws = Workspace.create(name='aml-workspace',
subscription_id='123456-abc-123...',
resource_group='aml-resources',
create_resource_group=True,
location='eastus'
)
or
az ml workspace create -w 'aml-workspace' -g 'aml-resources'
Connect to the config as:
from azureml.core import Workspace
ws = Workspace.from_config()
See compute targets:
for compute_name in ws.compute_targets:
compute = ws.compute_targets[compute_name]
print(compute.name, ":", compute.type)
Install the az CLI on Linux:
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
Add the ml extension
az extension add -n azure-cli-ml
https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-setup-vscode-extension
Experiment run context
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-track-experiments
from azureml.core import Experiment
# create an experiment variable
experiment = Experiment(workspace = ws, name = "my-experiment")
# start the experiment
run = experiment.start_logging()
# experiment code goes here
# end the experiment
run.complete()
Log the number of observations in a file:
from azureml.core import Experiment
import pandas as pd
# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace = ws, name = 'my-experiment')
# Start logging data from the experiment
run = experiment.start_logging()
# load the dataset and count the rows
data = pd.read_csv('data.csv')
row_count = (len(data))
# Log the row count
run.log('observations', row_count)
# Complete the experiment
run.complete()
Retrieve logs
from azureml.widgets import RunDetails
RunDetails(run).show()
or
import json
# Get logged metrics
metrics = run.get_metrics()
print(json.dumps(metrics, indent=2))
Add files to the upload path
run.upload_file(name='outputs/sample.csv', path_or_stream='./sample.csv')
Retrieve these files as:
import json
files = run.get_file_names()
print(json.dumps(files, indent=2))
Experiment script
from azureml.core import Run
import pandas as pd
import matplotlib.pyplot as plt
import os
# Get the experiment run context
run = Run.get_context()
# load the diabetes dataset
data = pd.read_csv('data.csv')
# Count the rows and log the result
row_count = (len(data))
run.log('observations', row_count)
# Save a sample of the data
os.makedirs('outputs', exist_ok=True)
data.sample(100).to_csv("outputs/sample.csv", index=False, header=True)
# Complete the run
run.complete()
Create a script run configuration. For example, you can have an experiment folder that also contains the data:
from azureml.core import Experiment, ScriptRunConfig
# Create a script config
script_config = ScriptRunConfig(source_directory=experiment_folder,
script='experiment.py')
# submit the experiment
experiment = Experiment(workspace = ws, name = 'my-experiment')
run = experiment.submit(config=script_config)
run.wait_for_completion(show_output=True)
Estimator
Estimators encapsulate a run configuration and a script configuration in a single object
A script to train a model
from azureml.core import Run
import pandas as pd
import numpy as np
import os
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Get the experiment run context
run = Run.get_context()
# Prepare the dataset
data = pd.read_csv('data.csv')
X, y = data[['Feature1','Feature2','Feature3']].values, data['Label'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
# Train a logistic regression model
reg = 0.1
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)
# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
run.log('Accuracy', np.float(acc))
# Save the trained model
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename='outputs/model.pkl')
run.complete()
Use a generic Estimator class to define a run configuration for a training script like this:
from azureml.train.estimator import Estimator
from azureml.core import Experiment
# Create an estimator
estimator = Estimator(source_directory='experiment_folder',
entry_script='training_script.py',
compute_target='local',
conda_packages=['scikit-learn']
)
# Create and run an experiment
experiment = Experiment(workspace = ws, name = 'training_experiment')
run = experiment.submit(config=estimator)
Framework-specific estimators simplify configuration. For example, the SKLearn estimator includes its dependencies. (https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets)
from azureml.train.sklearn import SKLearn
from azureml.core import Experiment
# Create an estimator
estimator = SKLearn(source_directory='experiment_folder',
                    entry_script='training_script.py',
                    compute_target='local'
                    )
# Create and run an experiment
experiment = Experiment(workspace = ws, name = 'training_experiment')
run = experiment.submit(config=estimator)
Script parameters
Use parameters to set variables in the script. Read the argument reg:
from azureml.core import Run
import argparse
import os
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Get the experiment run context
run = Run.get_context()
# Set regularization hyperparameter
parser = argparse.ArgumentParser()
parser.add_argument('--reg_rate', type=float, dest='reg', default=0.01)
args = parser.parse_args()
reg = args.reg
# Prepare the dataset
data = pd.read_csv('data.csv')
X, y = data[['Feature1','Feature2','Feature3']].values, data['Label'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
# Train a logistic regression model
model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)
# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
run.log('Accuracy', np.float(acc))
# Save the trained model
os.makedirs('outputs', exist_ok=True)
joblib.dump(value=model, filename='outputs/model.pkl')
run.complete()
Use them in an estimator by passing a dictionary of script parameters:
from azureml.train.sklearn import SKLearn
from azureml.core import Experiment
# Create an estimator
estimator = SKLearn(source_directory='experiment_folder',
entry_script='training_script.py',
script_params = {'--reg_rate': 0.1},
compute_target='local'
)
# Create and run an experiment
experiment = Experiment(workspace = ws, name = 'training_experiment')
run = experiment.submit(config=estimator)
Register models
You can retrieve files using:
# "run" is a reference to a completed experiment run
# List the files generated by the experiment
for file in run.get_file_names():
print(file)
# Download a named file
run.download_file(name='outputs/model.pkl', output_file_path='model.pkl')
To register a model from a local file you can do:
from azureml.core import Model
model = Model.register(workspace=ws,
model_name='classification_model',
model_path='model.pkl', # local path
description='A classification model',
tags={'dept': 'sales'},
model_framework=Model.Framework.SCIKITLEARN,
model_framework_version='0.20.3')
or with a remote experiment:
run.register_model( model_name='classification_model',
model_path='outputs/model.pkl', # run outputs path
description='A classification model',
tags={'dept': 'sales'},
model_framework=Model.Framework.SCIKITLEARN,
model_framework_version='0.20.3')
See registered models by doing:
from azureml.core import Model
for model in Model.list(ws):
# Get model name and auto-generated version
print(model.name, 'version:', model.version)
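To load a specific registered model again later (for local testing, say), something like:
from azureml.core import Model
model = Model(ws, name='classification_model')  # latest version by default
# or pin a version: Model(ws, name='classification_model', version=1)
model.download(target_dir='.', exist_ok=True)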
Azure ML
https://azure.microsoft.com/en-us/services/machine-learning/
Signed up for the free tier of Azure ML
1st experiment
https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup
Click 'create a resource'.
Search 'Machine Learning' and press Enter
Name the workspace 'docs-ws'
Choose the free-tier subscription and create a resource group called 'docs-aml'
Choose 'Basic' for the workspace edition
Click the 'Review + create' button at the bottom, then click 'Create'.
Select the 'Go to resource' button.
Sign in to https://ml.azure.com/ and choose the same subscription and resource group
Select Notebooks on the left, Open the Samples folder, Open the Python folder. Open the folder with a version number on it (current version of Python SDK (software development kit))
Select the "..." at the right of the tutorials folder and then select Clone. Select your folder to clone the tutorials folder there. Click 'clone'.
Select the tutorial-1st-experiment-sdk-train.ipynb file in your tutorials folder (User files -> rbell -> tutorials ->)
Click + New VM. Name it 'docs-am-vm'
The notebook is stored at https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-1st-experiment-sdk-train
from azureml.core import Workspace
ws = Workspace.from_config()
# go to https://microsoft.com/devicelogin and sign in
create an experiment in your workspace
from azureml.core import Experiment
experiment = Experiment(workspace=ws, name="diabetes-experiment")
Read in data
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
X, y = load_diabetes(return_X_y = True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=66)
Build a script that trains ridge models in a loop through different hyperparameter alpha values
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.externals import joblib
import math
alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
for alpha in alphas:
    run = experiment.start_logging()
    run.log("alpha_value", alpha)
    model = Ridge(alpha=alpha)
    model.fit(X=X_train, y=y_train)
    y_pred = model.predict(X=X_test)
    rmse = math.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred))
    run.log("rmse", rmse)
    model_name = "model_alpha_" + str(alpha) + ".pkl"
    filename = "outputs/" + model_name
    joblib.dump(value=model, filename=filename)
    run.upload_file(name=model_name, path_or_stream=filename)
    run.complete()
After the training has completed, call the experiment variable to fetch a link to the experiment in the portal.
experiment
Click on the link to the report
Clicking on a run number link in the RUN NUMBER column takes you to the page for each individual run. You can see the pickle files in 'Outputs'.
Get the best run
minimum_rmse_runid = None
minimum_rmse = None
for run in experiment.get_runs():
    run_metrics = run.get_metrics()
    run_details = run.get_details()
    # each logged metric becomes a key in this returned dict
    run_rmse = run_metrics["rmse"]
    run_id = run_details["runId"]
    if minimum_rmse is None:
        minimum_rmse = run_rmse
        minimum_rmse_runid = run_id
    else:
        if run_rmse < minimum_rmse:
            minimum_rmse = run_rmse
            minimum_rmse_runid = run_id
print("Best run_id: " + minimum_rmse_runid)
print("Best run_id rmse: " + str(minimum_rmse))
see all the files available for download from this run
from azureml.core import Run
best_run = Run(experiment=experiment, run_id=minimum_rmse_runid)
print(best_run.get_file_names())
Download this model to the current directory
best_run.download_file(name="model_alpha_0.1.pkl")
To kill the VM go to 'Compute' on the left and stop the VM
Go to resources and delete the 'docs-ws' resource.
Train and Deploy a model
https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import azureml.core
from azureml.core import Workspace
# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)
# load workspace configuration from the config.json file in the current folder.
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')
experiment_name = 'sklearn-mnist'
from azureml.core import Experiment
exp = Experiment(workspace=ws, name=experiment_name)
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os
Create or attach an existing compute resource
# choose a name for your cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpu-cluster")
compute_min_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0)
compute_max_nodes = os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4)
# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_D2_V2")
if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
                                                                min_nodes = compute_min_nodes,
                                                                max_nodes = compute_max_nodes)
    # create the cluster
    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
    # can poll for a minimum number of nodes and for a specific timeout.
    # if no min node count is provided it will use the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
    # For a more detailed view of current AmlCompute status, use get_status()
    print(compute_target.get_status().serialize())
Download MNIST
import urllib.request
data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images.gz'))
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels.gz'))
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))
urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))
Display some numbers
# make sure utils.py is in the same directory as this code
from utils import load_data
# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.
X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0
X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0
y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)
y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)
# now let's show some randomly chosen images from the training set.
count = 0
sample_size = 30
plt.figure(figsize = (16, 6))
for i in np.random.permutation(X_train.shape[0])[:sample_size]:
    count = count + 1
    plt.subplot(1, sample_size, count)
    plt.axhline('')
    plt.axvline('')
    plt.text(x=10, y=-10, s=y_train[i], fontsize=18)
    plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)
plt.show()
Create a FileDataset (https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-create-register-datasets)
from azureml.core.dataset import Dataset
web_paths = [
'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
]
dataset = Dataset.File.from_files(path = web_paths)
Use the register() method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script.
dataset = dataset.register(workspace = ws,
name = 'mnist dataset',
description='training and test dataset',
create_new_version=True)
# list the files referenced by dataset
dataset.to_path()
Train on a remote cluster
import os
script_folder = os.path.join(os.getcwd(), "sklearn-mnist")
os.makedirs(script_folder, exist_ok=True)
Create train.py
%%writefile $script_folder/train.py
import argparse
import os
import numpy as np
import glob
from sklearn.linear_model import LogisticRegression
from sklearn.externals import joblib
from azureml.core import Run
from utils import load_data
# let user feed in 2 parameters, the dataset to mount or download, and the regularization rate of the logistic regression model
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')
parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')
args = parser.parse_args()
data_folder = args.data_folder
print('Data folder:', data_folder)
# load train and test set into numpy arrays
# note we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.
X_train = load_data(glob.glob(os.path.join(data_folder, '**/train-images-idx3-ubyte.gz'), recursive=True)[0], False) / 255.0
X_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-images-idx3-ubyte.gz'), recursive=True)[0], False) / 255.0
y_train = load_data(glob.glob(os.path.join(data_folder, '**/train-labels-idx1-ubyte.gz'), recursive=True)[0], True).reshape(-1)
y_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-labels-idx1-ubyte.gz'), recursive=True)[0], True).reshape(-1)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\n')
# get hold of the current run
run = Run.get_context()
print('Train a logistic regression model with regularization rate of', args.reg)
clf = LogisticRegression(C=1.0/args.reg, solver="liblinear", multi_class="auto", random_state=42)
clf.fit(X_train, y_train)
print('Predict the test set')
y_hat = clf.predict(X_test)
# calculate accuracy on the prediction
acc = np.average(y_hat == y_test)
print('Accuracy is', acc)
run.log('regularization rate', np.float(args.reg))
run.log('accuracy', np.float(acc))
os.makedirs('outputs', exist_ok=True)
# note file saved in the outputs folder is automatically uploaded into experiment record
joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
Copy utils.py to the remote cluster
import shutil
shutil.copy('utils.py', script_folder)
Create an estimator which submits the run
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies
# to install required packages
env = Environment('my_env')
cd = CondaDependencies.create(pip_packages=['azureml-sdk','scikit-learn','azureml-dataprep[pandas,fuse]>=1.1.14'])
env.python.conda_dependencies = cd
from azureml.train.sklearn import SKLearn
script_params = {
# to mount files referenced by mnist dataset
'--data-folder': dataset.as_named_input('mnist').as_mount(),
'--regularization': 0.5
}
est = SKLearn(source_directory=script_folder,
script_params=script_params,
compute_target=compute_target,
environment_definition=env,
entry_script='train.py')
Submit the job to the cluster
run = exp.submit(config=est)
run
Here is what is happening:
Image creation: A Docker image is created matching the Python environment specified by the estimator. The image is built and stored in the ACR (Azure Container Registry) associated with your workspace. Image creation and uploading takes about 5 minutes.
Scaling: If the remote cluster requires more nodes than are currently available, additional nodes are added automatically.
Running: Scripts in the script folder are uploaded to the compute target, data stores are mounted or copied, and the entry script is run.
Post-Processing: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.
Watch the progress of the run with a Jupyter widget
from azureml.widgets import RunDetails
RunDetails(run).show()
Get log results upon completion
# specify show_output to True for a verbose log
run.wait_for_completion(show_output=True)
Display run results
print(run.get_metrics())
The last step in the training script wrote the file outputs/sklearn_mnist_model.pkl in a directory named outputs in the VM of the cluster where the job is executed.
See files associated with that run
print(run.get_file_names())
Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model.
# register model
model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')
print(model.name, model.id, model.version, sep='\t')
https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-deploy-models-with-aml
Deploy the model as a web service in Azure Container Instances. A web service is an image, in this case a Docker image. It encapsulates the scoring logic and the model itself.
More info on deploying here: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where
Import packages
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import azureml.core
# display the core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)
Retrieve the model
from azureml.core import Workspace
from azureml.core.model import Model
import os
ws = Workspace.from_config()
model=Model(ws, 'sklearn_mnist')
model.download(target_dir=os.getcwd(), exist_ok=True)
# verify the downloaded model file
file_path = os.path.join(os.getcwd(), "sklearn_mnist_model.pkl")
os.stat(file_path)
Before deploying, make sure your model is working locally by:
Loading test data
Predicting test data
Examining the confusion matrix
Load the test data from the ./data/ directory created during the training tutorial.
from utils import load_data
import os
data_folder = os.path.join(os.getcwd(), 'data')
# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster
X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0
y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)
Feed the test dataset to the model to get predictions.
import pickle
#from sklearn.externals import joblib
import joblib
clf = joblib.load( os.path.join(os.getcwd(), 'sklearn_mnist_model.pkl'))
y_hat = clf.predict(X_test)
Generate a confusion matrix to see how many samples from the test set are classified correctly
from sklearn.metrics import confusion_matrix
conf_mx = confusion_matrix(y_test, y_hat)
print(conf_mx)
print('Overall accuracy:', np.average(y_hat == y_test))
Display the confusion matrix as a graph (The color in each grid represents the error rate)
# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = conf_mx / row_sums
np.fill_diagonal(norm_conf_mx, 0)
fig = plt.figure(figsize=(8,5))
ax = fig.add_subplot(111)
cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)
ticks = np.arange(0, 10, 1)
ax.set_xticks(ticks)
ax.set_yticks(ticks)
ax.set_xticklabels(ticks)
ax.set_yticklabels(ticks)
fig.colorbar(cax)
plt.ylabel('true labels', fontsize=14)
plt.xlabel('predicted values', fontsize=14)
plt.savefig('conf.png')
plt.show()
Deploy as a web service
Deploy the model as a web service hosted in Container Instances
To build the correct environment for Container Instances, provide the following components:
A scoring script to show how to use the model.
An environment file to show what packages need to be installed.
A configuration file to build the container instance.
The model you trained previously.
Create the scoring script, called score.py. The web service call uses this script to show how to use the model.
You must include two required functions in the scoring script:
The init() function, which typically loads the model into a global object. This function is run only once when the Docker container is started.
The run(input_data) function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.
%%writefile score.py
import json
import numpy as np
import os
import pickle
from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression
from azureml.core.model import Model
def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path('sklearn_mnist')
    model = joblib.load(model_path)
def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    # you can return any data type as long as it is JSON-serializable
    return y_hat.tolist()
Create environment file
from azureml.core.conda_dependencies import CondaDependencies
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
# Write
with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())
# Read to check
with open("myenv.yml", "r") as f:
    print(f.read())
Create a deployment configuration file and specify the number of CPU cores and gigabytes of RAM needed for your ACI container
from azureml.core.webservice import AciWebservice
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
memory_gb=1,
tags={"data": "MNIST", "method" : "sklearn"},
description='Predict MNIST with sklearn')
Configure the image and deploy. The following code goes through these steps:
Build an image using:
The scoring file (score.py)
The environment file (myenv.yml)
The model file
Register that image under the workspace.
Send the image to the ACI container.
Start up a container in ACI using the image.
Get the web service HTTP endpoint.
%%time
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
inference_config = InferenceConfig(runtime= "python",
entry_script="score.py",
conda_file="myenv.yml")
service = Model.deploy(workspace=ws,
name='sklearn-mnist-svc',
models=[model],
inference_config=inference_config,
deployment_config=aciconfig)
service.wait_for_deployment(show_output=True)
Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.
print(service.scoring_uri)
Test deployed service
Send the data as a JSON array to the web service hosted in ACI. Use the SDK's run API to invoke the service. You can also make raw calls using any HTTP tool such as curl. Print the returned predictions and plot them along with the input images.
A red font and an inverse image (white on black) are used to highlight the misclassified samples.
import json
# find 30 random samples from test set
n = 30
sample_indices = np.random.permutation(X_test.shape[0])[0:n]
test_samples = json.dumps({"data": X_test[sample_indices].tolist()})
test_samples = bytes(test_samples, encoding='utf8')
# predict using the deployed model
result = service.run(input_data=test_samples)
# compare actual value vs. the predicted values:
i = 0
plt.figure(figsize = (20, 1))
for s in sample_indices:
    plt.subplot(1, n, i + 1)
    plt.axhline('')
    plt.axvline('')
    # use different color for misclassified sample
    font_color = 'red' if y_test[s] != result[i] else 'black'
    clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys
    plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)
    plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)
    i = i + 1
plt.show()
You can also send raw HTTP request to test the web service.
import requests
# send a random row from the test set to score
random_index = np.random.randint(0, len(X_test)-1)
input_data = "{\"data\": [" + str(list(X_test[random_index])) + "]}"
headers = {'Content-Type':'application/json'}
# for an AKS deployment you'd need the service key in the header as well
# api_key = service.get_key()
# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}
resp = requests.post(service.scoring_uri, input_data, headers=headers)
print("POST to url", service.scoring_uri)
#print("input data:", input_data)
print("label:", y_test[random_index])
print("prediction:", resp.text)
delete only the ACI deployment using this API call
service.delete()
AutoML on a regression problem
https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models
Import the necessary packages
from azureml.opendatasets import NycTlcGreen
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta
fetch one month at a time
green_taxi_df = pd.DataFrame([])
start = datetime.strptime("1/1/2015","%m/%d/%Y")
end = datetime.strptime("1/31/2015","%m/%d/%Y")
for sample_month in range(12):
    temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \
        .to_pandas_dataframe()
    green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))
green_taxi_df.head(10)
create various time-based features and use the apply() function on the dataframe to iteratively apply the build_time_features() function to each row in the taxi data.
def build_time_features(vector):
    pickup_datetime = vector[0]
    month_num = pickup_datetime.month
    day_of_month = pickup_datetime.day
    day_of_week = pickup_datetime.weekday()
    hour_of_day = pickup_datetime.hour
    return pd.Series((month_num, day_of_month, day_of_week, hour_of_day))
green_taxi_df[["month_num", "day_of_month","day_of_week", "hour_of_day"]] = green_taxi_df[["lpepPickupDatetime"]].apply(build_time_features, axis=1)
green_taxi_df.head(10)
Remove some of the columns that you won't need for training or additional feature building
columns_to_remove = ["lpepPickupDatetime", "lpepDropoffDatetime", "puLocationId", "doLocationId", "extra", "mtaTax",
"improvementSurcharge", "tollsAmount", "ehailFee", "tripType", "rateCodeID",
"storeAndFwdFlag", "paymentType", "fareAmount", "tipAmount"
]
for col in columns_to_remove:
    green_taxi_df.pop(col)
green_taxi_df.head(5)
See summary stats
green_taxi_df.describe()
There are several fields that have outliers or values that will reduce model accuracy. Filter the lat/long fields to be within the bounds of the Manhattan area. Filter the tripDistance field to be greater than zero but less than 31 miles. Require passengerCount > 0 and totalAmount > 0.
final_df = green_taxi_df.query("pickupLatitude>=40.53 and pickupLatitude<=40.88")
final_df = final_df.query("pickupLongitude>=-74.09 and pickupLongitude<=-73.72")
final_df = final_df.query("tripDistance>=0.25 and tripDistance<31")
final_df = final_df.query("passengerCount>0 and totalAmount>0")
columns_to_remove_for_training = ["pickupLongitude", "pickupLatitude", "dropoffLongitude", "dropoffLatitude"]
for col in columns_to_remove_for_training:
    final_df.pop(col)
Call describe
final_df.describe()
Configure work space
from azureml.core.workspace import Workspace
ws = Workspace.from_config()
Split the data into train and test
from sklearn.model_selection import train_test_split
y_df = final_df.pop("totalAmount")
x_df = final_df
x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223)
Automatically train a model
Define settings for the experiment run. Attach your training data to the configuration, and modify settings that control the training process.
Submit the experiment for model tuning. After submitting the experiment, the process iterates through different machine learning algorithms and hyperparameter settings, adhering to your defined constraints. It chooses the best-fit model by optimizing an accuracy metric.
Training settings can be found here https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train
|Property| Value in this tutorial |Description|
|----|----|---|
|**iteration_timeout_minutes**|2|Time limit in minutes for each iteration. Reduce this value to decrease total runtime.|
|**iterations**|20|Number of iterations. In each iteration, a new machine learning model is trained with your data. This is the primary value that affects total run time.|
|**primary_metric**| spearman_correlation | Metric that you want to optimize. The best-fit model will be chosen based on this metric.|
|**preprocess**| True | By using **True**, the experiment can preprocess the input data (handling missing data, converting text to numeric, etc.)|
|**verbosity**| logging.INFO | Controls the level of logging.|
|**n_cross_validations**|5|Number of cross-validation splits to perform when validation data is not specified.|
import logging
automl_settings = {
"iteration_timeout_minutes": 2,
"iterations": 20,
"primary_metric": 'spearman_correlation',
"preprocess": True,
"verbosity": logging.INFO,
"n_cross_validations": 5
}
This is a regression task. See full inputs here: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#configure-your-experiment-settings
from azureml.train.automl import AutoMLConfig
automl_config = AutoMLConfig(task='regression',
debug_log='automated_ml_errors.log',
X=x_train.values,
y=y_train.values.flatten(),
**automl_settings)
Train the regression model
Create an experiment object in your workspace. Pass the defined automl_config object to the experiment, and set the output to True to view progress during the run.
from azureml.core.experiment import Experiment
experiment = Experiment(ws, "taxi-experiment")
local_run = experiment.submit(automl_config, show_output=True)
Gives status on outliers and cardinality.
Explore the output in a widget https://docs.microsoft.com/en-us/python/api/azureml-widgets/azureml.widgets?view=azure-ml-py
from azureml.widgets import RunDetails
RunDetails(local_run).show()
Retrieve the best model
best_run, fitted_model = local_run.get_output()
print(best_run)
print(fitted_model)
Use the best model to run predictions on the test data set to predict taxi fares.
y_predict = fitted_model.predict(x_test.values)
print(y_predict[:10])
Calculate the root mean squared error of the results
from sklearn.metrics import mean_squared_error
from math import sqrt
y_actual = y_test.values.flatten().tolist()
rmse = sqrt(mean_squared_error(y_actual, y_predict))
rmse
Calculate mean absolute percent error (MAPE) by using the full y_actual and y_predict data sets
sum_actuals = sum_errors = 0
for actual_val, predict_val in zip(y_actual, y_predict):
    abs_error = actual_val - predict_val
    if abs_error < 0:
        abs_error = abs_error * -1
    sum_errors = sum_errors + abs_error
    sum_actuals = sum_actuals + actual_val
mean_abs_percent_error = sum_errors / sum_actuals
print("Model MAPE:")
print(mean_abs_percent_error)
print()
print("Model Accuracy:")
print(1 - mean_abs_percent_error)
Azure Kubernetes Service (AKS)
Notes taken from https://zero-to-jupyterhub.readthedocs.io/en/latest/ and https://docs.dask.org/en/latest/setup/kubernetes-helm.html
Deploy and customize your own JupyterHub on a cloud.
Some definitions:
Kubernetes - Manage resources on the cloud - https://kubernetes.io/
Helm - Configure and control the packaged JupyterHub installation - https://helm.sh/
JupyterHub - Give users access to a Jupyter computing environment - https://jupyterhub.readthedocs.io/en/stable/
Docker - Build customized image for the users - https://www.docker.com/
Domain registration - make the hub available at https://your-domain-name.com - e.g. https://hub.pangeo.io/hub/login
More detail into the lingo here - https://zero-to-jupyterhub.readthedocs.io/en/latest/reference/tools.html#tools
Step Zero: your Kubernetes cluster
During the process of setting up JupyterHub, you'll be creating some files for configuration purposes. It may be helpful to create a folder for your JupyterHub deployment to keep track of these files.
Create a Kubernetes cluster either through the Azure portal website, or using the Azure command line tools
Test using the portal
Sign into Azure
Create a resource -> Containers -> Kubernetes Service
Create a resource group -> name is 'k8s'
Enter a name for the Kubernetes cluster 'myAKSCluster'
Try Kubernetes version 1.14.8
Select a VM node size (DS2 v2)
Select a Node count of 1
Click 'Next : Scale'
Click 'Next : Authentication'
Click 'Review + create'. Got 'Failed to create a service principal. You can use an existing service principal or try again later.' error https://docs.microsoft.com/en-us/azure/aks/troubleshooting#im-receiving-errors-that-my-service-principal-was-not-found-when-i-try-to-create-a-new-cluster-without-passing-in-an-existing-one
Test using the CLI
1. Sign into Azure
2. Create a directory, as you cannot create apps in the main directory
Go to Azure Active Directory -> Bottom right 'Create Directory' and name it 'DIRECTORYNAME'. It will create 'DIRECTORYNAME.onmicrosoft.com'.
Move the Free subscription to this directory by going to subscriptions -> Click on subscription -> Change directory. (can take up to an hour to change ownership).
3. Click on the cloud shell button '>_'
4. Select 'bash' and create a storage.
4. a) Install the CLI locally https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest
Run PowerShell as an administrator and run
Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi; Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'
4. b) Open a terminal locally and type 'az login' (had to change default browser to chrome). See subscriptions by doing az account list --refresh --output table
5. Choose a subscription as az account set --subscription "NAME"
6. Create a resource group as az group create --name=RG_NAME --location="East US" --output table
7. Switch to the Cloud Shell. Make a folder with the name of your cluster (no underscores in the folder or cluster name)
mkdir CLUSTER_NAME
cd CLUSTER_NAME
ssh-keygen -f ssh-key-CLUSTER_NAME (leave empty for no passphrase)
8. Create a AKS Cluster (on the cloud shell).
az aks create --name CLUSTER_NAME \
--resource-group RG_NAME \
--ssh-key-value ssh-key-CLUSTER_NAME.pub \
--node-count 2 \
--node-vm-size Standard_DS2_v2 \
--output table
This should take a few minutes. When complete it shows various information.
9. a) Install kubectl https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-on-windows
az aks install-cli
Add C:\Users\USER\.azure-kubectl to the path. Either set it in the environment variables or run
set PATH=%PATH%;C:\Users\USER\.azure-kubectl
10. Get credentials
az aks get-credentials \
--name CLUSTER_NAME \
--resource-group RG_NAME \
--output table
11. Check the cluster is working
kubectl get node
Should show two running nodes, their k8s version and a status of 'Ready'.
Step One: Setting up Helm
https://zero-to-jupyterhub.readthedocs.io/en/latest/setup-jupyterhub/setup-helm.html
Helm charts work as a templating engine to populate multiple YAML files; Helm then runs kubectl apply. Helm (before version 3) has two parts: a client (helm) and a server (tiller). Tiller runs inside your Kubernetes cluster as a pod in the kube-system namespace and manages both the releases (installations) and revisions (versions) of charts deployed on the cluster. When you run helm commands, your local Helm client sends instructions to tiller in the cluster, which in turn makes the requested changes.
1. Download and install helm locally https://helm.sh/docs/intro/install/
You may want to use https://chocolatey.org/install if using windows (https://helm.sh/docs/intro/install/#from-chocolatey-windows).
I think the rest is redundant with Helm 3...
2. Set up a ServiceAccount for use by tiller. (diverge from the docs here)
Create a YAML file called helm-rbac.yml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
Then run kubectl apply -f helm-rbac.yml
3. Initialize helm and tiller:
helm init --service-account tiller --wait
Footnotes
https://pangeo.io/setup_guides/cloud.html
https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads
https://docs.microsoft.com/en-us/partner-center/set-an-azure-spending-budget-for-your-customers
https://devopscube.com/install-configure-helm-kubernetes/
https://docs.microsoft.com/en-us/azure/aks/kubernetes-helm
https://v3.helm.sh/docs/intro/quickstart/
https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough-portal
K8s with Dask on Windows
0a. Setup a subscription in the Azure portal.
0b. Install the Azure Command Line on your local machine https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-windows?view=azure-cli-latest
0c. Install kubectl on your local machine with az aks install-cli (https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough#connect-to-the-cluster)
0d. Add C:\Users\131416\.azure-kubectl to PATH by going to Control Panel -> System -> Advanced -> Environment Variables and adding a new entry. Open a new PowerShell.
0e. Install Chocolatey https://chocolatey.org/install
0f. Install helm https://helm.sh/docs/intro/install/
1. On local machine open PowerShell as an administrator and type az login
2. See subscriptions by doing az account list --refresh --output table
3. Choose a subscription as az account set --subscription "NAME"
4.0. See resource groups as az group list --output table
4. Create a resource group by doing az group create --name=RESOURCEGROUPNAME --location="LOCATION" --output table
See a list of locations here e.g. East US
5a. Go on the Azure Portal and open the Cloud Shell.
5b. Create a directory named after your cluster. Go into the directory and generate keys.
mkdir CLUSTERNAME
cd CLUSTERNAME
ssh-keygen -f ssh-key-CLUSTER_NAME (leave empty for no passphrase)
5c. Copy the public key to your local machine... TODO
6. Create an Azure Kubernetes Service on the Cloud Shell (takes a few minutes).
az aks create --name CLUSTER_NAME \
--resource-group RG_NAME \
--ssh-key-value ssh-key-CLUSTER_NAME.pub \
--node-count 2 \
--node-vm-size Standard_DS2_v2 \
--output table
7. Go back to the local machine and get the credentials of the AKS cluster
az aks get-credentials \
--name CLUSTER_NAME \
--resource-group RG_NAME \
--output table
8. Check the cluster is working
kubectl get node
9. Launch a Dask scheduler, several workers, and an optional Jupyter notebook server on the Kubernetes cluster
helm repo add dask https://helm.dask.org/
helm repo update
helm install dask/dask --generate-name
10. Check status by running
kubectl get pods
kubectl get services
11. After running kubectl get services you will see an EXTERNAL-IP.
To access Jupyter Lab: copy the EXTERNAL-IP value for the 'X-jupyter' NAME into a browser and use password 'dask'.
To access the Dashboard: copy the EXTERNAL-IP value for the 'X-scheduler' NAME into a browser.
12. Create a new notebook and run:
from dask.distributed import Client, config
client = Client()
import distributed
import dask.array as da
array = da.ones((1000, 1000, 1000), chunks=(100, 100, 10))
print(array.mean().compute())
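You can also point a local client at the scheduler instead of using the in-cluster notebook; the Dask Helm chart exposes the scheduler on port 8786 by default:
from dask.distributed import Client
# EXTERNAL-IP is the 'X-scheduler' address from `kubectl get services`
client = Client('tcp://EXTERNAL-IP:8786')
print(client)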
20. Delete the cluster
az aks delete --resource-group RG_NAME --name CLUSTER_NAME --no-wait
21. Delete the resource group
az group delete --name RG_NAME