Databricks

Create database

%sql CREATE DATABASE tmp

Run a notebook in another notebook

%run /HOME_FOLDER/notebook

Mount a blob

dbutils.fs.mount(

source = "wasbs://CONTAINER@ACCOUNT_NAME.blob.core.windows.net",

mount_point = "/mnt/CONTAINER",

extra_configs = {"fs.azure.account.key.ACCOUNT_NAME.blob.core.windows.net":"ACCOUNT_KEY"})
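
The three arguments above share the same container and account names, so they can be assembled in one place. A minimal sketch, assuming a hypothetical helper named build_mount_args (not part of dbutils); the dbutils call itself is left as a comment because dbutils only exists inside a Databricks notebook:

```python
def build_mount_args(container, account, account_key):
    """Assemble keyword arguments for dbutils.fs.mount (hypothetical helper)."""
    host = "%s.blob.core.windows.net" % account
    return {
        "source": "wasbs://%s@%s" % (container, host),
        "mount_point": "/mnt/%s" % container,
        # The config key must name the same storage account as the source URL.
        "extra_configs": {"fs.azure.account.key.%s" % host: account_key},
    }

args = build_mount_args("CONTAINER", "ACCOUNT_NAME", "ACCOUNT_KEY")
print(args["source"])
print(args["mount_point"])

# In a Databricks notebook one could then call:
# dbutils.fs.mount(**args)
```

Building the key and the source from one host string avoids the two getting out of sync when the account name changes.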

Unmount a blob

dbutils.fs.unmount("/mnt/CONTAINER")
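
Unmounting a path that is not mounted raises an error, so it can help to check first. dbutils.fs.mounts() returns the current mounts, each with a mountPoint field. A rough sketch, assuming a hypothetical is_mounted helper and a namedtuple standing in for the real mount entries so it runs outside Databricks:

```python
from collections import namedtuple

# Stand-in for the entries returned by dbutils.fs.mounts();
# only the mountPoint field is used by the helper.
MountInfo = namedtuple("MountInfo", ["mountPoint", "source"])

def is_mounted(mounts, mount_point):
    """Return True if mount_point appears among the given mount entries."""
    return any(m.mountPoint == mount_point for m in mounts)

sample_mounts = [
    MountInfo("/mnt/CONTAINER", "wasbs://CONTAINER@ACCOUNT_NAME.blob.core.windows.net"),
]
print(is_mounted(sample_mounts, "/mnt/CONTAINER"))
print(is_mounted(sample_mounts, "/mnt/OTHER"))

# In a Databricks notebook:
# if is_mounted(dbutils.fs.mounts(), "/mnt/CONTAINER"):
#     dbutils.fs.unmount("/mnt/CONTAINER")
```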

Upgrade pandas and numpy

dbutils.library.installPyPI('numpy','1.16.3')

dbutils.library.installPyPI('pandas','0.24.2')

dbutils.library.restartPython()
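
After the restart it may be worth confirming that the pinned versions are in effect. A small sketch, assuming hypothetical helpers version_tuple and at_least and purely numeric version strings (pre-release tags such as rc1 are not handled); in a notebook the first argument would be numpy.__version__ or pandas.__version__:

```python
def version_tuple(version):
    """Turn '1.16.3' into (1, 16, 3); assumes numeric, dot-separated parts."""
    return tuple(int(part) for part in version.split("."))

def at_least(installed, required):
    """Compare versions numerically rather than as strings."""
    return version_tuple(installed) >= version_tuple(required)

print(at_least("0.24.2", "0.24.2"))  # True
print(at_least("1.9.0", "1.16.3"))   # False, although "1.9.0" > "1.16.3" as strings
```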

Install packages from PyPI

Click on the Clusters tab -> Click cluster name -> Libraries -> Install New -> PyPI -> Name of package

Or install from a notebook:

dbutils.library.installPyPI("koalas")

dbutils.library.restartPython()

Add conda magic commands

In the cluster's Spark configuration:

spark.databricks.conda.condaMagic.enabled true

https://docs.databricks.com/notebooks/notebooks-python-libraries.html

Install packages from conda

In a notebook cell:

%sh /databricks/conda/bin/conda install -y -p /databricks/python -c conda-forge fbprophet

Then detach and reattach the notebook.

Click on a cluster -> Advanced Options -> Init Scripts

http://abizeradenwala.blogspot.com/2018/05/upgrading-python-version-for-databricks.html

https://docs.databricks.com/user-guide/faq/anaconda-environment.html

Upgrade Python

%sh curl -o /dbfs/tmp/Miniconda3-latest-Linux-x86_64.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh


%sh sudo wget https://repo.anaconda.com/archive/Anaconda3-2018.12-Linux-x86_64.sh -O /dbfs/tmp/Anaconda3-2018.12-Linux-x86_64.sh


%sh /databricks/python/bin/pip freeze > /dbfs/tmp/python_packages.txt


clusterName = "anaconda-cluster"


script = """#!/bin/bash

cp /dbfs/tmp/Anaconda3-2018.12-Linux-x86_64.sh /tmp

sudo bash /tmp/Anaconda3-2018.12-Linux-x86_64.sh -b -p /anaconda3

mv /databricks/python /databricks/python_old

ln -s /anaconda3 /databricks/python

cp /dbfs/tmp/python_packages.txt /tmp/python_packages.txt

/databricks/python/bin/pip install -r /tmp/python_packages.txt

"""


dbutils.fs.put("dbfs:/databricks/init/%s/install_conda.sh" % clusterName, script, True)


clusterName = "pcp_ts"

script = """#!/bin/bash

cp /dbfs/tmp/Anaconda3-2018.12-Linux-x86_64.sh /tmp

sudo bash /tmp/Anaconda3-2018.12-Linux-x86_64.sh -b -p /anaconda3

mv /databricks/python /databricks/python_old

ln -s /anaconda3 /databricks/python

cp /dbfs/tmp/python_packages.txt /tmp/python_packages.txt

. /anaconda3/etc/profile.d/conda.sh

conda activate /databricks/python

pip install -r /tmp/python_packages.txt

/databricks/python/bin/conda install -y -c conda-forge fbprophet

"""

dbutils.fs.put("dbfs:/databricks/init/%s/install_conda.sh" % clusterName, script, True)
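
The two scripts above differ only in the cluster name and the extra conda package, so they can be generated from one template. A sketch under that assumption; init_script_path and render_script are hypothetical helpers, and the pip restore step is left out for brevity:

```python
INSTALLER = "Anaconda3-2018.12-Linux-x86_64.sh"

def init_script_path(cluster_name, script_name="install_conda.sh"):
    """DBFS location of a cluster-named init script, as used above."""
    return "dbfs:/databricks/init/%s/%s" % (cluster_name, script_name)

def render_script(extra_conda_packages=()):
    """Build the bash init script; the shebang is always the first line."""
    lines = [
        "#!/bin/bash",
        "cp /dbfs/tmp/%s /tmp" % INSTALLER,
        "sudo bash /tmp/%s -b -p /anaconda3" % INSTALLER,
        "mv /databricks/python /databricks/python_old",
        "ln -s /anaconda3 /databricks/python",
    ]
    for pkg in extra_conda_packages:
        lines.append(
            "/databricks/python/bin/conda install -y -c conda-forge %s" % pkg
        )
    return "\n".join(lines) + "\n"

for name, pkgs in [("anaconda-cluster", []), ("pcp_ts", ["fbprophet"])]:
    path, script = init_script_path(name), render_script(pkgs)
    print(path)
    # In a Databricks notebook: dbutils.fs.put(path, script, True)
```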

Create a cluster with the same name as clusterName (e.g. anaconda-cluster)

Create an init script

dbutils.fs.mkdirs("dbfs:/databricks/my_init_scripts/")

dbutils.fs.put("/databricks/my_init_scripts/pcp_ts-install.sh","""#!/bin/bash

wget --quiet -O /mnt/Miniconda3-latest-Linux-x86_64.sh https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh

wget --quiet -O /mnt/jars/driver-daemon/postgresql-42.2.2.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.2/postgresql-42.2.2.jar""", True)



dbutils.fs.put("/databricks/my_init_scripts/pcp_ts-install.sh","""#!/bin/bash

set -ex

/databricks/python/bin/python -V

. /databricks/conda/etc/profile.d/conda.sh

conda activate /databricks/python

conda install -y fbprophet""", True)


dbutils.fs.rm("/databricks/my_init_scripts/pcp_ts-install.sh")


dbutils.fs.put("/databricks/my_init_scripts/pip-install.sh","""#!/bin/bash

/databricks/python/bin/pip install --upgrade pip""", True)
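
A shebang only counts when it is the very first line of the file, so a script string that opens with a newline loses it. A quick check before calling dbutils.fs.put, sketched with a hypothetical helper named shebang_problem:

```python
def shebang_problem(script_text):
    """Return a message if the script does not start with a shebang, else None."""
    first_line = script_text.split("\n", 1)[0]
    if not first_line.startswith("#!"):
        return "first line is %r, expected a shebang such as #!/bin/bash" % first_line
    return None

good = """#!/bin/bash
/databricks/python/bin/pip install --upgrade pip"""

# Opening the triple-quoted string with a newline pushes the shebang to line 2.
bad = """
#!/bin/bash
/databricks/python/bin/pip install --upgrade pip"""

print(shebang_problem(good))
print(shebang_problem(bad))
```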

See packages included with a runtime

https://docs.databricks.com/release-notes/runtime/7.3.html#system-environment

Create token

https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/authentication#token-management
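
Once created, the token is sent as a Bearer token in the Authorization header of REST API calls. A minimal sketch that only builds the request and does not send it; WORKSPACE_URL and TOKEN are placeholders, clusters_list_request is a hypothetical helper, and /api/2.0/clusters/list is used as an example endpoint:

```python
import urllib.request

def clusters_list_request(workspace_url, token):
    """Build (but do not send) an authenticated request to list clusters."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + token}
    )

req = clusters_list_request("https://WORKSPACE_URL", "TOKEN")
print(req.full_url)

# To actually send it against a real workspace:
# body = urllib.request.urlopen(req).read()
```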

Put config info into a cluster

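Config looks like this when set from a notebook (each X is a placeholder: client ID, client secret, and tenant ID in the URL):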

spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")

spark.conf.set("dfs.adls.oauth2.client.id", "X")

spark.conf.set("dfs.adls.oauth2.credential", "X")

spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/X/oauth2/token")
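
The four settings can be kept in one dict and applied in a loop, which keeps the placeholders in a single place. A sketch, assuming a hypothetical helper named adls_oauth_conf; the spark.conf.set call is commented out because spark is only predefined inside a notebook:

```python
def adls_oauth_conf(client_id, credential, tenant_id):
    """Collect the ADLS OAuth settings shown above into one dict."""
    return {
        "dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
        "dfs.adls.oauth2.client.id": client_id,
        "dfs.adls.oauth2.credential": credential,
        "dfs.adls.oauth2.refresh.url":
            "https://login.microsoftonline.com/%s/oauth2/token" % tenant_id,
    }

conf = adls_oauth_conf("X", "X", "X")
for key, value in conf.items():
    print(key, "=", value)
    # In a notebook: spark.conf.set(key, value)
```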