small things

dbutils.fs.ls    "FileNotFound" error

When you set up a new instance, the 'databricks-results' folder under the /dbfs directory is empty. If you try to list the contents of that folder, the call fails with a FileNotFound error:

dbutils.fs.ls('/databricks-results')
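
If you just need the call not to blow up on a brand-new workspace, a defensive guard works. This is only a sketch; it assumes the failure surfaces as a Java FileNotFoundException in the exception message, which is how dbutils reports it through Py4J.

try:
    files = dbutils.fs.ls('/databricks-results')
except Exception as e:
    # the missing folder surfaces as a java.io.FileNotFoundException via Py4J
    if 'java.io.FileNotFoundException' in str(e):
        files = []
    else:
        raise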


However, if you use the shell command

%sh

ls /dbfs/databricks-results

it simply returns an empty listing instead of an error.


Once you create a subdirectory or file within it, the dbutils.fs.ls command works.
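
For example (the subdirectory name here is arbitrary):

dbutils.fs.mkdirs('/databricks-results/tmp')

dbutils.fs.ls('/databricks-results')   # now lists the new subdirectory instead of failing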



All-purpose clusters vs. job clusters

You run these workloads as a set of commands in a notebook or as an automated job. Databricks makes a distinction between all-purpose clusters and job clusters. You use all-purpose clusters to analyze data collaboratively using interactive notebooks. You use job clusters to run fast and robust automated jobs.

 

Interactive (all-purpose) clusters can be useful if you are running your ETL pipelines in micro-batches (say, every 5 minutes), because a job cluster takes some time to spin up (approx. 2-4 minutes, depending on multiple factors).

 

The Databricks job scheduler creates a job cluster when you run a job on a new job cluster and terminates the cluster when the job is complete. You cannot restart a job cluster.

You cannot work with job clusters in interactive mode; they can only be used for automated job workflows run by the Databricks job scheduler.

 

If ETL pipelines are scheduled at 30-minute (or longer) intervals, then a job cluster can be considered.
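
For reference, a job cluster is just a new_cluster block inside the job definition; the scheduler creates it for the run and tears it down afterwards. A minimal sketch using the Jobs API 2.1 (the workspace URL, token, notebook path and cluster sizing below are placeholders):

import requests

DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"   # placeholder
DATABRICKS_TOKEN = "<personal-access-token>"                       # placeholder

# the "new_cluster" block is the job cluster: created for the run, terminated after it
job_spec = {
    "name": "etl-every-30-min",
    "tasks": [
        {
            "task_key": "etl",
            "notebook_task": {"notebook_path": "/Repos/etl/run_pipeline"},   # placeholder path
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())   # returns the job_id of the new job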

 

Running ETL pipelines from Azure Data Factory on Databricks clusters

Azure Data Factory supports triggering Databricks notebooks and running them on Databricks clusters.

To connect to Databricks clusters, you need to create a Databricks linked service in Azure Data Factory.
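
Under the hood the linked service is just a JSON definition. A rough sketch of its shape, written here as a Python dict (the workspace URL, token and cluster id are placeholders; in practice you usually author this through the ADF UI):

databricks_linked_service = {
    "name": "AzureDatabricksLinkedService",   # placeholder name
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-1234567890123456.7.azuredatabricks.net",   # workspace URL
            "accessToken": {"type": "SecureString", "value": "<access-token>"},
            # points at an existing interactive cluster; ADF can instead spin up a
            # job cluster per run using the new-cluster properties
            "existingClusterId": "<cluster-id>",
        },
    },
}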

 

 

Why run a Databricks notebook from ADF?

Mount a data lake storage account so Databricks can keep unmanaged tables there, or leave them in the Databricks file system (DBFS).
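
A minimal sketch of such a mount for ADLS Gen2 with a service principal (the secret scope, application id, tenant id, container and storage account names are all placeholders):

# hypothetical service-principal credentials pulled from a secret scope
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("my-scope", "sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)

Unmanaged (external) tables can then point at paths under /mnt/datalake; managed tables stay in DBFS otherwise.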