Download file from DBFS
1. Write a Spark DataFrame to CSV. The repartition(1) call yields a single 'part' file in the output folder.
/FileStore is the default directory shown when you click 'Browse DBFS' on Databricks.
(
    df.repartition(1)                        # collapse to one partition so only one part file is written
      .write
      .format("com.databricks.spark.csv")    # on recent Spark versions the built-in "csv" format works as well
      .option("header", "true")
      .save("dbfs:/FileStore/temp/output")
)
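To find the generated part file without leaving the notebook, you can list the output folder with dbutils. A minimal sketch, assuming the save path from step 1:

# Print the full path of every file in the output folder,
# including the generated part-xxx.csv file
for f in dbutils.fs.ls("dbfs:/FileStore/temp/output"):
    print(f.path)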
2. Browse DBFS
Navigate to the 'part' file inside the output folder and copy its full path.
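If you prefer the command line over the UI, the same listing is available through the Databricks CLI (assuming the CLI has already been configured as in step 3):

databricks fs ls dbfs:/FileStore/temp/output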
3. Connect through the Databricks CLI. This requires the workspace URL and a personal access token.
databricks configure --token
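The command prompts for the host and token and stores them in a profile file. A minimal sketch of the resulting ~/.databrickscfg, where the host URL and token value are placeholders you replace with your own:

[DEFAULT]
host = https://<your-workspace>.cloud.databricks.com
token = <your-personal-access-token>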
4. Copy the file from DBFS to a local directory (the DBFS path should match the output folder from step 1)
databricks fs cp dbfs:/FileStore/temp/output/part-xxx.csv c:/temp/output.csv
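If c:/temp/output.csv already exists, the copy fails; the legacy databricks-cli supports an --overwrite flag for this case:

databricks fs cp --overwrite dbfs:/FileStore/temp/output/part-xxx.csv c:/temp/output.csv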
5. Delete the output folder from DBFS to clean up
Run the dbutils command from a notebook on Databricks:
dbutils.fs.rm('dbfs:/FileStore/temp/output', True)  # True deletes the folder recursively
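As a variation, you can copy the single part file to a stable name inside DBFS before downloading, so step 4 does not depend on the generated part-xxx name. A minimal sketch using dbutils; the output.csv target name is an arbitrary choice:

# Find the part file in the output folder and copy it under a fixed name
files = dbutils.fs.ls("dbfs:/FileStore/temp/output")
part = [f.path for f in files if f.name.startswith("part-")][0]
dbutils.fs.cp(part, "dbfs:/FileStore/temp/output.csv")  # stable name for download
dbutils.fs.rm("dbfs:/FileStore/temp/output", True)      # remove the part-file folder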