Download file from DBFS
1. Write a Spark DataFrame to CSV. The repartition(1) call yields a single 'part' file in the output folder.
/FileStore is the default directory shown when you click 'Browse DBFS' on Databricks.
(
    df.repartition(1)                        # collapse to one partition so only one part file is written
      .write
      .format("com.databricks.spark.csv")    # on recent Spark versions the built-in "csv" format works as well
      .option("header", "true")
      .save("dbfs:/FileStore/temp/output")
)
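To find the generated part file without leaving the notebook, you can list the output folder with dbutils. A minimal sketch, assuming the save path from step 1:

# Print the full path of every file in the output folder,
# including the generated part-xxx.csv file
for f in dbutils.fs.ls("dbfs:/FileStore/temp/output"):
    print(f.path)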
2. Browse DBFS
Navigate to the 'part' file inside the output folder and copy its full path.
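If you prefer the command line over the UI, the same listing is available through the Databricks CLI (assuming the CLI has already been configured as in step 3):

databricks fs ls dbfs:/FileStore/temp/output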
3. Connect through the Databricks CLI. This requires the workspace URL and a personal access token.
databricks configure --token
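The command prompts for the host and token and stores them in a profile file. A minimal sketch of the resulting ~/.databrickscfg, where the host URL and token value are placeholders you replace with your own:

[DEFAULT]
host = https://<your-workspace>.cloud.databricks.com
token = <your-personal-access-token>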
4. Copy the file from DBFS to a local directory (the DBFS path should match the output folder from step 1)
databricks fs cp dbfs:/FileStore/temp/output/part-xxx.csv c:/temp/output.csv
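If c:/temp/output.csv already exists, the copy fails; the legacy databricks-cli supports an --overwrite flag for this case:

databricks fs cp --overwrite dbfs:/FileStore/temp/output/part-xxx.csv c:/temp/output.csv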
5. Delete the output folder from DBFS to clean up
Run the dbutils command from a notebook on Databricks:
dbutils.fs.rm('dbfs:/FileStore/temp/output', True)  # True deletes the folder recursively
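As a variation, you can copy the single part file to a stable name inside DBFS before downloading, so step 4 does not depend on the generated part-xxx name. A minimal sketch using dbutils; the output.csv target name is an arbitrary choice:

# Find the part file in the output folder and copy it under a fixed name
files = dbutils.fs.ls("dbfs:/FileStore/temp/output")
part = [f.path for f in files if f.name.startswith("part-")][0]
dbutils.fs.cp(part, "dbfs:/FileStore/temp/output.csv")  # stable name for download
dbutils.fs.rm("dbfs:/FileStore/temp/output", True)      # remove the part-file folder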