Mount data lake / blob storage in Synapse notebook
In Databricks this can be done easily using the dbutils.fs.mount() function. In Synapse, mssparkutils.fs.mount() cannot do it in the same way yet. You have to create a linked service pointing to the blob storage, mount the linked service, and then read the data through the "synfs" protocol.
1. Create a linked service
In Synapse Studio, go to Linked -> Connect to external data -> Azure Data Lake Storage Gen2.
In the configuration, leave the integration runtime as "AutoResolveIntegrationRuntime". Select the authentication type "Service Principal" if available, or "Account Key" if you are using a key. Enter the URL as https://<storage account name>.dfs.core.windows.net. Fill in the authentication details, e.g. "tenant", "service principal ID" (application ID) and "service principal key" (secret).
Then test the connection; it should succeed. However, when you refresh the page, the new linked service may show a failure, because it tries to connect using your current account. You can ignore this.
2. Create a directory for mounting to
from notebookutils import mssparkutils
mssparkutils.fs.mkdirs('/mount')
3. Mount the linked service
from notebookutils import mssparkutils
container_name = 'container_name'
account_name = 'storage_account_name'
mssparkutils.fs.mount(
    f"abfss://{container_name}@{account_name}.dfs.core.windows.net",
    "/mount",
    {"linkedService": "<linked service name>", "fileCacheTimeout": 30, "timeout": 30}
)
This code should return True. Note that you can't just run the ls() command on the mount folder directly. It doesn't work that way; the folder has to be accessed through the path shown in the following steps.
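The source URL and the options dictionary passed to mount() follow a fixed pattern, so they can be built by a small helper that also catches unfilled placeholders. This is only a minimal sketch: build_mount_args is a hypothetical helper name, not part of mssparkutils, and only the abfss URL pattern and the option keys are taken from the mount call above.

```python
def build_mount_args(container_name, account_name, linked_service, timeout=30):
    """Return (source, options) in the shape used by the mount call above.

    Hypothetical helper: it only formats strings and does not talk to Synapse.
    """
    for label, value in (
        ("container_name", container_name),
        ("account_name", account_name),
        ("linked_service", linked_service),
    ):
        # Reject empty values and template text such as "<linked service name>".
        if not value or "<" in value or ">" in value:
            raise ValueError(f"{label} looks like an unfilled placeholder: {value!r}")

    source = f"abfss://{container_name}@{account_name}.dfs.core.windows.net"
    options = {
        "linkedService": linked_service,
        "fileCacheTimeout": timeout,
        "timeout": timeout,
    }
    return source, options


# Intended use inside a Synapse notebook (not executed here):
#   source, options = build_mount_args("container_name", "storage_account_name", "my_linked_service")
#   mssparkutils.fs.mount(source, "/mount", options)
```

Here "my_linked_service" stands in for whatever name was given to the linked service in step 1.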
4. Check the mount folder
path = mssparkutils.fs.getMountPath("/mount")
path
The path is something like '/synfs/5/mount', where the number (5 here) is the ID of the current Spark job, so it will differ from run to run. This means the mount folder is available now.
5. Access mount folder
mssparkutils.fs.ls("synfs:/5/mount/")
Note that the path from above is converted to a different format, with 'synfs' as the protocol: '/synfs/5/mount' becomes 'synfs:/5/mount'. This command should list the files under the mount folder.
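Because the number in the path changes between runs, it can be more convenient to derive the synfs path from the output of getMountPath() than to type it by hand. A minimal sketch of that conversion, assuming the local path always starts with '/synfs/' as in the example above; to_synfs_uri is a hypothetical helper name.

```python
def to_synfs_uri(local_mount_path, sub_path=""):
    """Convert a local mount path such as '/synfs/5/mount' to 'synfs:/5/mount'.

    sub_path is an optional directory or file below the mount folder.
    """
    prefix = "/synfs/"
    if not local_mount_path.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix!r}, got {local_mount_path!r}")

    # Swap the leading '/synfs/' for the 'synfs:/' protocol prefix.
    uri = "synfs:/" + local_mount_path[len(prefix):].rstrip("/")
    if sub_path:
        uri += "/" + sub_path.strip("/")
    return uri


# Intended use inside a Synapse notebook (not executed here):
#   path = mssparkutils.fs.getMountPath("/mount")
#   mssparkutils.fs.ls(to_synfs_uri(path))
```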
6. Read data through spark
%%pyspark
df = spark.read.load("synfs:/5/mount/your sub directory/", format='parquet')
df.limit(10).show()
Note that pandas doesn't seem to work with the synfs protocol.
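One thing that may be worth trying for pandas is the local path returned by getMountPath() (e.g. '/synfs/5/mount') rather than the synfs protocol. This has not been verified in this note, so treat it as an assumption. The sketch below uses only the standard library to list files under a local directory; list_files_under is a hypothetical helper name. If the mounted path behaves like an ordinary directory on the node, the returned paths could then be passed to pandas.

```python
import os


def list_files_under(local_dir, suffix=""):
    """Return the sorted paths of all files under local_dir whose names end with suffix."""
    matches = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
    return sorted(matches)


# Intended use inside a Synapse notebook (assumption, not executed here):
#   local_path = mssparkutils.fs.getMountPath("/mount")
#   parquet_files = list_files_under(local_path, suffix=".parquet")
```

An empty result means either that there are no matching files or that the directory is not visible through the local file system.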
7. Unmount folder
mssparkutils.fs.unmount("/mount")
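To make sure the folder is unmounted even when a cell fails halfway, the mount and unmount calls can be wrapped in a context manager. This is a minimal sketch, not an mssparkutils feature: mounted is a hypothetical helper, and fs is assumed to be an object with mount() and unmount() methods that behave like the mssparkutils.fs calls used in this note.

```python
from contextlib import contextmanager


@contextmanager
def mounted(fs, source, mount_point, options):
    """Mount source at mount_point and always unmount it on exit.

    fs is expected to expose mount(source, mount_point, options) and
    unmount(mount_point), like mssparkutils.fs in the steps above.
    """
    if not fs.mount(source, mount_point, options):
        raise RuntimeError(f"mounting {source} at {mount_point} did not return True")
    try:
        yield mount_point
    finally:
        # Runs on normal exit and on exceptions alike.
        fs.unmount(mount_point)


# Intended use inside a Synapse notebook (not executed here):
#   with mounted(mssparkutils.fs, source, "/mount", options):
#       path = mssparkutils.fs.getMountPath("/mount")
#       ...
```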