Databricks cluster - run init script
Example: install the Microsoft SQL Server ODBC driver on Databricks
Sometimes installing a library takes more than listing its name in the cluster's 'libraries' section.
You may need to run an extra Linux shell script to complete the installation or to change some configuration settings.
Note that a Databricks cluster's compute is not persisted. The cluster installs libraries and runs init scripts each time it spins up an instance.
Set up Init Scripts
Click into a compute and click Edit
Under the Configuration tab, in the Advanced options section, go to "Init Scripts"
Point to the shell script (.sh file) in either a workspace or a volume, and click Add
Done
As an example, a shell script to install the SQL Server ODBC driver is shown below.
It adds Microsoft's package repository, refreshes the apt package index, and installs the driver.
#!/bin/bash
curl https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
curl https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18
sudo apt-get clean
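Once the cluster is up, one way to confirm the script worked is to look for the driver's entry in /etc/odbcinst.ini from a notebook. The sketch below is an illustration, not part of the original steps: it assumes the package registers itself under the section name "ODBC Driver 18 for SQL Server" (the same name used in the connection string later), the helper name list_odbc_drivers is made up, and the sample text (including the driver path) is a placeholder rather than real file content.

```python
import configparser

def list_odbc_drivers(ini_text):
    """Return the driver names (section headers) found in odbcinst.ini text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names as written in the file
    parser.read_string(ini_text)
    return parser.sections()

# Placeholder content, assumed to resemble what the msodbcsql18 package writes.
# The Driver path here is hypothetical.
sample = """
[ODBC Driver 18 for SQL Server]
Description=Microsoft ODBC Driver 18 for SQL Server
Driver=/path/to/libmsodbcsql.so
UsageCount=1
"""

print(list_odbc_drivers(sample))
```

On the cluster, the same helper could be fed the real file with open("/etc/odbcinst.ini").read(). If pyodbc is already installed, pyodbc.drivers() gives a similar list without parsing the file.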
When connecting to SQL Server 2008 from Python, the connection may fail with errors like:
in pyodbc: "SSL routines::unsupported protocol"
in pymssql: "DB-Lib error message, Adaptive Server connection failed"
That's because old versions of SQL Server typically support only older TLS (SSL) protocol versions, which the cluster's default OpenSSL settings reject.
One workaround is to lower the OpenSSL security level on the Linux cluster.
In the /etc/ssl/openssl.cnf file
...
[system_default_sect]
CipherString = DEFAULT:@SECLEVEL=2
Simply change "SECLEVEL=2" to "SECLEVEL=0". This can be done through an init script:
#!/bin/bash
sudo sed -i "s/CipherString = DEFAULT:@SECLEVEL=2/CipherString = DEFAULT:@SECLEVEL=0/" /etc/ssl/openssl.cnf
In the sed command, -i edits the file in place, s stands for substitute, / is the delimiter, the first string is the pattern to match, and the second string is the replacement.
sed -i 's/string1/string2/' filepath
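To preview what the substitution does before putting it in an init script, the same replacement can be sketched in Python on a sample line. This only illustrates the sed expression above and is not a replacement for it; the function name lower_seclevel is hypothetical.

```python
def lower_seclevel(conf_text):
    """Mimic the sed command: replace the first match on each line."""
    old = "CipherString = DEFAULT:@SECLEVEL=2"
    new = "CipherString = DEFAULT:@SECLEVEL=0"
    # sed without the g flag substitutes only the first match per line.
    return "\n".join(line.replace(old, new, 1) for line in conf_text.split("\n"))

sample = "[system_default_sect]\nCipherString = DEFAULT:@SECLEVEL=2"
print(lower_seclevel(sample))
```

Note that the match is literal: a line spelled differently from the pattern (for instance, without the colon) is left untouched, and sed likewise exits without complaint in that case. It is worth checking the actual line in /etc/ssl/openssl.cnf on the cluster before relying on the script.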
Then, in the Python script, add 'Encrypt=no' to the connection string.
import pyodbc
conn = pyodbc.connect('DRIVER={ODBC Driver 18 for SQL Server};SERVER=xxx;DATABASE=xxx;UID=xxx;PWD=xxx;Encrypt=no')
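When the same connection is needed for several servers or databases, the string can be assembled from its parts. The following is a minimal sketch under stated assumptions: build_conn_str is a hypothetical helper, the 'xxx' values are the same placeholders as above, and values containing ';' or braces are not escaped.

```python
def build_conn_str(server, database, user, password, encrypt="no"):
    """Assemble an ODBC connection string for ODBC Driver 18 for SQL Server."""
    parts = {
        "DRIVER": "{ODBC Driver 18 for SQL Server}",
        "SERVER": server,
        "DATABASE": database,
        "UID": user,
        "PWD": password,
        "Encrypt": encrypt,  # 'no' matches the workaround described above
    }
    # Join as KEY=value pairs separated by semicolons, in insertion order.
    return ";".join(f"{key}={value}" for key, value in parts.items())

conn_str = build_conn_str("xxx", "xxx", "xxx", "xxx")
print(conn_str)
```

The result can be passed straight to pyodbc.connect(conn_str) and is identical to the literal string shown above.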