Microsoft Graph API

Download files from sharepoint site

1. Enable Graph API

On Azure Portal -> Application Registration

Create a new registration, this gives client_id, tenant_id, etc

Under the Certificates&Secrets of the application, create a new client secret

save the secret, this together with the client id, tenant id will be used for connecting to the api

Under the API permission, add permissions

There are a bunch of permissions available within the add permission section.

Choose Microsoft Graph.

There two types of permission to choose, delegated and application permissions.

The application permission once added, needs to be 'Grant admin consent'. It's global and without user context.

The application will have the permission to access the corresponding resources.

The delegated permission needs a user context. Once added to the application registration, it still requires the end user to log in to be able to access the corresponding resources under the user's own context.

For example, for listing all teams channels, the application permission allows listing all channels within the organization.

The delegated permission allows only listing the channels that the user has access to.

Choose the permissions to be granted.

e.g. Sites.read.all, User.read.all, etc.

The graph api can connect to a bunch of microsoft products, sharepoint, onedrive, outlook, ad groups, etc.

each of them has it's only permission, so grant the permissions required for the application registration.

2. Authenticate with Microsoft Authority

This connects to the Microsoft authority https://login.microsoftonline.com and requests for a Bearer token.

With the bearer token, user can further access the graph API

For application permission, it only needs to authenticate with the client id, client secret generated in the App Registration.

For delegated permission, it needs to authenticate with the username, password / interactive login / multi-factor login

2.1. For application permission

import requests

# Microsoft Azure AD app (client) credentials

client_id = '<your_client_id>'

client_secret = '<your_client_secret>'

tenant_id = '<your_tenant_id>'

# Scopes required by the application (application permissions)

scopes = ['https://graph.microsoft.com/.default']

# Get an access token using client credentials flow

token_url = f'https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token'

payload = {

'client_id': client_id,

'client_secret': client_secret,

'scope': ' '.join(scopes),

'grant_type': 'client_credentials'

}

response = requests.post(token_url, data=payload)

# Obtain the access token

access_token = response.json()['access_token']

2.2. Delegated permission

There are three ways to log in, ie. username password, interactive (pops up a login page), multi-factor login (pops up a login page and with authenticator code).

For interactive and multi-factor login, the MSAL library needs to be installed. pip install msal.

Client_id and tenant_id are required for connecting to the right app registration, but client_secret is not needed, instead the user would need to log in.

import requests

import msal

# Microsoft Azure AD app (client) credentials

client_id = 'xxx'

tenant_id = 'xxx'

# Authority for the tenant

authority = f'https://login.microsoftonline.com/{tenant_id}'

# Scopes required by the application (delegated permissions)

# Ensure that your Azure AD application is configured with the appropriate delegated permissions

# (such as User.Read and Sites.Read.All) in the Azure portal

scopes = ['User.Read', 'Sites.Read.All']

############ interactive log in ############

# Initialize the MSAL public client application

app = msal.PublicClientApplication(

client_id,

authority=authority

)

# Initialize the login flow

result = app.acquire_token_interactive(scopes)

############ username password log in ############

# Initialize the MSAL confidential client application

app = msal.ConfidentialClientApplication(

client_id,

authority=authority

)

# Acquire token using username/password

result = app.acquire_token_by_username_password(username='<user_email>', password='<user_password>', scopes=scopes)

############ multi-factor log in ############

# Initialize the MSAL public client application

app = msal.PublicClientApplication(

client_id,

authority=authority

)

# Initialize the login flow with device code

flow = app.initiate_device_flow(scopes=scopes)

if 'user_code' not in flow:

print('Failed to create device flow:', flow.get("error_description"))

exit()

print('Please authenticate using the code:', flow['user_code'])

print('Visit', flow['verification_uri'], 'and enter the code to authenticate.')

result = app.acquire_token_by_device_flow(flow)

# Obtain the access token

access_token = result['access_token']

3. Call Graph API

Here uses an example to list all the sharepoint sites.

Simply use the graph api endpoint (can be found on Microsoft Graph API reference)

headers = {

'Authorization': 'Bearer ' + access_token, #the token from the previous step

'Content-Type': 'application/json'

}

# Microsoft Graph API endpoint

graph_endpoint = 'https://graph.microsoft.com/v1.0/sites'

# Make a GET request to list all sites

response = requests.get(graph_endpoint, headers=headers)

print(response.text)

4. Download file from sharepoint list

There are multiple steps to achieve this.

Make sure enough permissions have been granted. Some sharepoint sites are actually a teams channel so teams and channels permissions are also needed.

Typical permissions: Sites.Read.All, TeamSettings.Read.All, Directory.Read.All, Group.Read.All, Channel.ReadBasic.All, Files.read.ALL

Firstly list all the sites, so as to find the id of the sharepoint site you are after.

url = "https://graph.microsoft.com/v1.0/sites/"

r = requests.get(

url = url,

headers={'Authorization': 'Bearer %s' % token},

)

for i in r.json()['value']:

print(i['displayName'] + ' --- ' + i['id'])

The ID field looks like "myorg.sharepoint.com,4444555566-xxxx-yyyy-zzzz-8888ac9e7777,123456789-xxxx-yyyy-zzzz-111199c44444"

The second element in the delimited ID string is the site id.

Secondly list all the drives associated with the sharepoint site.

site_id = '4444555566-xxxx-yyyy-zzzz-8888ac9e7777'

url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"

r = requests.get(

url = url,

headers={'Authorization': 'Bearer %s' % token}, #the bearer token from previous step

)

for i in r.json()['value']:

print(i['name'] + ' --- ' + i['id'])

Once have the drive id, query the children items within the drive.

Note the root item is used in the api url.

site_id = '4444555566-xxxx-yyyy-zzzz-8888ac9e7777'

drive_id = 'b!cXXXXXXXXXXXXXXXXX'

url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/root/children"

r = requests.get(

url = url,

headers={'Authorization': 'Bearer %s' % token},

)

for i in r.json()['value']:

print(i['name'] + ' --- ' + i['id'])

If the item is a folder, then check the children files of the folder with folder(item) id.

Note the URL path goes from drive to items and to children.

site_id = '4444555566-xxxx-yyyy-zzzz-8888ac9e7777'

drive_id = 'b!cXXXXXXXXXXXXXXXXX'

item_id = '01xxxxxxxxxxxxxxxx'

url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{item_id}/children"

r = requests.get(

url = url,

headers={'Authorization': 'Bearer %s' % token},

)

Again from the response, get the children files and ids.

Pick the id of the file to be downloaded.

Finally with the file(item) id, download the content as binary string.

Note the API URL can go from the drive directly to the child file item, without going through the folder item first.

The 'content' in the URL refers to the binary content of the file.

site_id = '4444555566-xxxx-yyyy-zzzz-8888ac9e7777'

drive_id = 'b!cXXXXXXXXXXXXXXXXX'

child_item_id = '01xxxxxxxxxxxxxxxx' #the file item id

url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives/{drive_id}/items/{child_item_id}/content"

r = requests.get(

url = url,

headers={'Authorization': 'Bearer %s' % token},

)

The response carries the binary string as r.content.

Now you can write the binary string into a file, or read it into pandas

with open('c:/temp/test_download_sharepoint_file.xlsx', 'wb') as f:

f.write(r.content)

import pandas as pd

dfs = pd.read_excel(r.content, sheet_name = None) #read all sheets

Page updated

Google Sites

Report abuse