Interact with the Microsoft Graph API using Synapse Analytics – Part I

Introduction

In this and the next post I want to show you how to connect to the Microsoft Graph API, request some data, process it and store it in a database using Synapse Analytics. 

This first post presents a sample use case, briefly introduces the Graph API, how to create a linked service to it, and how to start querying data. In the next post a sample  Synapse pipeline will be described. The pipeline grabs some data and copies it into some target tables. Finally, I will create a sample query to showcase the newly imported data. 

Use case 

Working with a customer in a large organization, I was dealing with a traditional security setup where access control for database objects, a SQL dedicated pool, was implemented using database custom roles, database users from external providers (AAD) and AAD groups. If a user requests access to a particular database view or schema, this user is added to the corresponding AAD group, which is created as a user in the target database. This user has been added to one or more roles, which has granted some privileges for the particular database objects (in this case select permissions over views).

After some time, it was required to prepare a list of AAD groups and their members in a queryable format in order to create reports and implement additional access control mechanisms. 

Access Control Setup

Graph API intro

“Microsoft Graph is a RESTful web API that enables you to access Microsoft Cloud service resources” https://learn.microsoft.com/en-us/graph/use-the-api

This API is a gateway to access large amounts of data in Microsoft 365, Windows, and Enterprise Mobility + Security. You can access your Azure Active Directory resources to enable different use cases. 

To access this API from the Synapse Workspace, the managed identity of the workspace must have User.Read.All permission. You can configure it in the Azure Portal, AAD, Enterprise Applications, and then search for the managed identity of the synapse workspace. 

For additional information please visit:

In order to learn and try out different request you can use this wonderful Graph Explorer: https://developer.microsoft.com/en-us/graph/graph-explorer

There is a selector for the request method, next to it you can select the version of the API and enter the request URL. On the top-right there is a button with the user icon you have to use to sign in. On the left you have a menu with different sample queries broken down by categories. 

Sample Query

Let’s say we want to query:

  • AAD Groups
  • Only groups starting by A_DB_
  • Only select the attributes: id, displayName and description

The query looks like this:

https://graph.microsoft.com/v1.0/groups?$filter=startswith(displayName,’A_DB_’)&$select=id,displayName,description

Please refer to the following pages to understand the query above:

After pasting the query in the graph explorer I got this response:

API Response

Linked Service Configuration

  • Navigate to manage, linked services and click on “+ New”

Entered the required values as follows:

  • After that, test your connection and click on “Create”.

Generic Dataset

In order to reuse the REST Linked Service a generic dataset with a parameter to point to different resources of the Graph API.

  • Go to Data tab – > Linked → Integration datasets … → New integration dataset

  • Select REST as the dataset type and click on “Continue”:

  • Provide a name and select the linked service we created previously and press ‘Ok’:

  • In the newly created dataset select the “Parameters” tab and create the following entry:

  • Go back to the Connection Tab of the dataset, select the “Relative URL” field and click on “Add dynamic content”

  • In the pipeline expression builder select the parameter we created previously:

  • Now the dataset should look like this:

  • Click on “Test connection”. It should succeed.

Data Pipeline

  • Create a new pipeline. I named it PIP_MS_Graph.
  • Add a Copy data activity and name it

  • In the “Source” tab let’s configure our query using the dataset parameter we created in a previous step:

  • Query: In the query I’m using the sample query presented before in the article. 
  • The Request method is GET
  • For Pagination rules, if the response contains more than 25 records the API paginates them and in every response the link to the next chunk is provided (default value, can be changed).
  • To showcase additional functionality, let’s add two additional columns, Load_TS to timestamp the incoming data and Has_Members, to initialize a column we are going to populate in a later step.
  • If you press on “Preview data” you should see something similar to this:

As target, let’s use for the part I just a parquet dataset in a storage account:

And now in the Copy activity Sink:

The Mappings could be tricky and should be handle with care:

  • The collection reference must be set to the [values] array in the response.
  • Then the attributes inside this array are configured as child objects using just their name.
  • Finally, the additional columns are added at the same level of the values.
  • The target columns are configured.

Note: Use the advanced editor if you want to see the target attribute names in the drown-downs for the mappings. This could be a current bug. You can also see the expression used to select the incoming fields. Now you can run the pipeline and check your results.

Summary

In this post I introduced the Microsoft Graph API, how you can use it to access AAD data and build a sample query, connect a Synapse Workspace to the Graph API and retrieve a list of groups to populate a parquet dataset. In the next post I will explain a couple of additional details to effectively use the API and play a bit more with AAD user, groups and memberships. Hope you find it useful. 

2 responses to “Interact with the Microsoft Graph API using Synapse Analytics – Part I”

  1. […] ← Interact with the Microsoft Graph API using Synapse Analytics – Part I […]

Leave a comment