It is indisputably true that mlflow has made life a lot easier, not only for data scientists but also for data engineers and architects, among others. The official mlflow docs include a very helpful list of tutorials and examples. You can just download it, open a console and start using it locally on your computer. This is the fastest way to get started. However, as soon as you progress and introduce mlflow in your team, or want to use it extensively yourself, some components should be deployed outside your laptop.
To exercise a deployment setup, and since I have Azure experience, I decided to provision a couple of resources in the cloud to deploy the model registry and store the data produced by the tracking server.
The code used during the article is available on github:
When I finished the diagram below, I noticed that the code sits in the middle of everything. However, code is usually developed locally. Data science teams must go beyond notebooks and operationalize their code. This enables integration with applications, so the code can deliver its value to end users and machines.
The tracking server is basically an API and a UI. With the API you can log parameters, code versions, metrics and artifacts. Then you can use the UI to query and visualize the experiment results. An experiment is a set of runs, and a run is the execution of a piece of code. By default, the values from the experiments are recorded locally in a folder named mlruns in the directory where you call your code, as can be seen in the following figure:
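As a minimal sketch of the logging side of the API, the helper below opens a run and records a dictionary of parameters and a dictionary of metrics. The function name log_example_run and the example values in the comments are hypothetical; mlflow.start_run, mlflow.log_params and mlflow.log_metrics are calls from the mlflow Python API. With no tracking URI configured, the values should end up in the local mlruns folder described above.

```python
def log_example_run(params, metrics):
    """Record one run with the given parameters and metrics; return its run id.

    Hypothetical helper for illustration; it is not part of mlflow itself.
    """
    # Imported inside the function so the helper can be defined
    # even on a machine where mlflow is not installed yet.
    import mlflow

    with mlflow.start_run() as run:
        mlflow.log_params(params)    # e.g. {"alpha": 1.0, "l1_ratio": 1.0}
        mlflow.log_metrics(metrics)  # e.g. {"rmse": 0.83} (made-up value)
        return run.info.run_id
```

Each call creates one run in the active experiment, which is what later shows up as a row in the tracking UI.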
The results above can also be stored in a SQLAlchemy-compatible database. The place where you store this data is called the backend store. In this example I used an Azure SQL Database. The details are described in the next sections.
Clients running experiments store their output artifacts, e.g. models, in a location called the artifact store. If nothing is configured, mlflow uses the mlruns directory by default, as shown in the next figure:
This location should be able to handle large amounts of data. The storage services of several popular cloud providers are supported. In this example Azure Blob Storage is used.
An mlflow project is just a directory, in this example a git repository, containing a descriptor file that specifies the dependencies and how the code is executed.
The models module offers a way to unify the deployment of machine learning models. It defines a convention for packaging and sharing your code.
The model registry is one of my favorite modules. It is a centralized model repository with a UI and a set of APIs for model lifecycle management. If you run your own MLflow server, a database-backed backend store must be configured; in this example, an Azure SQL Database.
Preparing a docker image for the tracking server
One important thing is to make your work shareable and reusable. I really like docker containers because they help me to achieve that. You can run them locally and also easily deploy them in different ways on different cloud providers.
This docker image is created from a python image. The rest is quite simple: set a couple of environment variables, install the required python packages and define an entry point. Unfortunately, as usual, when you start getting away from the default configurations, things get complicated.
This docker image must now be able to connect to an Azure SQL Database using python. There are at least two major packages to achieve that. One is pymssql, which seems to be the old way and has some limitations when working with Azure. The other is pyodbc.
The next step is to add pyodbc to the requirements.txt file. But that was not all. In order to work, pyodbc needs the ODBC drivers installed on the image. The new image added the SQL Server ODBC driver 17 for Ubuntu 18.04.
Last thing was to update the requirements file as follows:
The entry point is the script startup.sh, which I modified as follows:
mlflow server --backend-store-uri "$BACKEND_URI" --default-artifact-root "$MLFLOW_SERVER_DEFAULT_ARTIFACT_ROOT" --host 0.0.0.0
You can find the upgraded code in my github repo.
Once you have downloaded the code, just build the image. For instance, using your console, change to the directory containing the Dockerfile and issue:
docker build -t mlflowserver -f Dockerfile . --no-cache
Using blob storage for the tracking server artifact store
As explained in the architecture overview, an Azure Blob Storage account was created for the artifact store. To configure it, you just need to set the environment variable AZURE_STORAGE_ACCESS_KEY as follows:
Of course, you first need to create an Azure storage account and a container. I created a container named mlflow, as shown in the following figure:
And then my environment variable became:
And to access the container from outside just set the storage account connection string environment variable:
AZURE_STORAGE_CONNECTION_STRING = <your azure storage connection string>
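For reference, the artifact root for Azure Blob Storage is a wasbs:// URI that combines the container and the storage account. The sketch below composes such a URI and checks whether one of the two credential variables is present. The helper names and the account name "myaccount" are hypothetical; the URI layout and the two variable names follow my reading of the mlflow documentation, so verify them against the version you run.

```python
import os


def build_artifact_root(container, storage_account, path=""):
    """Compose a wasbs:// URI for an Azure Blob Storage container."""
    base = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
    path = path.strip("/")
    return f"{base}/{path}" if path else base


def has_azure_credentials(environ=None):
    """Return True if at least one of the two credential variables is set."""
    environ = os.environ if environ is None else environ
    names = ("AZURE_STORAGE_CONNECTION_STRING", "AZURE_STORAGE_ACCESS_KEY")
    return any(environ.get(name) for name in names)


# "myaccount" is a placeholder storage account name.
print(build_artifact_root("mlflow", "myaccount"))
```

The resulting string is the kind of value passed to the server through MLFLOW_SERVER_DEFAULT_ARTIFACT_ROOT in the startup script.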
Using SQL server for the backend store
I created a serverless Azure SQL Database, which is nice for testing and prototyping. If you want to change to another pricing model, just configure another pricing tier.
On the SQL Server instance I need a user that can create and delete objects. I have not found in the documentation exactly which permissions this user needs, but at the very least it should be able to create and drop tables, foreign keys and constraints. To be honest, I just used the admin user; I need to investigate this a bit further. Once you have your instance, user and password, you can build your connection string and assign it to an environment variable as follows:
BACKEND_URI="mssql+pyodbc://<sqlserver user>:<password>@<your server>.database.windows.net:1433/<database name>?driver=ODBC+Driver+17+for+SQL+Server"
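One thing to watch for in this connection string is special characters in the user name or password, such as @ or /, which break the URI unless they are URL-encoded. The following sketch builds the same string with the standard library and escapes those parts. The function name build_backend_uri and the sample credentials are hypothetical.

```python
from urllib.parse import quote_plus


def build_backend_uri(user, password, server, database,
                      driver="ODBC Driver 17 for SQL Server", port=1433):
    """Build an mssql+pyodbc SQLAlchemy URI for an Azure SQL Database."""
    return (
        # quote_plus escapes characters such as '@' and '/' in the credentials
        f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}"
        f"@{server}.database.windows.net:{port}/{database}"
        # spaces in the driver name become '+', matching the URI above
        f"?driver={quote_plus(driver)}"
    )


# Placeholder values, not real credentials.
print(build_backend_uri("mlflow_user", "p@ss/word", "myserver", "mlflowdb"))
```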
In order to test it I used the sklearn_elasticnet_wine example from the mlflow tutorial: Train, serve, and score a linear regression model
It is enough to change a couple of lines in the code to use the tracking server we created:
- Set the tracking server URL, in my case I ran the docker container locally
- Set the experiment passing its name as argument. If the experiment doesn’t exist it gets created
- Get the experiment Id
- Assign the experiment Id to the run
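The four steps above can be sketched roughly as follows. This is not the exact code from the modified train.py, only an illustration using mlflow calls that exist in the Python API; the helper name start_tracked_run, the URL http://localhost:5000 and the experiment name are assumptions.

```python
def start_tracked_run(tracking_uri, experiment_name):
    """Point mlflow at a tracking server and open a run in a named experiment.

    Hypothetical helper for illustration; it is not part of mlflow itself.
    """
    # Imported inside the function so the helper can be defined
    # even on a machine where mlflow is not installed yet.
    import mlflow

    # 1. Set the tracking server URL.
    mlflow.set_tracking_uri(tracking_uri)
    # 2. Set the experiment by name; it is created if it does not exist.
    mlflow.set_experiment(experiment_name)
    # 3. Get the experiment id.
    experiment = mlflow.get_experiment_by_name(experiment_name)
    # 4. Assign the experiment id to the run.
    return mlflow.start_run(experiment_id=experiment.experiment_id)
```

In train.py the returned object would be used as a context manager, for example with start_tracked_run("http://localhost:5000", "wine-quality"): followed by the existing logging calls.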
I left everything else as it was.
Now it is time to open the console and run our experiment.
Hint: remember to set the environment variable AZURE_STORAGE_CONNECTION_STRING where you execute the code.
The examples come with several python requirement files that you need to install depending on the tutorial you want to run. To simplify this, I just wrote my conda environment to a file in the folder “mlflow\examples\sklearn_elasticnet_wine”.
You can easily create a new conda environment using this file issuing:
conda create --name <env-name> --file mlflow\examples\sklearn_elasticnet_wine\requirements.txt
Time to execute the train.py script, from the root directory. I used different input values for the parameters alpha and l1_ratio, starting with 1 and 1:
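If you want to try several combinations of alpha and l1_ratio without typing each command, a small script can generate them. The sketch below only builds the command lines; it assumes train.py takes alpha and l1_ratio as its first two positional arguments and that the script path is relative to the root of the mlflow repository. The helper name build_train_commands is hypothetical.

```python
import itertools
import sys

# Assumed location of the example script, relative to the repository root.
TRAIN_SCRIPT = "examples/sklearn_elasticnet_wine/train.py"


def build_train_commands(alphas, l1_ratios, script=TRAIN_SCRIPT):
    """Return one command (as an argument list) per alpha/l1_ratio pair."""
    return [
        [sys.executable, script, str(alpha), str(l1_ratio)]
        for alpha, l1_ratio in itertools.product(alphas, l1_ratios)
    ]


for command in build_train_commands([1, 0.5], [1, 0.5]):
    print(" ".join(command))
```

Each argument list can be handed to subprocess.run(command, check=True) to actually start the training run.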
Visualize experiment results using the tracking server UI
If you open the UI of the tracking server using your favorite browser you can visualize the experiment results:
If you click on the start time you can open a single run and track code, versions, parameters, metrics and artifacts:
If you scroll down to the bottom you can inspect the artifacts:
We can also verify that the backend store tables were created in the Azure SQL Database instance:
For a complete description, please refer to the official documentation using the link provided at the beginning of the post.
Deploy the model
If you are still not excited, now comes a very interesting part. Models cannot just stay on your laptop; you need to serve them to applications somehow and integrate them with other pieces of software. Deploying the models to a web server as REST APIs is an excellent way to expose them as services.
To demonstrate mlflow deployment capabilities let’s deploy a REST server locally using:
mlflow models serve -m wasbs://<container>@<storage account>.blob.core.windows.net/0/866a64d8b7de488e83b985bd89d84afe/artifacts/model -p 1234
You need to replace the model location with the actual one. I found it in my previous screenshot:
Here we go:
The server is now running. Since I really like Postman, let's just test the service with it. I will use the same input data as in the tutorial, which is a JSON-serialized pandas DataFrame:
Voilà, that's it. Now we can score incoming data with a REST call!
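The same call can also be made from python instead of Postman. The sketch below builds a POST request for the /invocations endpoint of the local server on port 1234 and, separately, sends it. It assumes the split-oriented pandas JSON format and the "format=pandas-split" content type used by older mlflow versions; newer versions, as far as I know, expect the payload wrapped in a "dataframe_split" key with a plain application/json content type. The helper names are hypothetical, and the columns in the usage note are placeholders, since a real request needs every feature column of the model.

```python
import json
import urllib.request


def build_invocation_request(columns, rows, host="127.0.0.1", port=1234):
    """Build (but do not send) a POST request for the /invocations endpoint."""
    # Split-oriented DataFrame: column names plus a list of rows.
    body = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=body,
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )


def score(request):
    """Send the request and return the decoded JSON predictions."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, score(build_invocation_request(["alcohol", "pH"], [[12.8, 3.33]])) would send a two-column row; replace the columns and values with the full input from the tutorial.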
To get completely away from local development, a VM, a docker instance or another service should be provisioned to run the mlflow docker container.
The REST server we created at the end should also be deployed outside a laptop.
Once all the infrastructure is provisioned in the cloud, it would be very helpful to have an ARM template to easily replicate and version the complete environment.