Project: Tracking Model Training with MLflow
This project belongs to the Autodesk Construction Solutions intelligence team. It provides a way to track model training and to trigger training of a new model when new data becomes available, so that newly labelled data can be used for training.
The project used Python as the coding language; MLflow components (MLflow Tracking, the MLflow UI, MLflow Models, and the MLflow Model Registry) for training, tracking runs, and storing artefacts; and AWS services (EC2, ECR, S3, Fargate, VPC, IAM, CloudFormation, and SageMaker training jobs) to deploy the MLflow server and run training jobs.
Docker images and containers were used to package the server and the training code, with shell scripting used to write the Dockerfile. The front-end user interface for running a training job was built with HTML, CSS, JavaScript, and Bootstrap, and Flask served the web application. Git was used for continuous integration.
Main Phases Implemented
Logging local training runs to a local MLflow tracking server:
An existing ML model was integrated with the MLflow Python SDK. Run details, parameters, metrics, and artefacts are recorded through the mlflow module, with both the training and the tracking server running locally.
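A minimal sketch of how such a run could be recorded with the MLflow Python SDK is shown here. The helper name log_training_run and the tracker argument are illustrative assumptions, not part of the project's code; mlflow.start_run, mlflow.log_params, mlflow.log_metrics, and mlflow.log_artifact are standard MLflow calls.

```python
def log_training_run(params, metrics, artifact_paths=(), tracker=None):
    """Record one training run and return its run ID.

    `tracker` defaults to the real `mlflow` module. Passing a stand-in object
    with the same functions (an assumption made for testability) lets the
    helper be exercised without a tracking server.
    """
    if tracker is None:
        import mlflow as tracker  # imported lazily, only when really logging

    with tracker.start_run() as run:
        tracker.log_params(params)    # e.g. {"learning_rate": 0.01}
        tracker.log_metrics(metrics)  # e.g. {"accuracy": 0.93}
        for path in artifact_paths:   # local files, such as a saved model
            tracker.log_artifact(path)
        return run.info.run_id
```

Called as log_training_run({"learning_rate": 0.01}, {"accuracy": 0.93}, ["model.pkl"]), this would write the run to whichever tracking location MLflow is currently configured to use.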
Logging local training runs to a hosted MLflow tracking server:
The MLflow server is hosted on AWS Fargate using an MLflow server Docker image stored in AWS ECR, with run data stored in AWS S3.
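On the client side, switching from a local to a hosted server mostly comes down to changing the tracking URI. The sketch below assumes the server address is supplied through the MLFLOW_TRACKING_URI environment variable, which MLflow itself recognises; the helper names, the fallback address, and the tracker argument are hypothetical.

```python
import os


def resolve_tracking_uri(env=os.environ, default="http://127.0.0.1:5000"):
    """Return the hosted server URI if configured, else a local fallback."""
    return env.get("MLFLOW_TRACKING_URI", default)


def configure_tracking(experiment_name, env=os.environ, tracker=None):
    """Point the MLflow client at the resolved server and select an experiment."""
    if tracker is None:
        import mlflow as tracker  # imported lazily, only when really configuring

    uri = resolve_tracking_uri(env)
    tracker.set_tracking_uri(uri)           # all later runs are sent to this server
    tracker.set_experiment(experiment_name)  # created on the server if it is missing
    return uri
```

With this arrangement the training code stays the same in both phases; only the environment decides where the runs are logged.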
Creating training job:
After hosting the MLflow server, we created a SageMaker training job using a self-built Docker image that contains all the ML code. The training code takes the S3 path to the new data from the container environment variables, along with the model version whose old training data (stored as an artefact) should be reused. It then combines the old and new data and sends the result for training. Model versioning is achieved by storing the current model as an artefact and registering it under a version in the MLflow Model Registry. MLflow also supports keeping model versions in staging and in production.
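The container-side logic described above could look roughly like the following sketch. The environment variable names NEW_DATA_S3_PATH and BASE_MODEL_VERSION, the row-list data representation, and all function names are assumptions made for illustration; mlflow.register_model and the runs:/ URI scheme are standard MLflow features.

```python
import os
from urllib.parse import urlparse


def read_job_config(env=os.environ):
    """Read the new-data location and base model version from the environment."""
    new_data_uri = env["NEW_DATA_S3_PATH"]  # hypothetical variable name
    parsed = urlparse(new_data_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3:// URI, got {new_data_uri!r}")
    return {
        "bucket": parsed.netloc,
        "key": parsed.path.lstrip("/"),
        "base_model_version": env["BASE_MODEL_VERSION"],  # hypothetical name
    }


def combine_datasets(old_rows, new_rows):
    """Append the newly labelled rows to the rows kept with the old model."""
    return list(old_rows) + list(new_rows)


def register_trained_model(run_id, model_name, artifact_path="model"):
    """Register the model logged under a run and return its new version."""
    import mlflow  # imported lazily; needs a reachable tracking server

    result = mlflow.register_model(f"runs:/{run_id}/{artifact_path}", model_name)
    return result.version
```

Each call to register_trained_model with the same model_name produces the next version number in the registry, which is what lets a later job ask for the data attached to a specific earlier version.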
Additional Step:
As an additional step, we created a web application using HTML, CSS, Python, and Flask to demonstrate the project. Although the trigger is currently implemented as a web service, it could instead be implemented as a Lambda function or as part of a larger training pipeline.
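One possible shape for such a trigger is sketched below: a Flask route that builds a SageMaker CreateTrainingJob request and submits it with boto3. The route path, settings keys, job name prefix, instance type, runtime limit, and environment variable names are all assumptions for illustration; the request fields themselves follow the SageMaker CreateTrainingJob API.

```python
from datetime import datetime, timezone


def build_training_job_request(image_uri, role_arn, output_s3_uri,
                               new_data_s3_uri, base_model_version, now=None):
    """Build the keyword arguments for SageMaker's create_training_job call."""
    now = now or datetime.now(timezone.utc)
    return {
        "TrainingJobName": f"mlflow-retrain-{now:%Y%m%d-%H%M%S}",
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,  # self-built image stored in ECR
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {  # illustrative sizing, not the project's values
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        "Environment": {  # hypothetical variable names read by the container
            "NEW_DATA_S3_PATH": new_data_s3_uri,
            "BASE_MODEL_VERSION": str(base_model_version),
        },
    }


def create_app(settings):
    """Create a Flask app whose /train route starts a training job."""
    import boto3  # imported lazily so the builder above has no dependencies
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/train", methods=["POST"])
    def trigger_training():
        body = request.get_json(force=True)
        job = build_training_job_request(
            settings["image_uri"], settings["role_arn"], settings["output_s3_uri"],
            body["new_data_s3_uri"], body["base_model_version"],
        )
        boto3.client("sagemaker").create_training_job(**job)
        return jsonify({"training_job_name": job["TrainingJobName"]}), 202

    return app
```

Keeping the request builder separate from the Flask route means the same function could be reused unchanged inside a Lambda handler or a pipeline step.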