Oct 2021 - Now
Improved efficiency by 80% and reduced manual intervention by designing and building ETL workflows and batch processing pipelines from scratch for AI products using Airflow and Python.
Reduced data processing time by 50% by implementing incremental data loading in the ETL processes.
Reduced data errors by 75% by adding data validation to the data pipelines.
Designed and built a deep learning pipeline from scratch to monitor embryos using Python, MLflow, Hydra and PyTorch Lightning.
Reached 99.9% accuracy, comparable to the state of the art.
Improved accuracy by 10% in the more challenging scenario by introducing new learning methodologies.
Improved a deep/machine learning pipeline for embryo selection using Python, MLflow, Hydra and TensorFlow.
Increased AUC by 10% and improved generalizability by combining heterogeneous data.
Improved data privacy and security 2x by introducing decentralized learning.
Reduced manual intervention by 66% by building a one-command decentralized training pipeline using Ansible.
Built and maintained AI services for the above pipelines using FastAPI, TensorFlow Serving, MariaDB, MongoDB, Redis, Docker and Docker Compose.
Wrote and edited 3 papers for journal publication: "Using an Interpretable Machine Learning Model to Predict Corifollitropin Alfa Protocol" and 2 in progress.
Aug 2020 - Jan 2021
Increased processing speed 2x and improved data accuracy by 50% by designing and building ETL workflows using Google Apps Script / Python.
Designed a data warehouse for the data analysis team.
Designed and built Power BI dashboards and KPIs.
Evaluated machine learning algorithms for specific projects.
E-commerce data pipeline
Developed a batch processing pipeline using Python, Airflow, Spark, AWS S3, AWS EMR, AWS Redshift and Terraform.
Built a real-time recommendation service using Elasticsearch and FastAPI.
Sep 2013 - Jun 2017