Voda Scheduler

謝宗佐

Senior Software Engineer
Hsinchu County, Taiwan

Voda Scheduler is a GPU scheduler for elastic deep learning workloads, built on Kubernetes, Kubeflow Training Operator, and Horovod.


It provides a system for scheduling elastic training jobs, along with several state-of-the-art scheduling algorithms designed specifically for elastic training to reduce job completion time and increase cluster efficiency. The scheduling algorithms can also be customized with ease.
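
To make the idea of resource elasticity concrete, here is a minimal sketch of one possible elastic allocation policy. It is not taken from the Voda Scheduler code base: the Job type, its fields, and the Allocate function are hypothetical names chosen for illustration. The policy admits jobs in submission order at their minimum GPU count (letting later, smaller jobs through when an earlier one does not fit), then hands out leftover GPUs one at a time, round-robin, up to each job's maximum.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is a hypothetical description of an elastic training job.
// These fields are assumptions for illustration, not Voda Scheduler's types.
type Job struct {
	Name      string
	MinGPUs   int // fewest GPUs the job can run with
	MaxGPUs   int // most GPUs the job can make use of
	Submitted int // submission order; lower means earlier
}

// Allocate is a hypothetical elastic policy: every job that fits gets its
// minimum first, then leftover GPUs are shared round-robin up to each maximum.
func Allocate(jobs []Job, totalGPUs int) map[string]int {
	// Work on a copy so the caller's slice is not reordered.
	ordered := make([]Job, len(jobs))
	copy(ordered, jobs)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Submitted < ordered[j].Submitted
	})

	alloc := make(map[string]int, len(ordered))
	free := totalGPUs

	// Pass 1: admit jobs in submission order at their minimum size.
	// A job that does not fit gets 0; later jobs may still be admitted.
	var admitted []Job
	for _, j := range ordered {
		if j.MinGPUs <= free {
			alloc[j.Name] = j.MinGPUs
			free -= j.MinGPUs
			admitted = append(admitted, j)
		} else {
			alloc[j.Name] = 0
		}
	}

	// Pass 2: grow admitted jobs one GPU at a time until nothing is left
	// or every admitted job has reached its maximum.
	for free > 0 {
		progressed := false
		for _, j := range admitted {
			if free == 0 {
				break
			}
			if alloc[j.Name] < j.MaxGPUs {
				alloc[j.Name]++
				free--
				progressed = true
			}
		}
		if !progressed {
			break
		}
	}
	return alloc
}

func main() {
	jobs := []Job{
		{Name: "a", MinGPUs: 1, MaxGPUs: 4, Submitted: 0},
		{Name: "b", MinGPUs: 2, MaxGPUs: 2, Submitted: 1},
		{Name: "c", MinGPUs: 4, MaxGPUs: 8, Submitted: 2},
	}
	fmt.Println(Allocate(jobs, 8))
}
```

A real elastic scheduler would rerun a policy like this whenever jobs arrive or finish, resizing running jobs to match the new allocation.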


https://github.com/heyfey/vodascheduler

Features

  • Rich Scheduling Algorithms (with Resource Elasticity)
  • Topology-Aware Scheduling & Worker Migration
  • Microservices architecture
  • Node addition/deletion awareness
  • Heterogeneous scheduling
  • Fault-Tolerance
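
As an illustration of what topology-aware scheduling can mean in practice, the sketch below places a job's workers on as few nodes as possible, which generally reduces cross-node communication during distributed training. This is an assumption-laden example, not Voda Scheduler's actual placement logic: the Node type and the Place function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical view of a cluster node and its free GPUs.
type Node struct {
	Name     string
	FreeGPUs int
}

// Place is a hypothetical placement heuristic. It returns how many workers
// (one GPU each) go on each node, or false if the cluster cannot fit them.
func Place(nodes []Node, workers int) (map[string]int, bool) {
	if workers <= 0 {
		return map[string]int{}, true
	}

	// Sort a copy: most free GPUs first, ties broken by name.
	sorted := make([]Node, len(nodes))
	copy(sorted, nodes)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].FreeGPUs != sorted[j].FreeGPUs {
			return sorted[i].FreeGPUs > sorted[j].FreeGPUs
		}
		return sorted[i].Name < sorted[j].Name
	})

	// Prefer a single node: pick the smallest node that fits every worker,
	// which keeps larger nodes available for larger jobs.
	best := -1
	for i, n := range sorted {
		if n.FreeGPUs >= workers && (best == -1 || n.FreeGPUs < sorted[best].FreeGPUs) {
			best = i
		}
	}
	if best != -1 {
		return map[string]int{sorted[best].Name: workers}, true
	}

	// Otherwise fill the largest nodes first to touch as few nodes as possible.
	placement := make(map[string]int)
	remaining := workers
	for _, n := range sorted {
		if remaining == 0 {
			break
		}
		if n.FreeGPUs == 0 {
			continue
		}
		take := n.FreeGPUs
		if take > remaining {
			take = remaining
		}
		placement[n.Name] = take
		remaining -= take
	}
	if remaining > 0 {
		return nil, false
	}
	return placement, true
}

func main() {
	nodes := []Node{{"n1", 4}, {"n2", 2}, {"n3", 8}}
	fmt.Println(Place(nodes, 10))
}
```

Worker migration can be seen as the follow-up step: when a placement computed this way differs from where workers currently run, workers are moved to consolidate the job.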


See the GitHub page for more details.

Architecture Overview

Voda Scheduler is designed to be cloud-native and adopts a microservices architecture consisting of several loosely coupled components. It builds on several existing open-source projects for reliability, flexibility, and maintainability.




Published: Oct 18th 2021

Tools

TensorFlow
Python
Go
MongoDB
Docker
Kubernetes

Golang
MLOps
distributed systems
kubeflow
scheduling
tensorflow
deep learning
machine learning
kubernetes
