Voda scheduler is a GPU scheduler for elastic deep learning workloads, built on Kubernetes, the Kubeflow Training Operator, and Horovod.
It provides a system for scheduling elastic training jobs, along with several state-of-the-art scheduling algorithms designed specifically for elastic training, which aim to reduce job completion time and increase cluster efficiency. The scheduling algorithms can also be customized easily.
See the GitHub page for more details.
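To make the idea of an elastic scheduling decision concrete, here is a minimal sketch of a toy allocation policy. It is not one of Voda scheduler's algorithms and does not use its API: the `Job` class, the `allocate` function, and the job names are all hypothetical, chosen only to illustrate how a job that tolerates a range of worker counts can be shrunk or grown to fit the cluster.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job description, not Voda scheduler's data model.
    name: str
    min_gpus: int  # smallest worker count the job can run with
    max_gpus: int  # largest worker count the job can make use of


def allocate(jobs: list[Job], total_gpus: int) -> dict[str, int]:
    """Toy elastic policy: admit jobs in order at their minimum size,
    then hand out spare GPUs one at a time up to each job's maximum."""
    alloc: dict[str, int] = {}
    free = total_gpus

    # Pass 1: admit each job at its minimum size if it still fits.
    for job in jobs:
        if job.min_gpus <= free:
            alloc[job.name] = job.min_gpus
            free -= job.min_gpus

    # Pass 2: grow admitted jobs round-robin while GPUs remain.
    grew = True
    while free > 0 and grew:
        grew = False
        for job in jobs:
            if free == 0:
                break
            if job.name in alloc and alloc[job.name] < job.max_gpus:
                alloc[job.name] += 1
                free -= 1
                grew = True
    return alloc


jobs = [Job("resnet", 1, 4), Job("bert", 2, 8), Job("gpt", 4, 4)]
plan = allocate(jobs, total_gpus=8)
print(plan)
```

A non-elastic scheduler would have to give each job a fixed size or leave it queued. Here all three jobs start at their minimum, and the remaining GPU goes to whichever job can still grow. A customized algorithm would replace this policy with a different ordering or growth rule.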
Voda scheduler is designed to be cloud-native and adopts a microservices architecture consisting of several loosely coupled components. It builds on several existing open-source projects for reliability, flexibility, and maintainability.