An Energy Optimization Framework for DNN Training
Join the Zeus Slack workspace!
Zeus automatically optimizes the energy consumption and training time of recurring DNN training jobs by finding the optimal batch size and GPU power limit.
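To make the idea of jointly choosing a batch size and a GPU power limit concrete, here is a minimal sketch. It is not Zeus's API or its actual optimizer: the `Measurement` class, the `cost` and `best_config` functions, the weighting parameter `eta`, and all numbers are hypothetical and exist only for illustration. It assumes each (batch size, power limit) pair has already been profiled and picks the pair that minimizes a weighted sum of energy and time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical profile of one training run to a fixed target metric."""
    time_s: float    # wall-clock time to reach the target
    energy_j: float  # GPU energy consumed to reach the target


def cost(m: Measurement, eta: float, max_power_w: float) -> float:
    """Weighted energy/time cost (an assumed form, for illustration only).

    eta = 1.0 considers energy only; eta = 0.0 considers time only.
    Time is multiplied by max_power_w so both terms are in joules.
    """
    return eta * m.energy_j + (1.0 - eta) * max_power_w * m.time_s


def best_config(measurements, eta, max_power_w):
    """Return the (batch_size, power_limit_w) key with the lowest cost."""
    return min(
        measurements,
        key=lambda cfg: cost(measurements[cfg], eta, max_power_w),
    )


# Made-up measurements keyed by (batch size, power limit in watts).
measurements = {
    (32, 150): Measurement(time_s=1200, energy_j=150_000),
    (32, 250): Measurement(time_s=900, energy_j=200_000),
    (64, 150): Measurement(time_s=1000, energy_j=130_000),
    (64, 250): Measurement(time_s=800, energy_j=170_000),
}

for eta in (0.0, 0.5, 1.0):
    print(eta, best_config(measurements, eta, max_power_w=250))
```

Sweeping `eta` shows the trade-off: a value near 1 favors the lowest-energy configuration, while a value near 0 favors the fastest one. A real system would also have to gather these measurements during training rather than assume they are known up front.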
Zeus is part of The ML.ENERGY Initiative.
Refer to Getting Started for instructions on environment setup, installation, and integration. We also provide integration examples:
- Integrating Zeus with Computer Vision
- Integrating Zeus with Natural Language Processing and Huggingface
- Running trace-driven simulation on single recurring jobs and the Alibaba GPU cluster trace
You can easily implement custom policies for batch size and power limit optimization and plug them into Zeus.
Refer to Extending Zeus for details.