Users can implement custom policies to optimize the batch size and power limit, and plug them into Zeus.
Zeus defines two abstract classes
Each class optimizes the batch size and power limit of a recurring training job respectively.
As in our paper, the batch size optimizer is invoked first to decide which batch size to use. The power limit optimizer is then invoked with both the job and the chosen batch size to decide which power limit to use.
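The two-stage decision order can be illustrated with a minimal, self-contained sketch. Every name here (Job, BatchSizePolicy, PowerLimitPolicy, decide, and the two toy policies) is hypothetical and chosen only for illustration; these are not Zeus's actual abstract classes or method signatures, so consult the Zeus source for the real interfaces.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical description of a recurring training job (not a Zeus class)."""

    name: str
    batch_sizes: tuple[int, ...]   # candidate batch sizes
    power_limits: tuple[int, ...]  # candidate GPU power limits, in watts


class BatchSizePolicy(ABC):
    """Hypothetical stand-in for a batch size optimizer interface."""

    @abstractmethod
    def predict(self, job: Job) -> int:
        """Return the batch size to use for the next recurrence of the job."""


class PowerLimitPolicy(ABC):
    """Hypothetical stand-in for a power limit optimizer interface."""

    @abstractmethod
    def predict(self, job: Job, batch_size: int) -> int:
        """Return the power limit to use for the job at the given batch size."""


class SmallestBatchSize(BatchSizePolicy):
    """Toy policy: always pick the smallest candidate batch size."""

    def predict(self, job: Job) -> int:
        return min(job.batch_sizes)


class HighestPowerLimit(PowerLimitPolicy):
    """Toy policy: always pick the highest candidate power limit."""

    def predict(self, job: Job, batch_size: int) -> int:
        return max(job.power_limits)


def decide(job: Job, bs_policy: BatchSizePolicy, pl_policy: PowerLimitPolicy) -> tuple[int, int]:
    """Choose the batch size first, then the power limit given that batch size."""
    batch_size = bs_policy.predict(job)
    power_limit = pl_policy.predict(job, batch_size)
    return batch_size, power_limit


job = Job("example", batch_sizes=(32, 64, 128), power_limits=(100, 150, 200))
batch_size, power_limit = decide(job, SmallestBatchSize(), HighestPowerLimit())
print(batch_size, power_limit)
```

The power limit policy receives the chosen batch size as an argument, so a less trivial policy could keep separate state per (job, batch size) pair.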
You can find examples of policy implementations in
Plugging it into Zeus
There are two ways to run Zeus: trace-driven and end-to-end.
There are two central components in end-to-end Zeus:
The former drives the entire optimization over recurring jobs and accepts an instance of BatchSizeOptimizer in its constructor.
The latter takes charge of JIT-profiling power in the background, determining the optimal power limit, and setting it.
Hence, the functionality of
JITPowerLimitOptimizer is already tightly integrated into
Users will have to implement their own
ZeusDataLoader in order to test another