Integrating Perseus with Training Frameworks
Perseus is under active development, and breaking changes may happen.
Currently, we have all the low-level APIs in place, but it's not a turnkey solution yet.
This document always reflects the master branch.
This page aims to walk you through the process of integrating Perseus with arbitrary training frameworks.
We also have a reference integration with Merak. Especially take a look at how its pipeline runtime wraps the forward and backward passes with the Perseus hooks described below.
We assume that there are concrete regions in the framework's code where the forward pass and the backward pass each exclusively happen.
For instance, in DeepSpeed, `PipelineEngine._exec_forward_pass` and `PipelineEngine._exec_backward_pass` are good candidates.
As another example, in Megatron-LM, users can pass in their custom
`forward_step_func` to `pretrain`, and `forward_step` in the codebase calls it. The backward pass is done (roughly) in the `backward_step` function.
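To make "wrapping a region" concrete, below is a minimal framework-agnostic sketch. It assumes the `opt` object and the `on_instruction_begin`/`on_instruction_end` hooks introduced in the steps that follow; the `instruction_region` helper itself is hypothetical, not part of Perseus.

```python
from contextlib import contextmanager

@contextmanager
def instruction_region(opt, name: str):
    """Mark an exclusive forward/backward region for Perseus.

    `opt` is the PerseusOptimizer instance described below, and `name` is
    "forward" or "backward". The end hook runs even if the region raises.
    """
    opt.on_instruction_begin(name)
    try:
        yield
    finally:
        opt.on_instruction_end(name)

# Inside the framework's forward region (e.g., the body of DeepSpeed's
# PipelineEngine._exec_forward_pass), the original code would be wrapped as:
#
#     with instruction_region(opt, "forward"):
#         ...  # original forward pass code
```

With such regions identified, integrating Perseus boils down to the following steps: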
- Instantiate the `PerseusOptimizer` somewhere before actual training runs. Let's call the object `opt`.
- Surround one training step with `opt.on_step_begin()` and `opt.on_step_end()`.
- Wrap the forward pass region with `opt.on_instruction_begin("forward")` and `opt.on_instruction_end("forward")`.
- Wrap the backward pass region with `opt.on_instruction_begin("backward")` and `opt.on_instruction_end("backward")`. A sketch of the resulting training loop follows this list.
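Putting the hooks together, here is a minimal training-loop sketch. Only the four hook calls come from the steps above; the import path and constructor arguments of `PerseusOptimizer` are placeholders that depend on your deployment.

```python
# Import path is an assumption; adjust it to wherever PerseusOptimizer
# lives in your installation.
from perseus import PerseusOptimizer

def train(model, dataloader, perseus_kwargs: dict):
    # Instantiate once, before actual training runs. Constructor arguments
    # are deployment-specific, so we pass them through as a placeholder.
    opt = PerseusOptimizer(**perseus_kwargs)

    for batch in dataloader:
        opt.on_step_begin()                  # one training step begins

        opt.on_instruction_begin("forward")
        loss = model(batch)                  # exclusive forward pass region
        opt.on_instruction_end("forward")

        opt.on_instruction_begin("backward")
        loss.backward()                      # exclusive backward pass region
        opt.on_instruction_end("backward")

        opt.on_step_end()                    # one training step ends
```

In a pipeline-parallel framework, the forward and backward hooks would live inside the engine's instruction handlers instead, as in the region-wrapping sketch above.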
It's important to optimize on top of accurate time and energy measurements of the forward and backward instructions.
For now, we're taking an offline approach, where we run each instruction under a given GPU frequency N times and average its time and energy consumption.
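To illustrate what this offline pass can look like, here is a sketch that locks the GPU's SM frequency with NVML and averages time and energy over N runs. This is our illustration, not Perseus's actual profiler: `profile_instruction` and `run_instruction` are hypothetical names, and locking clocks with `nvmlDeviceSetGpuLockedClocks` typically requires administrator privileges.

```python
import time

import pynvml
import torch

def profile_instruction(run_instruction, freq_mhz: int, n: int = 10):
    """Return average time (s) and energy (J) per run at a locked GPU frequency.

    `run_instruction` is a zero-argument callable that executes one forward
    or backward instruction; you provide it for each instruction you profile.
    """
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(torch.cuda.current_device())
    # Lock the SM clock to the target frequency (usually requires root).
    pynvml.nvmlDeviceSetGpuLockedClocks(handle, freq_mhz, freq_mhz)
    try:
        run_instruction()  # warm up once so lazy initialization isn't measured
        torch.cuda.synchronize()
        energy_start = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)  # mJ
        time_start = time.monotonic()
        for _ in range(n):
            run_instruction()
        torch.cuda.synchronize()
        avg_time = (time.monotonic() - time_start) / n
        energy_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle) - energy_start
        return avg_time, energy_mj / 1000.0 / n
    finally:
        pynvml.nvmlDeviceResetGpuLockedClocks(handle)
        pynvml.nvmlShutdown()
```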
We're in the process of implementing an online approach that is directly integrated into the
`PerseusOptimizer`, so that you won't need to implement a separate profiler inside your framework.