# Integrating Perseus with Training Frameworks
> **Warning:** Perseus is under active development, and breaking changes may happen. Currently, we have all the low-level APIs in place, but it's not a turnkey solution yet. This document always reflects the master HEAD.
This page walks you through integrating Perseus with arbitrary training frameworks.
We also have a reference integration with Merak; in particular, take a look at `Merak.runtime.pipe_engine`.
## Assumptions
We assume that the framework's code has concrete regions where the forward pass and the backward pass exclusively happen.
For instance, in DeepSpeed, `PipelineEngine` has `_exec_forward_pass` and `_exec_backward_pass`.
As another example, in Megatron-LM, users can pass their custom `forward_step_func` to `pretrain`, and `forward_step` in the codebase calls it. The backward pass is done (roughly) in the `backward_step` function.
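This assumption can be illustrated with a toy sketch. `ToyPipelineEngine` below is entirely hypothetical (the real computation is replaced with trivial arithmetic); the point is only that each pass lives exclusively inside one method, which gives a natural place to hook in:

```python
# A toy engine illustrating the assumption: the forward and backward
# passes each happen exclusively inside one method, analogous to
# DeepSpeed's `_exec_forward_pass` / `_exec_backward_pass`.
class ToyPipelineEngine:
    def _exec_forward_pass(self, batch):
        # All forward computation happens here and nowhere else.
        return sum(batch)  # stand-in for the real forward computation

    def _exec_backward_pass(self, loss):
        # All backward computation happens here and nowhere else.
        return -loss  # stand-in for the real gradient computation

engine = ToyPipelineEngine()
loss = engine._exec_forward_pass([1, 2, 3])
grad = engine._exec_backward_pass(loss)
```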
## Integrate `PerseusOptimizer`
- Instantiate the `PerseusOptimizer` somewhere before actual training runs. Let's call the object `opt`.
- Surround one training step with `opt.on_step_begin()` and `opt.on_step_end()`.
- Wrap the forward pass region with `opt.on_instruction_begin("forward")` and `opt.on_instruction_end("forward")`.
- Wrap the backward pass region with `opt.on_instruction_begin("backward")` and `opt.on_instruction_end("backward")`.
That's it.
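Put together, the hooks form the following pattern around one training step. This is a minimal sketch: `_StubOptimizer` is a hypothetical stand-in for `PerseusOptimizer` (whose constructor arguments we omit here), and the actual forward/backward computation is elided:

```python
# Stand-in for PerseusOptimizer that only records the hook call order.
class _StubOptimizer:
    def __init__(self):
        self.calls = []

    def on_step_begin(self):
        self.calls.append("step_begin")

    def on_step_end(self):
        self.calls.append("step_end")

    def on_instruction_begin(self, name):
        self.calls.append(f"{name}_begin")

    def on_instruction_end(self, name):
        self.calls.append(f"{name}_end")

def train_step(opt):
    """One training step with Perseus hooks in the order described above."""
    opt.on_step_begin()
    opt.on_instruction_begin("forward")
    # ... run the forward pass here ...
    opt.on_instruction_end("forward")
    opt.on_instruction_begin("backward")
    # ... run the backward pass here ...
    opt.on_instruction_end("backward")
    opt.on_step_end()

opt = _StubOptimizer()
train_step(opt)
```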
## Profiling Instructions
It's important to optimize on top of accurate measurements of the forward and backward instructions.
For now, we take an offline approach: run each instruction under a given GPU frequency N times and average the measured time and energy consumption.
See Merak's `profile` function.
We're in the process of implementing an online approach that is directly integrated into `PerseusOptimizer`, so that you don't need to implement a separate profiler inside your framework.
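The offline measurement loop can be sketched as follows. Everything in this sketch is hypothetical and not part of Perseus's API: `run_instruction` stands for executing one forward or backward instruction at the GPU frequency already set, and `read_energy_mj` stands for reading a cumulative GPU energy counter (e.g., via NVML):

```python
import time

def profile_instruction(run_instruction, read_energy_mj, n_iters=10):
    """Run one pipeline instruction `n_iters` times and average time/energy.

    Both arguments are hypothetical callables: `run_instruction` executes
    one forward or backward instruction, and `read_energy_mj` returns the
    GPU's cumulative energy consumption in millijoules.
    """
    total_time = 0.0
    total_energy = 0.0
    for _ in range(n_iters):
        energy_start = read_energy_mj()
        time_start = time.perf_counter()
        run_instruction()
        total_time += time.perf_counter() - time_start
        total_energy += read_energy_mj() - energy_start
    return total_time / n_iters, total_energy / n_iters
```

In practice, you would repeat this for every candidate GPU frequency and feed the averaged (time, energy) pairs to the optimizer.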