Integrating Perseus with Training Frameworks

Warning

Perseus is under active development, and breaking changes may happen. Currently, all of the low-level APIs are in place, but it's not yet a turnkey solution. This document always reflects the master HEAD.

This page walks you through integrating Perseus with an arbitrary training framework. We also have a reference integration with Merak; in particular, take a look at Merak.runtime.pipe_engine.

Assumptions

We assume that there are concrete regions in the framework's code where the forward pass and the backward pass each happen exclusively. For instance, in DeepSpeed, PipelineEngine has _exec_forward_pass and _exec_backward_pass. As another example, in Megatron-LM, users can pass in their custom forward_step_func to pretrain, and forward_step in the codebase calls it. The backward pass is done (roughly) in the backward_step function.
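
As a minimal illustration of this assumed structure, here is a sketch of a training step with clearly separated regions. All names below (model, batch, loss_fn) are placeholders, not any particular framework's API:

    def train_step(model, batch, loss_fn):
        # Placeholder training step; every name here is illustrative only.
        inputs, targets = batch

        # Forward pass region: only forward computation happens here.
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

        # Backward pass region: only backward computation happens here.
        loss.backward()
        return loss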

Integrate PerseusOptimizer

  1. Instantiate the PerseusOptimizer before the actual training loop starts. Let's call the object opt.
  2. Surround one training step with opt.on_step_begin() and opt.on_step_end().
  3. Wrap the forward pass region with opt.on_instruction_begin("forward") and opt.on_instruction_end("forward").
  4. Wrap the backward pass region with opt.on_instruction_begin("backward") and opt.on_instruction_end("backward").

That's it.
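
Concretely, here is a hedged sketch of steps 1 through 4 applied to a generic training loop. The import path and constructor arguments for PerseusOptimizer are assumptions, and dataloader, model, and loss_fn are placeholders; consult the Perseus source for the current signature:

    # Sketch only: the import path and constructor arguments below are
    # assumptions, not the actual API.
    from perseus.optimizer import PerseusOptimizer  # hypothetical path

    opt = PerseusOptimizer()  # 1. instantiate before training starts

    for batch in dataloader:
        opt.on_step_begin()  # 2. one training step begins

        inputs, targets = batch

        opt.on_instruction_begin("forward")  # 3. forward region begins
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
        opt.on_instruction_end("forward")

        opt.on_instruction_begin("backward")  # 4. backward region begins
        loss.backward()
        opt.on_instruction_end("backward")

        opt.on_step_end()  # 2. one training step ends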

Profiling Instructions

It's important to optimize on top of accurate time and energy measurements of forward and backward instructions. For now, we take an offline approach: run each instruction N times under a given GPU frequency and average the measured time and energy consumption. See Merak's profile function.
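
As a rough sketch of what such an offline profiler measures (this is not Merak's actual profile function), the following uses pynvml to pin the GPU clock, runs one instruction N times, and averages wall-clock time and energy. Here, run_instruction is a hypothetical zero-argument callable that executes one forward or backward instruction:

    import time
    import pynvml
    import torch

    def profile_instruction(run_instruction, freq_mhz, num_iters=10, device_index=0):
        # Offline profiling sketch. `run_instruction` is a hypothetical
        # callable wrapping one forward or backward pass; it is not part
        # of Perseus or Merak.
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)

        # Pin the SM clock to the target frequency (requires sufficient
        # permissions on the machine).
        pynvml.nvmlDeviceSetGpuLockedClocks(handle, freq_mhz, freq_mhz)
        try:
            run_instruction()  # warm up once before measuring
            torch.cuda.synchronize(device_index)

            # The NVML energy counter is cumulative, in millijoules.
            start_energy = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
            start_time = time.monotonic()
            for _ in range(num_iters):
                run_instruction()
            torch.cuda.synchronize(device_index)

            avg_time = (time.monotonic() - start_time) / num_iters
            avg_energy_mj = (
                pynvml.nvmlDeviceGetTotalEnergyConsumption(handle) - start_energy
            ) / num_iters
        finally:
            pynvml.nvmlDeviceResetGpuLockedClocks(handle)
            pynvml.nvmlShutdown()

        return avg_time, avg_energy_mj / 1000.0  # (seconds, joules) per run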

We're in the process of implementing an online approach that is integrated directly into PerseusOptimizer, so that you won't need to implement a separate profiler inside your framework.