Measuring Energy
Important
This page assumes that your environment is already set up. Please refer to the Getting Started guide if not.
Zeus makes it very easy to measure time, power, and energy both programmatically in Python and also on the command line. Measuring power and energy is also very low overhead, typically taking less than 10 ms for each call.
Programmatic measurement
ZeusMonitor
makes it very simple to measure the GPU time and energy consumption of arbitrary Python code blocks.
A measurement window is defined by a code block wrapped with begin_window
and end_window
.
end_window
will return a Measurement
object, which holds the time and energy consumption of the window.
Users can specify and measure multiple measurement windows at the same time, and they can be arbitrarily nested or overlapping as long as they are given different names.
from zeus.monitor import ZeusMonitor
if __name__ == "__main__":
# All GPUs are measured simultaneously if `gpu_indices` is not given.
monitor = ZeusMonitor(gpu_indices=[torch.cuda.current_device()])
for epoch in range(100):
monitor.begin_window("epoch")
steps = []
for x, y in train_loader:
monitor.begin_window("step")
train_one_step(x, y)
result = monitor.end_window("step")
steps.append(result)
mes = monitor.end_window("epoch")
print(f"Epoch {epoch} consumed {mes.time} s and {mes.total_energy} J.")
avg_time = sum(map(lambda m: m.time, steps)) / len(steps)
avg_energy = sum(map(lambda m: m.total_energy, steps)) / len(steps)
print(f"One step took {avg_time} s and {avg_energy} J on average.")
This monitor spawns a process that polls the instantaneous GPU power consumption API and exposes two methods: get_power
and get_energy
.
For GPUs older than Volta that do not support querying energy directly, ZeusMonitor
automatically uses the PowerMonitor
internally.
Use of global variables on GPUs older than Volta
On GPUs older than Volta, you should not instantiate ZeusMonitor
as a global variable without protecting it with if __name__ == "__main__"
.
It's because the energy query API is only available on Volta or newer NVIDIA GPU microarchitectures, and for older GPUs, a separate process that polls the power API has to be spawned (i.e., PowerMonitor
).
In this case, global code that spawns the process should be guarded with if __name__ == "__main__"
.
More details in Python docs.
gpu_indices
and CUDA_VISIBLE_DEVICES
Zeus always respects CUDA_VISIBLE_DEVICES
if set.
In other words, if CUDA_VISIBLE_DEVICES=1,3
and gpu_indices=[1]
, Zeus will understand that as GPU 3 in the system.
gpu_indices
and optimization
In general, energy optimizers measure the energy of the GPU through a ZeusMonitor
instance that is passed to their constructor.
Thus, only the GPUs specified by gpu_indices
will be the target of optimization.
Synchronizing CPU and GPU computations
Deep learning frameworks typically run actual computation on GPUs in an asynchronous fashion. That is, the CPU (Python interpreter) asynchronously dispatches computations to run on the GPU and moves on to dispatch the next computation without waiting for the GPU to finish. This helps GPUs achieve higher utilization with less idle time.
Due to this asynchronous nature of Deep Learning frameworks, we need to be careful when we want to take time and energy measurements of GPU execution.
We want only and all of the computations dispatched between begin_window
and end_window
to be captured by our time and energy measurement.
That's what the sync_execution_with
paramter in ZeusMonitor
and sync_execution
paramter in begin_window
and end_window
are for.
Depending on the Deep Learning framework you're using (currently PyTorch and JAX are supported), ZeusMonitor
will automatically synchronize CPU and GPU execution to make sure all and only the computations dispatched between the window are captured.
Tip
Zeus has one function used globally across the codebase for device synchronization: sync_execution
.
Warning
ZeusMonitor
covers only the common and simple case of device synchronization, when GPU indices (gpu_indices
) correspond to one whole physical device.
This is usually what you want, except when using more advanced device partitioning (e.g., using --xla_force_host_platform_device_count
in JAX to partition CPUs into more pieces).
In such cases, you probably want to opt out from using this function and handle synchronization manually at the appropriate granularity.
CPU measurements using Intel RAPL
ZeusMonitor
supports CPU/DRAM energy measurement as well!
The RAPL interface for CPU energy measurement is available for the majority of Intel and AMD CPUs. DRAM energy measurement are available on some CPUs as well. To check support, refer to Verifying installation.
To only measure the energy consumption of the CPU used by the current Python process, you can use the get_current_cpu_index
function, which retrieves the CPU index where the specified process ID is running.
You can pass in cpu_indices=[]
or gpu_indices=[]
to ZeusMonitor
to disable either CPU or GPU measurements.
from zeus.monitor import ZeusMonitor
from zeus.device.cpu import get_current_cpu_index
if __name__ == "__main__":
# Get the CPU index of the current process
current_cpu_index = get_current_cpu_index()
monitor = ZeusMonitor(cpu_indices=[current_cpu_index], gpu_indices=[])
for epoch in range(100):
monitor.begin_window("epoch")
steps = []
for x, y in train_loader:
monitor.begin_window("step")
train_one_step(x, y)
result = monitor.end_window("step")
steps.append(result)
mes = monitor.end_window("epoch")
print(f"Epoch {epoch} consumed {mes.time} s and {mes.total_energy} J.")
avg_time = sum(map(lambda m: m.time, steps)) / len(steps)
avg_energy = sum(map(lambda m: m.total_energy, steps)) / len(steps)
print(f"One step takes {avg_time} s and {avg_energy} J for the CPU.")
CLI power and energy monitor
The energy monitor measures the total energy consumed by the GPU during the lifetime of the monitor process.
It's a simple wrapper around ZeusMonitor
.
$ python -m zeus.monitor energy
[2023-08-22 22:44:45,106] [ZeusMonitor](energy.py:157) Monitoring GPU [0, 1, 2, 3].
[2023-08-22 22:44:46,210] [zeus.utils.framework](framework.py:38) PyTorch with CUDA support is available.
[2023-08-22 22:44:46,760] [ZeusMonitor](energy.py:329) Measurement window 'zeus.monitor.energy' started.
^C[2023-08-22 22:44:50,205] [ZeusMonitor](energy.py:329) Measurement window 'zeus.monitor.energy' ended.
Total energy (J):
Measurement(time=3.4480526447296143, energy={0: 224.2969999909401, 1: 232.83799999952316, 2: 233.3100000023842, 3: 234.53700000047684})
The power monitor periodically prints out the GPU's power draw.
It's a simple wrapper around PowerMonitor
.
$ python -m zeus.monitor power
[2023-08-22 22:39:59,787] [PowerMonitor](power.py:134) Monitoring power usage of GPUs [0, 1, 2, 3]
2023-08-22 22:40:00.800576
{'GPU0': 66.176, 'GPU1': 68.792, 'GPU2': 66.898, 'GPU3': 67.53}
2023-08-22 22:40:01.842590
{'GPU0': 66.078, 'GPU1': 68.595, 'GPU2': 66.996, 'GPU3': 67.138}
2023-08-22 22:40:02.845734
{'GPU0': 66.078, 'GPU1': 68.693, 'GPU2': 66.898, 'GPU3': 67.236}
2023-08-22 22:40:03.848818
{'GPU0': 66.177, 'GPU1': 68.675, 'GPU2': 67.094, 'GPU3': 66.926}
^C
Total time (s): 4.421529293060303
Total energy (J):
{'GPU0': 198.52566362297537, 'GPU1': 206.22215216255188, 'GPU2': 201.08565518283845, 'GPU3': 201.79834523367884}
Hardware Support
We currently support both NVIDIA (via NVML) and AMD GPUs (via AMDSMI, with ROCm 6.1 or later).
get_gpus
The get_gpus
function returns a GPUs
object, which can be either an NVIDIAGPUs
or AMDGPUs
object depending on the availability of nvml
or amdsmi
. Each GPUs
object contains one or more GPU
instances, which are specifically NVIDIAGPU
or AMDGPU
objects.
These GPU
objects directly call respective nvml
or amdsmi
methods, providing a one-to-one mapping of methods for seamless GPU abstraction and support for multiple GPU types. For example:
- NVIDIAGPU.getName
calls pynvml.nvmlDeviceGetName
.
- AMDGPU.getName
calls amdsmi.amdsmi_get_gpu_asic_info
.
Notes on AMD GPUs
AMD GPUs Initialization
amdsmi.amdsmi_get_energy_count
sometimes returns invalid values on certain GPUs or ROCm versions (e.g., MI100 on ROCm 6.2). See ROCm issue #38 for more details. During the AMDGPUs
object initialization, we call amdsmi.amdsmi_get_energy_count
twice for each GPU, with a 0.5-second delay between calls. This difference is compared to power measurements to determine if amdsmi.amdsmi_get_energy_count
is stable and reliable. Initialization takes 0.5 seconds regardless of the number of AMD GPUs.
amdsmi.amdsmi_get_power_info
provides "average_socket_power" and "current_socket_power" fields, but the "current_socket_power" field is sometimes not supported and returns "N/A." During the AMDGPUs
object initialization, this method is checked, and if "N/A" is returned, the AMDGPU.getInstantPowerUsage
method is disabled. Instead, AMDGPU.getAveragePowerUsage
needs to be used.
Supported AMD SMI Versions
Only ROCm >= 6.1 is supported, as the AMDSMI APIs for power and energy return wrong values. For more information, see ROCm issue #22. Ensure your amdsmi
and ROCm versions are up to date.