Skip to content

energy

zeus.monitor.energy

Measure the GPU time and energy consumption of a block of code.

Measurement dataclass

Measurement result of one window.

Attributes:

Name Type Description
time float

Time elapsed (in seconds) during the measurement window.

energy dict[int, float]

Maps GPU indices to the energy consumed (in Joules) during the measurement window. GPU indices are from the DL framework's perspective after applying CUDA_VISIBLE_DEVICES.

Source code in zeus/monitor/energy.py
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
@dataclass
class Measurement:
    """Measurement result of one window.

    Attributes:
        time: Time elapsed (in seconds) during the measurement window.
        energy: Maps GPU indices to the energy consumed (in Joules) during the
            measurement window. GPU indices are from the DL framework's perspective
            after applying `CUDA_VISIBLE_DEVICES`.
    """

    time: float
    energy: dict[int, float]

    @cached_property
    def total_energy(self) -> float:
        """Total energy consumed (in Joules) during the measurement window."""
        return sum(self.energy.values())

total_energy cached property

total_energy

Total energy consumed (in Joules) during the measurement window.

ZeusMonitor

Measure the GPU energy and time consumption of a block of code.

Works for multi-GPU and heterogeneous GPU types. Aware of CUDA_VISIBLE_DEVICES. For instance, if CUDA_VISIBLE_DEVICES=2,3, GPU index 1 passed into gpu_indices will be interpreted as CUDA device 3.

You can mark the beginning and end of a measurement window, during which the GPU energy and time consumed will be recorded. Multiple concurrent measurement windows are supported.

For Volta or newer GPUs, energy consumption is measured very cheaply with the nvmlDeviceGetTotalEnergyConsumption API. On older architectures, this API is not supported, so a separate Python process is used to poll nvmlDeviceGetPowerUsage to get power samples over time, which are integrated to compute energy consumption.

Integration Example
from zeus.monitor import ZeusMontior

# Time/Energy measurements for four GPUs will begin and end at the same time.
gpu_indices = [0, 1, 2, 3]
monitor = ZeusMonitor(gpu_indices)

# Mark the beginning of a measurement window. You can use any string
# as the window name, but make sure it's unique.
monitor.begin_window("entire_training")

# Actual work
training(x, y)

# Mark the end of a measurement window and retrieve the measurment result.
result = monitor.end_window("entire_training")

# Print the measurement result.
print(f"Training took {result.time} seconds.")
print(f"Training consumed {result.total_energy} Joules.")
for gpu_idx, gpu_energy in result.energy.items():
    print(f"GPU {gpu_idx} consumed {gpu_energy} Joules.")

Attributes:

Name Type Description
gpu_indices `list[int]`

Indices of all the CUDA devices to monitor, from the DL framework's perspective after applying CUDA_VISIBLE_DEVICES.

nvml_gpu_indices `list[int]`

Indices of all the CUDA devices to monitor, from NVML/system's perspective.

Source code in zeus/monitor/energy.py
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
class ZeusMonitor:
    """Measure the GPU energy and time consumption of a block of code.

    Works for multi-GPU and heterogeneous GPU types. Aware of `CUDA_VISIBLE_DEVICES`.
    For instance, if `CUDA_VISIBLE_DEVICES=2,3`, GPU index `1` passed into `gpu_indices`
    will be interpreted as CUDA device `3`.

    You can mark the beginning and end of a measurement window, during which the GPU
    energy and time consumed will be recorded. Multiple concurrent measurement windows
    are supported.

    For Volta or newer GPUs, energy consumption is measured very cheaply with the
    `nvmlDeviceGetTotalEnergyConsumption` API. On older architectures, this API is
    not supported, so a separate Python process is used to poll `nvmlDeviceGetPowerUsage`
    to get power samples over time, which are integrated to compute energy consumption.

    ## Integration Example

    ```python
    from zeus.monitor import ZeusMontior

    # Time/Energy measurements for four GPUs will begin and end at the same time.
    gpu_indices = [0, 1, 2, 3]
    monitor = ZeusMonitor(gpu_indices)

    # Mark the beginning of a measurement window. You can use any string
    # as the window name, but make sure it's unique.
    monitor.begin_window("entire_training")

    # Actual work
    training(x, y)

    # Mark the end of a measurement window and retrieve the measurment result.
    result = monitor.end_window("entire_training")

    # Print the measurement result.
    print(f"Training took {result.time} seconds.")
    print(f"Training consumed {result.total_energy} Joules.")
    for gpu_idx, gpu_energy in result.energy.items():
        print(f"GPU {gpu_idx} consumed {gpu_energy} Joules.")
    ```

    Attributes:
        gpu_indices (`list[int]`): Indices of all the CUDA devices to monitor, from the
            DL framework's perspective after applying `CUDA_VISIBLE_DEVICES`.
        nvml_gpu_indices (`list[int]`): Indices of all the CUDA devices to monitor, from
            NVML/system's perspective.
    """

    def __init__(
        self,
        gpu_indices: list[int] | None = None,
        approx_instant_energy: bool = False,
        log_file: str | Path | None = None,
    ) -> None:
        """Instantiate the monitor.

        Args:
            gpu_indices: Indices of all the CUDA devices to monitor. Time/Energy measurements
                will begin and end at the same time for these GPUs (i.e., synchronized).
                If None, all the GPUs available will be used. `CUDA_VISIBLE_DEVICES`
                is respected if set, e.g., GPU index `1` passed into `gpu_indices` when
                `CUDA_VISIBLE_DEVICES=2,3` will be interpreted as CUDA device `3`.
                `CUDA_VISIBLE_DEVICES`s formatted with comma-separated indices are supported.
            approx_instant_energy: When the execution time of a measurement window is
                shorter than the NVML energy counter's update period, energy consumption may
                be observed as zero. In this case, if `approx_instant_energy` is True, the
                window's energy consumption will be approximated by multiplying the current
                instantaneous power consumption with the window's execution time. This should
                be a better estimate than zero, but it's still an approximation.
            log_file: Path to the log CSV file. If `None`, logging will be disabled.
        """
        # Save arguments.
        self.approx_instant_energy = approx_instant_energy

        # Initialize NVML.
        pynvml.nvmlInit()
        atexit.register(pynvml.nvmlShutdown)

        # CUDA GPU indices and NVML GPU indices are different if `CUDA_VISIBLE_DEVICES` is set.
        self.gpu_indices, self.nvml_gpu_indices = resolve_gpu_indices(gpu_indices)

        # Save all the NVML GPU handles. These should be called with system-level GPU indices.
        self.gpu_handles: dict[int, pynvml.c_nvmlDevice_t] = {}
        for nvml_gpu_index, gpu_index in zip(self.nvml_gpu_indices, self.gpu_indices):
            handle = pynvml.nvmlDeviceGetHandleByIndex(nvml_gpu_index)
            self.gpu_handles[gpu_index] = handle

        # Initialize loggers.
        if log_file is None:
            self.log_file = None
        else:
            if dir := os.path.dirname(log_file):
                os.makedirs(dir, exist_ok=True)
            self.log_file = open(log_file, "w")
            logger.info("Writing measurement logs to %s.", log_file)
            self.log_file.write(
                f"start_time,window_name,elapsed_time,{','.join(map(lambda i: f'gpu{i}_energy', self.gpu_indices))}\n",
            )
            self.log_file.flush()

        logger.info("Monitoring GPU indices %s.", self.gpu_indices)

        # A dictionary that maps the string keys of active measurement windows to
        # the state of the measurement window. Each element in the dictionary is a tuple of:
        #     1) Time elapsed at the beginning of this window.
        #     2) Total energy consumed by each >= Volta GPU at the beginning of
        #        this window (`None` for older GPUs).
        self.measurement_states: dict[str, tuple[float, dict[int, float]]] = {}

        # Initialize power monitors for older architecture GPUs.
        old_gpu_indices = [
            gpu_index
            for gpu_index, is_new in zip(self.gpu_indices, self._is_new_arch_flags)
            if not is_new
        ]
        if old_gpu_indices:
            self.power_monitor = PowerMonitor(
                gpu_indices=old_gpu_indices, update_period=None
            )
        else:
            self.power_monitor = None

    @lru_cache
    def _is_new_arch(self, gpu: int) -> bool:
        """Return whether the GPU is Volta or newer."""
        return (
            pynvml.nvmlDeviceGetArchitecture(self.gpu_handles[gpu])
            >= pynvml.NVML_DEVICE_ARCH_VOLTA
        )

    @cached_property
    def _is_new_arch_flags(self) -> list[bool]:
        """A list of flags indicating whether each GPU is Volta or newer."""
        return [self._is_new_arch(gpu) for gpu in self.gpu_handles]

    def _get_instant_power(self) -> tuple[dict[int, float], float]:
        """Measure the power consumption of all GPUs at the current time."""
        power_measurement_start_time: float = time()
        power = {
            i: pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0
            for i, h in self.gpu_handles.items()
        }
        power_measurement_time = time() - power_measurement_start_time
        return power, power_measurement_time

    def begin_window(self, key: str, sync_cuda: bool = True) -> None:
        """Begin a new measurement window.

        Args:
            key: Unique name of the measurement window.
            sync_cuda: Whether to synchronize CUDA before starting the measurement window.
                (Default: `True`)
        """
        # Make sure the key is unique.
        if key in self.measurement_states:
            raise ValueError(f"Measurement window '{key}' already exists")

        # Call cudaSynchronize to make sure we freeze at the right time.
        if sync_cuda:
            for gpu_index in self.gpu_handles:
                cuda_sync(gpu_index)

        # Freeze the start time of the profiling window.
        timestamp: float = time()
        energy_state: dict[int, float] = {}
        for gpu_index, gpu_handle in self.gpu_handles.items():
            # Query energy directly if the GPU has newer architecture.
            # Otherwise, the Zeus power monitor is running in the background to
            # collect power consumption, so we just need to read the log file later.
            if self._is_new_arch(gpu_index):
                energy_state[gpu_index] = (
                    pynvml.nvmlDeviceGetTotalEnergyConsumption(gpu_handle) / 1000.0
                )

        # Add measurement state to dictionary.
        self.measurement_states[key] = (timestamp, energy_state)
        logger.debug("Measurement window '%s' started.", key)

    def end_window(
        self, key: str, sync_cuda: bool = True, cancel: bool = False
    ) -> Measurement:
        """End a measurement window and return the time and energy consumption.

        Args:
            key: Name of an active measurement window.
            sync_cuda: Whether to synchronize CUDA before ending the measurement window.
                (default: `True`)
            cancel: Whether to cancel the measurement window. If `True`, the measurement
                window is assumed to be cancelled and discarded. Thus, an empty Measurement
                object will be returned and the measurement window will not be recorded in
                the log file either. `sync_cuda` is still respected.
        """
        # Retrieve the start time and energy consumption of this window.
        try:
            start_time, start_energy = self.measurement_states.pop(key)
        except KeyError:
            raise ValueError(f"Measurement window '{key}' does not exist") from None

        # Take instant power consumption measurements.
        # This, in theory, is introducing extra NVMLs call in the critical path
        # even if computation time is not so short. However, it is reasonable to
        # expect that computation time would be short if the user explicitly
        # turned on the `approx_instant_energy` option. Calling this function
        # as early as possible will lead to more accurate energy approximation.
        power, power_measurement_time = (
            self._get_instant_power() if self.approx_instant_energy else ({}, 0.0)
        )

        # Call cudaSynchronize to make sure we freeze at the right time.
        if sync_cuda:
            for gpu_index in self.gpu_handles:
                cuda_sync(gpu_index)

        # If the measurement window is cancelled, return an empty Measurement object.
        if cancel:
            logger.debug("Measurement window '%s' cancelled.", key)
            return Measurement(time=0.0, energy={gpu: 0.0 for gpu in self.gpu_handles})

        end_time: float = time()
        time_consumption: float = end_time - start_time
        energy_consumption: dict[int, float] = {}
        for gpu_index, gpu_handle in self.gpu_handles.items():
            # Query energy directly if the GPU has newer architecture.
            if self._is_new_arch(gpu_index):
                end_energy = (
                    pynvml.nvmlDeviceGetTotalEnergyConsumption(gpu_handle) / 1000.0
                )
                energy_consumption[gpu_index] = end_energy - start_energy[gpu_index]

        # If there are older GPU architectures, the PowerMonitor will take care of those.
        if self.power_monitor is not None:
            energy = self.power_monitor.get_energy(start_time, end_time)
            # Fallback to the instant power measurement if the PowerMonitor does not
            # have the power samples.
            if energy is None:
                energy = {gpu: 0.0 for gpu in self.power_monitor.gpu_indices}
            energy_consumption |= energy

        # Approximate energy consumption if the measurement window is too short.
        if self.approx_instant_energy:
            for gpu_index in self.gpu_indices:
                if energy_consumption[gpu_index] == 0.0:
                    energy_consumption[gpu_index] = power[gpu_index] * (
                        time_consumption - power_measurement_time
                    )

        logger.debug("Measurement window '%s' ended.", key)

        # Add to log file.
        if self.log_file is not None:
            self.log_file.write(
                f"{start_time},{key},{time_consumption},"
                + ",".join(str(energy_consumption[gpu]) for gpu in self.gpu_indices)
                + "\n"
            )
            self.log_file.flush()

        return Measurement(time_consumption, energy_consumption)

_is_new_arch_flags cached property

_is_new_arch_flags

A list of flags indicating whether each GPU is Volta or newer.

__init__

__init__(gpu_indices=None, approx_instant_energy=False, log_file=None)

Parameters:

Name Type Description Default
gpu_indices list[int] | None

Indices of all the CUDA devices to monitor. Time/Energy measurements will begin and end at the same time for these GPUs (i.e., synchronized). If None, all the GPUs available will be used. CUDA_VISIBLE_DEVICES is respected if set, e.g., GPU index 1 passed into gpu_indices when CUDA_VISIBLE_DEVICES=2,3 will be interpreted as CUDA device 3. CUDA_VISIBLE_DEVICESs formatted with comma-separated indices are supported.

None
approx_instant_energy bool

When the execution time of a measurement window is shorter than the NVML energy counter's update period, energy consumption may be observed as zero. In this case, if approx_instant_energy is True, the window's energy consumption will be approximated by multiplying the current instantaneous power consumption with the window's execution time. This should be a better estimate than zero, but it's still an approximation.

False
log_file str | Path | None

Path to the log CSV file. If None, logging will be disabled.

None
Source code in zeus/monitor/energy.py
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
def __init__(
    self,
    gpu_indices: list[int] | None = None,
    approx_instant_energy: bool = False,
    log_file: str | Path | None = None,
) -> None:
    """Instantiate the monitor.

    Args:
        gpu_indices: Indices of all the CUDA devices to monitor. Time/Energy measurements
            will begin and end at the same time for these GPUs (i.e., synchronized).
            If None, all the GPUs available will be used. `CUDA_VISIBLE_DEVICES`
            is respected if set, e.g., GPU index `1` passed into `gpu_indices` when
            `CUDA_VISIBLE_DEVICES=2,3` will be interpreted as CUDA device `3`.
            `CUDA_VISIBLE_DEVICES`s formatted with comma-separated indices are supported.
        approx_instant_energy: When the execution time of a measurement window is
            shorter than the NVML energy counter's update period, energy consumption may
            be observed as zero. In this case, if `approx_instant_energy` is True, the
            window's energy consumption will be approximated by multiplying the current
            instantaneous power consumption with the window's execution time. This should
            be a better estimate than zero, but it's still an approximation.
        log_file: Path to the log CSV file. If `None`, logging will be disabled.
    """
    # Save arguments.
    self.approx_instant_energy = approx_instant_energy

    # Initialize NVML.
    pynvml.nvmlInit()
    atexit.register(pynvml.nvmlShutdown)

    # CUDA GPU indices and NVML GPU indices are different if `CUDA_VISIBLE_DEVICES` is set.
    self.gpu_indices, self.nvml_gpu_indices = resolve_gpu_indices(gpu_indices)

    # Save all the NVML GPU handles. These should be called with system-level GPU indices.
    self.gpu_handles: dict[int, pynvml.c_nvmlDevice_t] = {}
    for nvml_gpu_index, gpu_index in zip(self.nvml_gpu_indices, self.gpu_indices):
        handle = pynvml.nvmlDeviceGetHandleByIndex(nvml_gpu_index)
        self.gpu_handles[gpu_index] = handle

    # Initialize loggers.
    if log_file is None:
        self.log_file = None
    else:
        if dir := os.path.dirname(log_file):
            os.makedirs(dir, exist_ok=True)
        self.log_file = open(log_file, "w")
        logger.info("Writing measurement logs to %s.", log_file)
        self.log_file.write(
            f"start_time,window_name,elapsed_time,{','.join(map(lambda i: f'gpu{i}_energy', self.gpu_indices))}\n",
        )
        self.log_file.flush()

    logger.info("Monitoring GPU indices %s.", self.gpu_indices)

    # A dictionary that maps the string keys of active measurement windows to
    # the state of the measurement window. Each element in the dictionary is a tuple of:
    #     1) Time elapsed at the beginning of this window.
    #     2) Total energy consumed by each >= Volta GPU at the beginning of
    #        this window (`None` for older GPUs).
    self.measurement_states: dict[str, tuple[float, dict[int, float]]] = {}

    # Initialize power monitors for older architecture GPUs.
    old_gpu_indices = [
        gpu_index
        for gpu_index, is_new in zip(self.gpu_indices, self._is_new_arch_flags)
        if not is_new
    ]
    if old_gpu_indices:
        self.power_monitor = PowerMonitor(
            gpu_indices=old_gpu_indices, update_period=None
        )
    else:
        self.power_monitor = None

_is_new_arch cached

_is_new_arch(gpu)

Return whether the GPU is Volta or newer.

Source code in zeus/monitor/energy.py
179
180
181
182
183
184
185
@lru_cache
def _is_new_arch(self, gpu: int) -> bool:
    """Return whether the GPU is Volta or newer."""
    return (
        pynvml.nvmlDeviceGetArchitecture(self.gpu_handles[gpu])
        >= pynvml.NVML_DEVICE_ARCH_VOLTA
    )

_get_instant_power

_get_instant_power()

Measure the power consumption of all GPUs at the current time.

Source code in zeus/monitor/energy.py
192
193
194
195
196
197
198
199
200
def _get_instant_power(self) -> tuple[dict[int, float], float]:
    """Measure the power consumption of all GPUs at the current time."""
    power_measurement_start_time: float = time()
    power = {
        i: pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0
        for i, h in self.gpu_handles.items()
    }
    power_measurement_time = time() - power_measurement_start_time
    return power, power_measurement_time

begin_window

begin_window(key, sync_cuda=True)

Begin a new measurement window.

Parameters:

Name Type Description Default
key str

Unique name of the measurement window.

required
sync_cuda bool

Whether to synchronize CUDA before starting the measurement window. (Default: True)

True
Source code in zeus/monitor/energy.py
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
def begin_window(self, key: str, sync_cuda: bool = True) -> None:
    """Begin a new measurement window.

    Args:
        key: Unique name of the measurement window.
        sync_cuda: Whether to synchronize CUDA before starting the measurement window.
            (Default: `True`)
    """
    # Make sure the key is unique.
    if key in self.measurement_states:
        raise ValueError(f"Measurement window '{key}' already exists")

    # Call cudaSynchronize to make sure we freeze at the right time.
    if sync_cuda:
        for gpu_index in self.gpu_handles:
            cuda_sync(gpu_index)

    # Freeze the start time of the profiling window.
    timestamp: float = time()
    energy_state: dict[int, float] = {}
    for gpu_index, gpu_handle in self.gpu_handles.items():
        # Query energy directly if the GPU has newer architecture.
        # Otherwise, the Zeus power monitor is running in the background to
        # collect power consumption, so we just need to read the log file later.
        if self._is_new_arch(gpu_index):
            energy_state[gpu_index] = (
                pynvml.nvmlDeviceGetTotalEnergyConsumption(gpu_handle) / 1000.0
            )

    # Add measurement state to dictionary.
    self.measurement_states[key] = (timestamp, energy_state)
    logger.debug("Measurement window '%s' started.", key)

end_window

end_window(key, sync_cuda=True, cancel=False)

End a measurement window and return the time and energy consumption.

Parameters:

Name Type Description Default
key str

Name of an active measurement window.

required
sync_cuda bool

Whether to synchronize CUDA before ending the measurement window. (default: True)

True
cancel bool

Whether to cancel the measurement window. If True, the measurement window is assumed to be cancelled and discarded. Thus, an empty Measurement object will be returned and the measurement window will not be recorded in the log file either. sync_cuda is still respected.

False
Source code in zeus/monitor/energy.py
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
def end_window(
    self, key: str, sync_cuda: bool = True, cancel: bool = False
) -> Measurement:
    """End a measurement window and return the time and energy consumption.

    Args:
        key: Name of an active measurement window.
        sync_cuda: Whether to synchronize CUDA before ending the measurement window.
            (default: `True`)
        cancel: Whether to cancel the measurement window. If `True`, the measurement
            window is assumed to be cancelled and discarded. Thus, an empty Measurement
            object will be returned and the measurement window will not be recorded in
            the log file either. `sync_cuda` is still respected.
    """
    # Retrieve the start time and energy consumption of this window.
    try:
        start_time, start_energy = self.measurement_states.pop(key)
    except KeyError:
        raise ValueError(f"Measurement window '{key}' does not exist") from None

    # Take instant power consumption measurements.
    # This, in theory, is introducing extra NVMLs call in the critical path
    # even if computation time is not so short. However, it is reasonable to
    # expect that computation time would be short if the user explicitly
    # turned on the `approx_instant_energy` option. Calling this function
    # as early as possible will lead to more accurate energy approximation.
    power, power_measurement_time = (
        self._get_instant_power() if self.approx_instant_energy else ({}, 0.0)
    )

    # Call cudaSynchronize to make sure we freeze at the right time.
    if sync_cuda:
        for gpu_index in self.gpu_handles:
            cuda_sync(gpu_index)

    # If the measurement window is cancelled, return an empty Measurement object.
    if cancel:
        logger.debug("Measurement window '%s' cancelled.", key)
        return Measurement(time=0.0, energy={gpu: 0.0 for gpu in self.gpu_handles})

    end_time: float = time()
    time_consumption: float = end_time - start_time
    energy_consumption: dict[int, float] = {}
    for gpu_index, gpu_handle in self.gpu_handles.items():
        # Query energy directly if the GPU has newer architecture.
        if self._is_new_arch(gpu_index):
            end_energy = (
                pynvml.nvmlDeviceGetTotalEnergyConsumption(gpu_handle) / 1000.0
            )
            energy_consumption[gpu_index] = end_energy - start_energy[gpu_index]

    # If there are older GPU architectures, the PowerMonitor will take care of those.
    if self.power_monitor is not None:
        energy = self.power_monitor.get_energy(start_time, end_time)
        # Fallback to the instant power measurement if the PowerMonitor does not
        # have the power samples.
        if energy is None:
            energy = {gpu: 0.0 for gpu in self.power_monitor.gpu_indices}
        energy_consumption |= energy

    # Approximate energy consumption if the measurement window is too short.
    if self.approx_instant_energy:
        for gpu_index in self.gpu_indices:
            if energy_consumption[gpu_index] == 0.0:
                energy_consumption[gpu_index] = power[gpu_index] * (
                    time_consumption - power_measurement_time
                )

    logger.debug("Measurement window '%s' ended.", key)

    # Add to log file.
    if self.log_file is not None:
        self.log_file.write(
            f"{start_time},{key},{time_consumption},"
            + ",".join(str(energy_consumption[gpu]) for gpu in self.gpu_indices)
            + "\n"
        )
        self.log_file.flush()

    return Measurement(time_consumption, energy_consumption)