testing
zeus.utils.testing
Utilities for testing.
ReplayZeusMonitor
Bases: ZeusMonitor
A mock ZeusMonitor that replays windows recorded by a real monitor.
This class is for testing only. Based on a CSV log file that records the time and energy measurements of `ZeusMonitor` measurement windows, users can use this class as a drop-in replacement for `ZeusMonitor` to replay the measurement windows and fast-forward training and time/energy measurement.
The methods exposed are identical to or a superset of those of `ZeusMonitor`, but they behave differently. Instead of monitoring the GPU, this class replays events from a log file. The log file generated by `ZeusMonitor` (`log_file`) is guaranteed to be compatible and will replay time and energy measurements just as the real monitor experienced them. Note that when multiple measurement windows are ongoing concurrently, the log file should record windows in the order of `end_window` calls.
Attributes:
Name | Type | Description |
---|---|---|
gpu_indices | `list[int]` | Indices of all the CUDA devices to monitor. |
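One way to take advantage of the drop-in property is to write measurement code against the monitor interface and pass the monitor in as an argument, so that a real `ZeusMonitor` or a `ReplayZeusMonitor` can be supplied without changing the caller. The sketch below is only an illustration: `FakeMonitor`, `FakeMeasurement`, and `measure_step` are hypothetical names that are not part of Zeus, and the `time` and `total_energy` attributes on the returned object are an assumption about what a measurement exposes.

```python
from dataclasses import dataclass


@dataclass
class FakeMeasurement:
    """Hypothetical stand-in for the object returned by end_window."""

    time: float          # assumed attribute name
    total_energy: float  # assumed attribute name


class FakeMonitor:
    """Hypothetical stub exposing only begin_window and end_window."""

    def begin_window(self, key, sync_execution=True):
        pass

    def end_window(self, key, sync_execution=True, cancel=False):
        # Fixed, made-up values so the example runs without a GPU or a log file.
        return FakeMeasurement(time=1.5, total_energy=300.0)


def measure_step(monitor, step_fn, name="step"):
    """Run step_fn inside a measurement window on whichever monitor is given."""
    monitor.begin_window(name)
    step_fn()
    result = monitor.end_window(name)
    return result.time, result.total_energy


# In a test, any object with the same two methods can be passed in.
elapsed, energy = measure_step(FakeMonitor(), lambda: None)
```

Because `measure_step` only relies on `begin_window` and `end_window`, the same function should work unchanged when given a monitor that replays a recorded log.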
Source code in zeus/utils/testing.py
__init__
__init__(
gpu_indices=None,
approx_instant_energy=False,
log_file=None,
ignore_sync_execution=False,
match_window_name=True,
)
The log file should be a CSV file with the following header (e.g. gpu_indices=[0, 2]):
start_time,window_name,elapsed_time,gpu0_energy,gpu2_energy
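As a rough illustration of the header layout above, the following sketch builds a log for `gpu_indices=[0, 2]` with Python's standard `csv` module. The window names and all numeric values are made up for the example, and the snippet writes to an in-memory buffer rather than a real file; it does not show how `ZeusMonitor` itself writes the log.

```python
import csv
import io

gpu_indices = [0, 2]

# One energy column per GPU index, following the header shown above.
header = ["start_time", "window_name", "elapsed_time"] + [
    f"gpu{i}_energy" for i in gpu_indices
]

# Made-up rows: start time, window name, elapsed time, then one energy value per GPU.
rows = [
    [0.0, "epoch_0", 12.5, 1500.0, 1480.0],
    [12.5, "epoch_1", 12.3, 1490.0, 1475.0],
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)

log_text = buffer.getvalue()
header_line = log_text.splitlines()[0]

# Reading the log back gives one dictionary per recorded window.
records = list(csv.DictReader(io.StringIO(log_text)))
```

The GPU indices in the column names are what should stay consistent with the `gpu_indices` argument described below.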
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gpu_indices | `list[int] \| None` | Indices of all the CUDA devices to monitor. This should be consistent with the indices used in the log file. If … | `None` |
approx_instant_energy | `bool` | Whether to approximate the instant energy consumption. Not used. | `False` |
log_file | `str \| Path \| None` | Path to the log CSV file to replay events from. | `None` |
ignore_sync_execution | `bool` | Whether to ignore … | `False` |
match_window_name | `bool` | Whether to make sure window names match. (Default: `True`) | `True` |
begin_window
begin_window(key, sync_execution=True)
Begin a new window.
This method only adds the key to a list of ongoing measurement windows and makes sure the key is unique.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | `str` | Name of the measurement window. | required |
sync_execution | `bool` | Whether to synchronize CUDA before starting the measurement window. (Default: `True`) | `True` |
end_window
end_window(key, sync_execution=True, cancel=False)
End an ongoing window.
This method pops the key from a list of ongoing measurement windows and constructs a `Measurement` object corresponding to the name of the window from the log file. If the name of the window does not match the expected one, a `RuntimeError` is raised.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | `str` | Name of the measurement window. | required |
sync_execution | `bool` | Whether to synchronize CUDA before ending the measurement window. (Default: `True`) | `True` |
cancel | `bool` | Whether to cancel the measurement window. This will not consume a line from the log file. (Default: `False`) | `False` |
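The replay behavior described for `begin_window` and `end_window` can be pictured with a small standalone model. This is a simplified sketch, not the code in `zeus/utils/testing.py`: `MiniReplay` is a hypothetical class, it handles a single GPU column, it returns a plain tuple instead of a `Measurement`, and returning `None` for a cancelled window is an assumption made to keep the example short.

```python
import csv
import io

# Made-up log. With concurrent windows, rows follow the order of end_window calls,
# so the inner window (ended first) comes before the outer one.
LOG = """start_time,window_name,elapsed_time,gpu0_energy
0.0,inner,1.0,100.0
0.0,outer,3.0,250.0
"""


class MiniReplay:
    """Hypothetical, simplified model of replaying windows from a log."""

    def __init__(self, log_text, match_window_name=True):
        self.rows = csv.DictReader(io.StringIO(log_text))
        self.ongoing = []
        self.match_window_name = match_window_name

    def begin_window(self, key):
        # Only track the key and make sure it is unique.
        if key in self.ongoing:
            raise RuntimeError(f"Window {key!r} is already ongoing")
        self.ongoing.append(key)

    def end_window(self, key, cancel=False):
        self.ongoing.remove(key)  # raises ValueError if the window was never begun
        if cancel:
            return None  # a cancelled window does not consume a log row
        row = next(self.rows)
        if self.match_window_name and row["window_name"] != key:
            raise RuntimeError(f"Expected {row['window_name']!r}, got {key!r}")
        return float(row["elapsed_time"]), float(row["gpu0_energy"])


replay = MiniReplay(LOG)
replay.begin_window("outer")
replay.begin_window("inner")
replay.begin_window("scratch")
cancelled = replay.end_window("scratch", cancel=True)
inner = replay.end_window("inner")
outer = replay.end_window("outer")
```

Ending `outer` before `inner` in this model would raise a `RuntimeError`, because the next row in the log is named `inner`. That is why the row order has to follow the order of `end_window` calls.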