metric
zeus.util.metric
Defines the energy-time cost metric function.
ZeusCostThresholdExceededError
Bases: Exception
Raised when the predicted cost of the next epoch exceeds the cost threshold.
This exception is used to terminate all processes during data parallel
training with multiple processes. ONLY the master process predicts `next_cost`
and checks it against the threshold. However, once the predicted cost exceeds
the threshold, ALL the processes must terminate. Currently this is achieved by
raising the exception in the master process. The launching script then
terminates all the processes that are still alive.
Attributes:
Name | Type | Description |
---|---|---|
time_consumed | float | Time consumed until the current epoch. |
energy_consumed | float | Energy consumed until the current epoch. |
cost | float | Zeus's energy-time cost metric computed up to the current epoch. |
next_cost | float | Zeus's energy-time cost metric predicted after the next epoch. |
cost_thresh | float | The cost threshold. |
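The termination pattern described above can be sketched as follows. This is a minimal illustration, not the library's code: `CostThresholdExceededSketch` is a hypothetical stand-in that only mirrors the documented attributes, and `check_next_epoch` and the convention that rank 0 is the master process are assumptions made for the example.

```python
class CostThresholdExceededSketch(Exception):
    """Hypothetical stand-in that mirrors the documented attributes."""

    def __init__(self, time_consumed, energy_consumed, cost, next_cost, cost_thresh):
        super().__init__(
            f"Predicted next cost {next_cost} exceeds threshold {cost_thresh}"
        )
        self.time_consumed = time_consumed
        self.energy_consumed = energy_consumed
        self.cost = cost
        self.next_cost = next_cost
        self.cost_thresh = cost_thresh


def check_next_epoch(rank, time_consumed, energy_consumed, cost, next_cost, cost_thresh):
    """Raise on the master process when the predicted cost is too high.

    Assumption: rank 0 is the master. Other ranks never check, so they rely
    on the launching script to terminate them once the master dies.
    """
    if rank != 0:
        return
    if next_cost > cost_thresh:
        raise CostThresholdExceededSketch(
            time_consumed, energy_consumed, cost, next_cost, cost_thresh
        )


if __name__ == "__main__":
    try:
        check_next_epoch(0, 100.0, 2000.0, 5000.0, 6000.0, 5500.0)
    except CostThresholdExceededSketch as err:
        # The attributes carry the state at the time of termination.
        print(f"stopping: cost={err.cost}, next_cost={err.next_cost}")
```

Because only one process raises, the remaining workers are not stopped by the exception itself. The launching script has to notice the failed master and terminate the rest.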
Source code in zeus/util/metric.py
__init__
__init__(time_consumed, energy_consumed, cost, next_cost, cost_thresh)
Source code in zeus/util/metric.py
zeus_cost
zeus_cost(energy, time, eta_knob, max_power)
Compute Zeus's energy-time cost metric.
Trades off ETA and TTA based on the value of `eta_knob`.
The caller is expected to do bound checking for `eta_knob`,
because `eta_knob` does not change frequently.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
energy | float | Joules | required |
time | float | seconds | required |
eta_knob | float | Real number in [0, 1]. | required |
max_power | int | The maximum power limit of the GPU. | required |
Returns:
Type | Description |
---|---|
float | The cost of the DL training job. |
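To make the trade-off concrete, here is a minimal sketch of how such a metric can be computed. It assumes the weighted-sum formulation from the Zeus paper, `eta_knob * energy + (1 - eta_knob) * max_power * time`, which this page does not state explicitly. `zeus_cost_sketch` is a hypothetical name used for illustration and is not the library function.

```python
def zeus_cost_sketch(energy: float, time: float, eta_knob: float, max_power: int) -> float:
    """Assumed energy-time cost: a weighted sum of energy and scaled time.

    Multiplying time (seconds) by max_power (watts) gives joules, so both
    terms are in the same unit. No bound check is done on eta_knob here,
    matching the documented expectation that the caller validates it.
    """
    return eta_knob * energy + (1 - eta_knob) * max_power * time


if __name__ == "__main__":
    # Caller-side bound check, done once because eta_knob rarely changes.
    eta_knob = 0.5
    if not 0.0 <= eta_knob <= 1.0:
        raise ValueError("eta_knob must be in [0, 1]")

    # 1000 J consumed over 10 s on a GPU with a 300 W maximum power limit.
    print(zeus_cost_sketch(1000.0, 10.0, eta_knob, 300))
```

Under this assumed formula, `eta_knob = 1` makes the cost equal to the energy consumed, and `eta_knob = 0` makes it depend only on time.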
Source code in zeus/util/metric.py