zeus.utils.metric

Defines the energy-time cost metric function, along with helpers that compute energy and average power from Zeus monitor power log files.

zeus_cost

zeus_cost(energy, time, eta_knob, max_power)

Compute Zeus's energy-time cost metric.

Trades off ETA (energy-to-accuracy) and TTA (time-to-accuracy) based on the value of eta_knob. The function does not validate eta_knob; the caller is expected to do the bound checking, because eta_knob does not change frequently.
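Written out as an equation, the return expression in the zeus_cost source listing is the following, with $\eta$ = eta_knob, $E$ = energy, $T$ = time, and $P_{\max}$ = max_power:

$$
\mathrm{cost}(E, T) = \eta \cdot E + (1 - \eta) \cdot P_{\max} \cdot T
$$

Assuming max_power is given in watts, $P_{\max} \cdot T$ is in joules, so both terms share a unit. Setting $\eta = 1$ reduces the cost to pure energy, and $\eta = 0$ reduces it to time scaled by the maximum power limit.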

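Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `energy` | `float` | Energy, in joules. | required |
| `time` | `float` | Time, in seconds. | required |
| `eta_knob` | `float` | Real number in [0, 1]. | required |
| `max_power` | `int \| float` | The maximum power limit of the GPU. | required |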

Returns:

| Type | Description |
|---|---|
| `float` | The cost of the DL training job. |

Source code in zeus/utils/metric.py
def zeus_cost(
    energy: float, time: float, eta_knob: float, max_power: int | float
) -> float:
    """Compute Zeus's energy-time cost metric.

    Trades off ETA and TTA based on the value of `eta_knob`.
    The caller is expected to do bound checking for `eta_knob`,
    because `eta_knob` does not change frequently.

    Args:
        energy: Joules
        time: seconds
        eta_knob: Real number in [0, 1].
        max_power: The maximum power limit of the GPU.

    Returns:
        The cost of the DL training job.
    """
    return eta_knob * energy + (1 - eta_knob) * max_power * time
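To make the trade-off concrete, here is a minimal sketch that reuses the same return expression and adds a caller-side bound check, since zeus_cost itself does not validate eta_knob. The helper `check_eta_knob`, the two runs, and the assumption that max_power is in watts are all hypothetical illustrations, not part of Zeus.

```python
# Minimal sketch. zeus_cost repeats the return expression from the source listing;
# check_eta_knob and the two runs below are hypothetical illustrations.


def zeus_cost(energy: float, time: float, eta_knob: float, max_power: float) -> float:
    """Same expression as zeus.utils.metric.zeus_cost."""
    return eta_knob * energy + (1 - eta_knob) * max_power * time


def check_eta_knob(eta_knob: float) -> float:
    """Hypothetical caller-side bound check, run once when the knob is set."""
    if not 0.0 <= eta_knob <= 1.0:
        raise ValueError(f"eta_knob must be in [0, 1], got {eta_knob}")
    return eta_knob


MAX_POWER = 300.0  # watts (assumed unit)
# Made-up results of one job under two GPU power limits: (joules, seconds).
runs = {"fast": (280_000.0, 1_000.0), "slow": (200_000.0, 1_200.0)}

for knob in (0.0, 0.5, 1.0):
    eta = check_eta_knob(knob)  # validated once per knob value, not per cost call
    costs = {name: zeus_cost(e, t, eta, MAX_POWER) for name, (e, t) in runs.items()}
    best = min(costs, key=costs.get)
    print(f"eta_knob={eta}: {costs} -> prefer {best}")
```

With these made-up numbers, eta_knob=0.0 prefers the faster run (300000 vs. 360000), while 0.5 and 1.0 prefer the lower-energy run.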

energy

energy(logfile, start=None, end=None)

Compute the energy consumption from the Zeus monitor power log file.

start and end are in units of seconds, relative to the beginning of the time window captured by the log file. Only the time window between start and end will be considered when computing energy.

start and end can also be negative, in which case they are measured backward from the end of the window. For example, start=-10 keeps only the last 10 seconds of the log.
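The window selection described above can be sketched with the standard library alone. This mirrors the origin selection in the energy source listing (non-negative offsets count from the first sample, negative ones from the last); the helper name `window_bounds` is hypothetical and is not part of Zeus.

```python
# Minimal sketch of how start/end map to absolute timestamps.
# window_bounds is a hypothetical helper, not a Zeus API.
from __future__ import annotations

from datetime import datetime, timedelta


def window_bounds(first: datetime, last: datetime,
                  start: float | None = None, end: float | None = None):
    """Return (lower, upper) timestamp bounds; None means unbounded."""
    lower = upper = None
    if start is not None:
        # Non-negative offsets count from the first sample, negative from the last.
        origin = first if start >= 0.0 else last
        lower = origin + timedelta(seconds=start)
    if end is not None:
        origin = first if end >= 0.0 else last
        upper = origin + timedelta(seconds=end)
    return lower, upper


first = datetime(2024, 1, 1, 12, 0, 0)
last = first + timedelta(seconds=60)  # a made-up 60-second log

print(window_bounds(first, last, start=10, end=30))  # seconds 10..30
print(window_bounds(first, last, start=-10))         # last 10 seconds
print(window_bounds(first, last, start=5, end=-5))   # trim 5 s at each end
```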

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `logfile` | `Path \| str` | Path to the power log file produced by the Zeus monitor. | required |
| `start` | `float \| None` | Start time of the window to consider, in seconds. | `None` |
| `end` | `float \| None` | End time of the window to consider, in seconds. | `None` |

Source code in zeus/utils/metric.py
def energy(
    logfile: Path | str,
    start: float | None = None,
    end: float | None = None,
) -> float:
    """Compute the energy consumption from the Zeus monitor power log file.

    `start` and `end` are in units of seconds, relative to the beginning of
    the time window captured by the log file. Only the time window between
    `start` and `end` will be considered when computing energy.

    `start` and `end` can be negative, in which case the pointers wrap around
    and effectively the absolute value is subtracted from the end of the window.

    Args:
        logfile: Path to the power log file produced by the Zeus monitor.
        start: Start time of the window to consider.
        end: End time of the window to consider.
    """
    df = cast(pd.DataFrame, pd.read_csv(logfile, engine="python", skipfooter=1))
    df["Time"] = pd.to_datetime(df["Time"])
    start_timestamp = df.iloc[0]["Time"]
    end_timestamp = df.iloc[-1]["Time"]
    if start is not None:
        origin = start_timestamp if start >= 0.0 else end_timestamp
        df = df.loc[df["Time"] >= origin + timedelta(seconds=start)]
    if end is not None:
        origin = start_timestamp if end >= 0.0 else end_timestamp
        df = df.loc[df["Time"] <= origin + timedelta(seconds=end)]
    seconds = _get_seconds(df)
    watts = _get_watts(df)
    return auc(seconds, watts)
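The last step of energy integrates power over time with sklearn.metrics.auc, which uses the trapezoidal rule. A minimal stdlib sketch of that computation follows; the plain lists stand in for the outputs of the private helpers _get_seconds and _get_watts (whose internals are not shown here), and the samples are made up.

```python
# Minimal sketch of the trapezoidal integration performed by sklearn.metrics.auc.
# The sample lists are made up; they stand in for _get_seconds/_get_watts output.
from __future__ import annotations


def trapezoid_energy(seconds: list[float], watts: list[float]) -> float:
    """Area under the power curve; joules if inputs are seconds and watts."""
    if len(seconds) != len(watts) or len(seconds) < 2:
        # sklearn.metrics.auc likewise raises ValueError for fewer than two points.
        raise ValueError("need at least two (time, power) samples")
    total = 0.0
    for i in range(1, len(seconds)):
        dt = seconds[i] - seconds[i - 1]
        total += dt * (watts[i] + watts[i - 1]) / 2.0  # area of one trapezoid
    return total


# Four samples, one second apart.
seconds = [0.0, 1.0, 2.0, 3.0]
watts = [100.0, 200.0, 200.0, 100.0]
print(trapezoid_energy(seconds, watts))
```

Here the three trapezoids contribute 150, 200, and 150, for 500 J in total.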

avg_power

avg_power(logfile, start=None, end=None)

Compute the average power consumption from the Zeus monitor power log file.

start and end are in units of seconds, relative to the beginning of the time window captured by the log file. Only the time window between start and end will be considered when computing average power.

start and end can also be negative, in which case they are measured backward from the end of the window. For example, start=-10 keeps only the last 10 seconds of the log.
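Assuming $t_1 < t_2 < \dots < t_n$ are the sample times (in seconds) left after windowing and $P_1, \dots, P_n$ the corresponding power readings (in watts), the value returned by avg_power (the trapezoidal area divided by max(seconds) - min(seconds)) works out to:

$$
\bar{P} = \frac{1}{t_n - t_1} \sum_{i=2}^{n} (t_i - t_{i-1}) \, \frac{P_i + P_{i-1}}{2}
$$

The numerator is the same trapezoidal energy that energy returns for the same window, so $\bar{P}$ is that energy divided by the time span the kept samples actually cover.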

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `logfile` | `Path \| str` | Path to the power log file produced by the Zeus monitor. | required |
| `start` | `float \| None` | Start time of the window to consider, in seconds. | `None` |
| `end` | `float \| None` | End time of the window to consider, in seconds. | `None` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | From `sklearn.metrics.auc`, when the profiling window is too short (for example, it contains fewer than two power samples). |

Source code in zeus/utils/metric.py
def avg_power(
    logfile: Path | str,
    start: float | None = None,
    end: float | None = None,
) -> float:
    """Compute the average power consumption from the Zeus monitor power log file.

    `start` and `end` are in units of seconds, relative to the beginning of
    the time window captured by the log file. Only the time window between
    `start` and `end` will be considered when computing average power.

    `start` and `end` can be negative, in which case the pointers wrap around
    and effectively the absolute value is subtracted from the end of the window.

    Args:
        logfile: Path to the power log file produced by the Zeus monitor.
        start: Start time of the window to consider.
        end: End time of the window to consider.

    Raises:
        ValueError: From `sklearn.metrics.auc`, when the duration of the
            profiling window is too small.
    """
    df = cast(pd.DataFrame, pd.read_csv(logfile, engine="python", skipfooter=1))
    df["Time"] = pd.to_datetime(df["Time"])
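    # Resolve both bounds against the full log window (as `energy` does), so that
    # `end` is not shifted by the `start` filter and negative values count back
    # from the last sample, as the docstring describes.
    start_timestamp = df.iloc[0]["Time"]
    end_timestamp = df.iloc[-1]["Time"]
    if start is not None:
        origin = start_timestamp if start >= 0.0 else end_timestamp
        df = df.loc[df["Time"] >= origin + timedelta(seconds=start)]
    if end is not None:
        origin = start_timestamp if end >= 0.0 else end_timestamp
        df = df.loc[df["Time"] <= origin + timedelta(seconds=end)]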
    seconds = _get_seconds(df)
    watts = _get_watts(df)
    area = auc(seconds, watts)
    return area / (max(seconds) - min(seconds))
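The final two lines of avg_power divide the trapezoidal area by the span of the kept samples. A minimal stdlib sketch of that arithmetic, with made-up samples standing in for the output of the private helpers _get_seconds and _get_watts, is shown below; the name `average_power` is hypothetical.

```python
# Minimal sketch of avg_power's last step: trapezoidal area / window span.
# average_power is a hypothetical name; the samples are made up.
from __future__ import annotations


def average_power(seconds: list[float], watts: list[float]) -> float:
    if len(seconds) < 2:
        # Mirrors the ValueError sklearn.metrics.auc raises for tiny windows.
        raise ValueError("window holds fewer than two samples")
    area = sum(
        (seconds[i] - seconds[i - 1]) * (watts[i] + watts[i - 1]) / 2.0
        for i in range(1, len(seconds))
    )
    return area / (max(seconds) - min(seconds))


seconds = [10.0, 11.0, 12.0, 13.0]
watts = [100.0, 200.0, 200.0, 100.0]
print(average_power(seconds, watts))
```

Because the divisor is max(seconds) - min(seconds) rather than the requested end - start, the average is taken over the span of samples actually kept in the window, not over the nominal window length.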