common
zeus.optimizer.batch_size.common
Shared model definitions for the server and client.
JobParams
Bases: BaseModel
Job parameters.
Attributes:
Name | Type | Description |
---|---|---|
job_id | str | Unique ID for the job. |
batch_sizes | list[int] | List of batch sizes to try. |
default_batch_size | int | First batch size to try. |
eta_knob | float | \(\eta\) parameter for computing |
beta_knob | Optional[float] | \(\beta\) parameter for early stopping. If min_cost * beta_knob < current_cost, the job will be stopped by the BSO (batch size optimizer) server. Set to None to disable. |
target_metric | float | Target metric to achieve for training. |
higher_is_better_metric | bool | If the goal of training is achieving higher metric than |
max_epochs | int | Maximum number of epochs for a training run. |
num_pruning_rounds | int | Number of rounds to run in the pruning stage. |
window_size | int | For MAB (multi-armed bandit), how many recent measurements to fetch when computing the arm states. If set to 0, all measurements are fetched. |
mab_prior_mean | float | Mean of the belief prior distribution. |
mab_prior_precision | float | Precision of the belief prior distribution. |
mab_num_explorations | int | How many static explorations to run when no observations are available. |
mab_seed | Optional[int] | The random seed to use. |
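The early-stopping rule stated for beta_knob can be expressed as a small predicate. This is a minimal sketch of the documented condition only, not the BSO server's actual code; the function name should_early_stop and its argument names are hypothetical.

```python
from typing import Optional


def should_early_stop(
    min_cost: float, current_cost: float, beta_knob: Optional[float]
) -> bool:
    """Hypothetical helper mirroring the documented beta_knob rule.

    The job is stopped when min_cost * beta_knob < current_cost.
    Passing beta_knob=None disables early stopping entirely.
    """
    if beta_knob is None:
        return False  # early stopping disabled
    return min_cost * beta_knob < current_cost


# With beta_knob=2.0, a run whose cost exceeds twice the best cost seen so far is stopped.
print(should_early_stop(min_cost=100.0, current_cost=250.0, beta_knob=2.0))  # True
print(should_early_stop(min_cost=100.0, current_cost=150.0, beta_knob=2.0))  # False
print(should_early_stop(min_cost=100.0, current_cost=1e9, beta_knob=None))   # False
```

Note that the comparison is strict, so a cost exactly equal to min_cost * beta_knob does not trigger a stop.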
Source code in zeus/optimizer/batch_size/common.py (lines 16-84)
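The window_size semantics (keep only the most recent measurements, or all of them when set to 0) can be sketched as a slicing helper. This assumes measurements are stored oldest first; the helper name recent_measurements is hypothetical and does not reflect how the server actually queries its storage.

```python
def recent_measurements(measurements: list[float], window_size: int) -> list[float]:
    """Hypothetical helper: select measurements used to compute MAB arm states.

    Assumes `measurements` is ordered oldest first.
    window_size == 0 -> use every measurement.
    window_size > 0  -> use only the `window_size` most recent ones.
    """
    if window_size < 0:
        raise ValueError("window_size must be non-negative")
    if window_size == 0:
        return list(measurements)
    # Slicing past the start simply returns the whole list.
    return measurements[-window_size:]


costs = [5.0, 4.0, 3.0, 2.0]
print(recent_measurements(costs, 2))  # [3.0, 2.0]
print(recent_measurements(costs, 0))  # [5.0, 4.0, 3.0, 2.0]
```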
GpuConfig
Bases: BaseModel
GPU configuration of the current training job.
Source code in zeus/optimizer/batch_size/common.py (lines 87-99)
JobSpec
Bases: JobParams
Job specification that user inputs.
Attributes:
Name | Type | Description |
---|---|---|
job_id | Optional[str] | ID of the job. If not provided, the server will create one. |
Refer to JobParams for other attributes.
Source code in zeus/optimizer/batch_size/common.py (lines 102-120)
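The optional job_id means a spec may arrive without an ID and have one filled in server-side. The sketch below uses a stdlib dataclass as a stand-in for the pydantic JobSpec (only three fields are modeled), and the hypothetical assign_job_id helper uses uuid4; how the real server generates IDs is an assumption here.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class JobSpecSketch:
    """Stand-in for JobSpec, modeling only the fields this example needs."""
    job_id: Optional[str]
    batch_sizes: list[int]
    default_batch_size: int


def assign_job_id(spec: JobSpecSketch) -> JobSpecSketch:
    """Hypothetical server-side step: fill in job_id if the client omitted it."""
    if spec.job_id is not None:
        return spec  # keep the client-provided ID
    # Assumption: a random UUID string is used as the generated ID.
    return replace(spec, job_id=str(uuid.uuid4()))


spec = assign_job_id(JobSpecSketch(job_id=None, batch_sizes=[32, 64, 128], default_batch_size=64))
print(spec.job_id)
```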
JobSpecFromClient
CreatedJob
TrialId
Bases: BaseModel
Response format from the server when requesting a batch size to use; it serves as a unique identifier of a trial.
Attributes:
Name | Type | Description |
---|---|---|
job_id | str | ID of the job. |
batch_size | int | Batch size to use. |
trial_number | int | Trial number of the current training. |
Source code in zeus/optimizer/batch_size/common.py (lines 131-142)
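Because a TrialId identifies a trial, the (job_id, batch_size, trial_number) triple can act as a lookup key. The sketch below uses a frozen stdlib dataclass as a stand-in for the pydantic model so that instances are hashable; treating the full triple as the key, and the trial_status dictionary itself, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialIdSketch:
    """Stand-in for TrialId: the triple that identifies one trial."""
    job_id: str
    batch_size: int
    trial_number: int


# Frozen dataclasses are hashable, so they can key per-trial state
# (hypothetical bookkeeping, not the server's actual storage).
trial_status: dict[TrialIdSketch, str] = {}
trial_status[TrialIdSketch("job-1", 64, 1)] = "running"
trial_status[TrialIdSketch("job-1", 64, 2)] = "running"
trial_status[TrialIdSketch("job-1", 64, 1)] = "done"  # same trial, overwrites

print(trial_status[TrialIdSketch("job-1", 64, 1)])  # done
```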
TrainingResult
Bases: TrialId
Result of training for the job and batch size identified by the inherited TrialId fields.
Attributes:
Name | Type | Description |
---|---|---|
time |
float
|
total time consumption so far |
energy |
float
|
total energy consumption so far |
metric |
float
|
current metric value after |
current_epoch |
int
|
current epoch of training. Server can check if the train reached the |
Source code in zeus/optimizer/batch_size/common.py (lines 145-158)
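TrainingResult extends TrialId with cumulative measurements, and its metric can be compared against the JobParams fields target_metric and higher_is_better_metric. The sketch below uses stdlib dataclasses as stand-ins for the pydantic models; the reached_target helper is hypothetical, and whether the real server uses an inclusive (>=, <=) comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialIdSketch:
    """Stand-in for TrialId."""
    job_id: str
    batch_size: int
    trial_number: int


@dataclass(frozen=True)
class TrainingResultSketch(TrialIdSketch):
    """Stand-in for TrainingResult: a TrialId plus cumulative measurements."""
    time: float         # total time consumed so far
    energy: float       # total energy consumed so far
    metric: float       # current metric value
    current_epoch: int


def reached_target(result: TrainingResultSketch, target_metric: float,
                   higher_is_better_metric: bool) -> bool:
    """Hypothetical convergence check (inclusive comparison is an assumption)."""
    if higher_is_better_metric:
        return result.metric >= target_metric
    return result.metric <= target_metric


r = TrainingResultSketch("job-1", 64, 1, time=120.0, energy=35000.0,
                         metric=0.92, current_epoch=5)
print(reached_target(r, target_metric=0.90, higher_is_better_metric=True))  # True
```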
ReportResponse
Bases: BaseModel
Response format from the server for client's training result report.
Attributes:
Name | Type | Description |
---|---|---|
stop_train | bool | Whether training should be stopped. |
converged | bool | Whether the target metric has been reached. |
message | str | Message from the server regarding training, e.g., why training should be stopped. |
Source code in zeus/optimizer/batch_size/common.py (lines 161-172)
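On the client side, a ReportResponse determines whether the training loop continues. The sketch below uses a stdlib dataclass as a stand-in for the pydantic model, and the hypothetical handle_report helper shows one plausible way to branch on stop_train, converged, and message; it is not the library's client implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportResponseSketch:
    """Stand-in for ReportResponse."""
    stop_train: bool
    converged: bool
    message: str


def handle_report(resp: ReportResponseSketch) -> str:
    """Hypothetical client-side reaction to the server's reply to a report."""
    if not resp.stop_train:
        return "continue"
    if resp.converged:
        return "stop: converged"
    # Stopped without converging; surface the server's explanation.
    return f"stop: {resp.message}"


print(handle_report(ReportResponseSketch(stop_train=False, converged=False, message="")))
```

Here stop_train is checked first, since it is the field that directly tells the client whether to keep training.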