mab
zeus._legacy.policy.mab
Multi-Armed Bandit implementations.
GaussianTS
Thompson Sampling policy for Gaussian bandits.
For each arm, the reward is modeled as a Gaussian distribution with known precision. The conjugate priors are also Gaussian distributions.
Source code in zeus/_legacy/policy/mab.py
__init__
__init__(
arms,
reward_precision,
prior_mean=0.0,
prior_precision=0.0,
num_exploration=1,
seed=123456,
verbose=True,
)
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`arms` | `list[int]` | Bandit arm values to use. | *required* |
`reward_precision` | `list[float] \| float` | Precision (inverse variance) of the reward distribution. Pass in a list of `float`s to set the precision differently for each arm, or a single `float` shared by all arms. | *required* |
`prior_mean` | `float` | Mean of the belief prior distribution. | `0.0` |
`prior_precision` | `float` | Precision of the belief prior distribution. | `0.0` |
`num_exploration` | `int` | How many static explorations to run when no observations are available. | `1` |
`seed` | `int` | The random seed to use. | `123456` |
`verbose` | `bool` | Whether to print out what's going on. | `True` |
Source code in zeus/_legacy/policy/mab.py
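For reference, a minimal construction sketch. The arm values and precision below are made-up illustration values, not recommendations:

```python
from zeus._legacy.policy.mab import GaussianTS

# Hypothetical setup: three arms sharing one known reward precision.
# Only `arms` and `reward_precision` are required; the rest use the
# documented defaults.
mab = GaussianTS(
    arms=[100, 150, 200],
    reward_precision=2.0,   # same precision for every arm
    prior_mean=0.0,
    prior_precision=0.0,    # flat (uninformative) prior
    num_exploration=1,
    seed=123456,
    verbose=False,
)
```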
fit
fit(decisions, rewards, reset)
Fit the bandit on the given list of observations.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`decisions` | `list[int] \| ndarray` | A list of arms chosen. | *required* |
`rewards` | `list[float] \| ndarray` | A list of rewards that resulted from choosing the arms in `decisions`. | *required* |
`reset` | `bool` | Whether to reset all arms. | *required* |
Source code in zeus/_legacy/policy/mab.py
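A hedged usage sketch with made-up observations, continuing the `mab` instance constructed above:

```python
import numpy as np

# Hypothetical observations: each entry in `decisions` is the arm that
# was pulled, and the matching entry in `rewards` is the reward observed.
decisions = np.array([100, 100, 150, 200, 150])
rewards = np.array([0.8, 0.9, 1.2, 0.7, 1.1])

# reset=True discards the current posteriors and refits from scratch.
mab.fit(decisions, rewards, reset=True)
```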
fit_arm
fit_arm(arm, rewards, reset)
Update the parameter distribution for one arm.
Reference: https://en.wikipedia.org/wiki/Conjugate_prior
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`arm` | `int` | Arm to fit. | *required* |
`rewards` | `ndarray` | Array of rewards observed by pulling that arm. | *required* |
`reset` | `bool` | Whether to reset the parameters of the arm before fitting. | *required* |
Source code in zeus/_legacy/policy/mab.py
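For a Gaussian likelihood with known precision $\tau$ and a Gaussian prior $\mathcal{N}(\mu_0, 1/\lambda_0)$ on the mean, the standard conjugate update (see the Wikipedia reference above) after observing rewards $r_1, \ldots, r_n$ is:

$$
\lambda_{\text{post}} = \lambda_0 + n\tau,
\qquad
\mu_{\text{post}} = \frac{\lambda_0 \mu_0 + \tau \sum_{i=1}^{n} r_i}{\lambda_0 + n\tau}
$$

where $\lambda_0$ and $\mu_0$ correspond to `prior_precision` and `prior_mean`, $\tau$ is the arm's `reward_precision`, and $n$ is the number of rewards. Presumably this is the update `fit_arm` performs for the given arm; the exact bookkeeping lives in the source file above.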
predict
predict()
Return the arm with the largest sampled expected reward.
Source code in zeus/_legacy/policy/mab.py
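Conceptually, this is one Thompson Sampling step: draw one expected-reward sample per arm and take the argmax. A minimal sketch of that selection rule, not necessarily the library's exact code:

```python
# One Thompson Sampling decision: sample an expected reward for each arm
# from its posterior, then pull the arm with the largest sample.
expectations = mab.predict_expectations()
best_arm = max(expectations, key=expectations.get)
# This should match what mab.predict() returns for the same draw.
```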
predict_expectations
predict_expectations()
Sample the expected reward for each arm.
Assumes that each arm has been explored at least once. Otherwise, a value will be sampled from the prior.
Returns:

Type | Description |
---|---|
`dict[int, float]` | A mapping from every arm to its sampled expected reward. |
Source code in zeus/_legacy/policy/mab.py
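Putting the pieces together, a hedged end-to-end sketch of a bandit loop using only the methods documented above. The environment function `observe_reward` is hypothetical:

```python
import numpy as np

from zeus._legacy.policy.mab import GaussianTS

rng = np.random.default_rng(0)

def observe_reward(arm: int) -> float:
    """Hypothetical environment: a noisy reward for pulling an arm."""
    return float(rng.normal(loc=arm / 100.0, scale=0.5))

mab = GaussianTS(arms=[100, 150, 200], reward_precision=2.0, verbose=False)

decisions: list[int] = []
rewards: list[float] = []
for _ in range(50):
    arm = mab.predict()                  # Thompson-sampled arm choice
    decisions.append(arm)
    rewards.append(observe_reward(arm))
    # Refit from scratch on all observations gathered so far.
    mab.fit(np.array(decisions), np.array(rewards), reset=True)
```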