notesum.ai
Published on December 3
Selective Reviews of Bandit Problems in AI via a Statistical View
Categories: stat.ML, cs.AI, cs.LG, econ.EM, math.PR
Release date: December 3, 2024
Authors: Pengjie Zhou (1), Haoyu Wei (2), Huiming Zhang (1)
Affiliations: (1) Institute of Artificial Intelligence, Beihang University, Beijing, China; (2) Department of Economics, University of California, San Diego

| Aspect | GP-UCB | GP-TS |
|---|---|---|
| Acquisition function | Adds the posterior mean $\mu_{t-1}(x)$ to the posterior standard deviation $\sigma_{t-1}(x)$, scaled by a confidence parameter, to form an upper confidence bound: $x_t = \arg\max_x \mu_{t-1}(x) + \beta_t^{1/2}\sigma_{t-1}(x)$. | Uses a function sampled from the GP posterior directly as the acquisition function: draw $\tilde f_t$ from the current posterior and set $x_t = \arg\max_x \tilde f_t(x)$. |
| Exploration vs. exploitation | Balances exploration and exploitation explicitly through the parameter $\beta_t$, which (via $\beta_t^{1/2}$) scales the uncertainty term $\sigma_{t-1}(x)$. | Balances them implicitly through the randomness of the sampled function, which reflects the posterior uncertainty. |
| Parameter tuning | Requires a careful choice of $\beta_t$: regret guarantees hold only for suitable increasing schedules, and practical performance is sensitive to this choice. | Generally needs less tuning, since the balance comes from posterior sampling rather than an explicit exploration parameter. |
| Computational considerations | Requires optimizing an acquisition function built from both the posterior mean and the posterior standard deviation. | Optimizes a single sampled function per round, which can be computationally efficient, but the randomness of a single sample may call for multiple samples to stabilize performance. |
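The two selection rules in the table can be made concrete with a minimal sketch. It assumes a finite 1-D candidate grid, a zero-mean GP with a squared-exponential kernel, Gaussian observation noise, and a constant $\beta$ in place of the increasing schedule $\beta_t$ used in regret analyses. The kernel hyperparameters, the test function `toy_objective`, and all helper names (`gp_posterior`, `gp_ucb_select`, `gp_ts_select`, `run_bandit`) are hypothetical choices for illustration, not the paper's implementation. GP-UCB maximizes $\mu_{t-1}(x) + \beta^{1/2}\sigma_{t-1}(x)$; GP-TS draws one joint posterior sample over the grid and maximizes it.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(x_obs, y_obs, x_grid, noise=1e-2):
    """Posterior mean and covariance of a zero-mean GP on x_grid.

    `noise` is the observation-noise variance added to the training kernel.
    """
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_go = rbf_kernel(x_grid, x_obs)
    chol = np.linalg.cholesky(k_oo)
    # alpha = K_oo^{-1} y, computed with two solves against the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_go @ alpha
    v = np.linalg.solve(chol, k_go.T)
    cov = rbf_kernel(x_grid, x_grid) - v.T @ v
    return mean, cov


def gp_ucb_select(mean, cov, beta):
    """GP-UCB: index maximizing mu(x) + sqrt(beta) * sigma(x)."""
    sigma = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # only pointwise variances are used
    return int(np.argmax(mean + np.sqrt(beta) * sigma))


def gp_ts_select(mean, cov, rng, jitter=1e-6):
    """GP-TS: draw one joint posterior sample on the grid and return its argmax."""
    n = len(mean)
    sym_cov = 0.5 * (cov + cov.T)  # remove round-off asymmetry
    chol = np.linalg.cholesky(sym_cov + jitter * np.eye(n))  # O(n^3) in grid size
    sample = mean + chol @ rng.standard_normal(n)
    return int(np.argmax(sample))


def toy_objective(x):
    """Hypothetical test function on [0, 1], maximized at x = pi / 6."""
    return np.sin(3.0 * x)


def run_bandit(objective, x_grid, n_rounds, policy, rng, beta=4.0, noise_sd=0.1):
    """Query the grid for n_rounds using policy 'ucb' or 'ts' (constant beta)."""
    if policy not in ("ucb", "ts"):
        raise ValueError(f"unknown policy: {policy}")
    start = int(rng.integers(len(x_grid)))  # first query chosen uniformly at random
    x_obs = [x_grid[start]]
    y_obs = [objective(x_grid[start]) + noise_sd * rng.standard_normal()]
    for _ in range(n_rounds - 1):
        mean, cov = gp_posterior(np.array(x_obs), np.array(y_obs), x_grid, noise=noise_sd**2)
        if policy == "ucb":
            idx = gp_ucb_select(mean, cov, beta)
        else:
            idx = gp_ts_select(mean, cov, rng)
        x_obs.append(x_grid[idx])
        y_obs.append(objective(x_grid[idx]) + noise_sd * rng.standard_normal())
    return np.array(x_obs), np.array(y_obs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 50)
    for policy in ("ucb", "ts"):
        xs, ys = run_bandit(toy_objective, grid, 15, policy, rng)
        print(policy, "best observed x:", round(float(xs[np.argmax(ys)]), 3))
```

In this grid-based sketch, GP-UCB reads only the diagonal of the posterior covariance, while GP-TS factorizes the full grid covariance (cubic in the grid size) to draw a joint sample; on large candidate sets that cost is one practical point to weigh alongside the computational row of the table.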