notesum.ai

Published on December 4

Preference-based Pure Exploration

stat.ML

cs.LG

Release Date: December 4, 2024

Authors: Apurv Shukla¹, Debabrota Basu²

Aff.: ¹Texas A&M University; ²Univ. Lille, Inria, CNRS

arXiv: http://arxiv.org/pdf/2412.02988v1

Notation	Description
$\mathcal{C},\prec_{\mathcal{C}\setminus\{0\}}$	Given convex cone and induced partial order
$K,L$	Number of arms and number of objectives
$\mathcal{P}^{\ast},\hat{\mathcal{P}}_{t}$	Ground truth Pareto set and estimated Pareto set
$\mathcal{Z}$	Space of all Pareto Frontiers on $[0,1]^{K}$
$M\in\mathbb{R}^{K\times L}$	Matrix of mean rewards of the $K$ arms
$r_{k_{t}},M_{k_{t}},\eta_{t}$	Observed reward, mean reward, and noise
$c(t,\delta)$	Confidence ball at time $t$ with confidence $\delta$
$d_{\mathrm{H}}(X,Y)$	Hausdorff distance between sets $X$ and $Y$
$d_{\mathrm{P}}\left(P,\hat{P}\right)$	Distance metric between Pareto Fronts $P$ and $\hat{P}$
$w$	Allocation vector
$\Pi$	Family of policies
$\hat{M}_{\ell,k}(t),M_{\ell,k}$	Estimated and true mean rewards
$\text{ch}\left({S}\right)$	Convex hull of set $S$
$\Lambda_{\pi^{*}}\left(M\right)$	Set of alternative instances associated with $M$
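
To make three of the objects in the notation table concrete (the cone-induced partial order, the Pareto set $\mathcal{P}^{\ast}$, and the Hausdorff distance $d_{\mathrm{H}}$), here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the cone is assumed to be polyhedral and given by the rows of a matrix (`cone_rows`, a hypothetical name), the mean matrix is stored with one row per arm, and the point sets are finite. The instance `M` is made up for the example, and the paper's metric $d_{\mathrm{P}}$ is not implemented here.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def dominated(x, y, cone_rows):
    """True if x precedes y in the cone order, i.e. y - x lies in C minus {0}.

    Assumption: C is the polyhedral cone {v : <a, v> >= 0 for every row a}.
    """
    diff = [yi - xi for xi, yi in zip(x, y)]
    if all(d == 0 for d in diff):
        return False  # y - x = 0 is excluded from the strict order
    return all(sum(ai * di for ai, di in zip(a, diff)) >= 0 for a in cone_rows)

def pareto_set(means, cone_rows):
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        k for k, mk in enumerate(means)
        if not any(dominated(mk, mj, cone_rows)
                   for j, mj in enumerate(means) if j != k)
    ]

def hausdorff(X, Y):
    """Hausdorff distance between two finite, non-empty point sets."""
    d_xy = max(min(dist(x, y) for y in Y) for x in X)
    d_yx = max(min(dist(x, y) for x in X) for y in Y)
    return max(d_xy, d_yx)

# Hypothetical instance: K = 4 arms, L = 2 objectives.
M = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4)]
# Positive-orthant cone, which recovers the usual componentwise order.
orthant = [(1.0, 0.0), (0.0, 1.0)]

P_star = pareto_set(M, orthant)
print(P_star)
```

With the positive-orthant cone, the last arm is dominated by the second one, so only the first three arms remain in the Pareto set. Passing different rows in `cone_rows` changes the partial order, and therefore which arms are Pareto optimal.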