Sample-Efficient Alignment for LLMs
cs.LG
cs.AI
cs.CL
Release date: November 3, 2024
Authors: Zichen Liu¹, Changyu Chen², Chao Du³, Wee Sun Lee⁴, Min Lin³
Affiliations: ¹Sea AI Lab, National University of Singapore; ²Sea AI Lab, Singapore Management University; ³Sea AI Lab; ⁴National University of Singapore

| Variant | Inference (Test) | Exploration | Learn | Remark |
|---|---|---|---|---|
| 1 | | passive | | Online DAP (Guo et al., 2024) |
| 2 | | active | | SEA without ERM sync (Section 4.2.3) |
| 3 | | active | | SEA |
| 4 | | passive | | - |
| 5 | | active | | - |
| 6 | | active | | SEA with Best-of-N sampling |
| 7 | | active | | Not learn policy (Dwaracherla et al., 2024) |