Self-Consistency Preference Optimization
cs.CL
cs.AI
cs.LG
Released Date: November 6, 2024
Authors: Archiki Prasad¹, Weizhe Yuan¹, Richard Yuanzhe Pang¹, Jing Xu¹, Maryam Fazel-Zarandi¹, Mohit Bansal², Sainbayar Sukhbaatar¹, Jason Weston¹, Jane Yu¹
Affiliations: ¹Meta FAIR; ²UNC Chapel Hill

| Method | Train Data (K): # Seed | Train Data (K): Gen. | Puzzle Acc. (%): Overall | Puzzle Acc. (%): Easy | Puzzle Acc. (%): Hard | Cell Acc. (%) |
|---|---|---|---|---|---|---|
| Llama-3 Instruct 70B | - | - | 17.2 | 52.1 | 3.6 | 42.9 |
| Gemma-2 27B IT∗ | - | - | 16.3 | 50.7 | 2.9 | 41.2 |
| Claude-3 Haiku∗ | - | - | 14.3 | 47.9 | 1.2 | 37.9 |
| (Llama-3 Instruct 8B) | - | - | 11.6 | 40.0 | 0.4 | 39.1 |
| w/ IRPO (RM) | 1.0 | - | 11.3 | 37.9 | 1.0 | 42.1 |
| w/ ScPO (Unsup.) | 0.4 | 1.0 | 17.0 | 54.3 | 2.5 | 47.6 |
| w/ ScPO (Unsup.) | 0.4 | 2.2 | 18.1 | 58.2 | 2.5 | 45.2 |