Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Categories: cs.LG, cs.AI
Released Date: October 30, 2024
Authors: Sheryl Hsu¹, Omar Khattab², Chelsea Finn¹, Archit Sharma¹
Affiliations: ¹Stanford University; ²Databricks

| Dataset | Model | Method | 1 Hop RE | 1 Hop AP | 2 Hops RE | 2 Hops AP | 3 Hops RE | 3 Hops AP | 4 Hops RE | 4 Hops AP | Generator EM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| HotpotQA | Llama 8b | Base | 42.3 | 38.8 | 54.7 | 41.9 | - | - | - | - | 41.0 |
| HotpotQA | Llama 8b | Few-shot | 49.9 | 45.6 | 64.8 | 53.9 | - | - | - | - | 47.1 |
| HotpotQA | Llama 8b | Few-shot all | 50.2 | 46.4 | 63.5 | 50.5 | - | - | - | - | 45.2 |
| HotpotQA | Llama 8b | LeReT-CD | 51.4 | 47.3 | 69.8 | 58.0 | - | - | - | - | 49.3 |
| HotpotQA | Llama 8b | LeReT | 56.7 | 52.5 | 77.1 | 66.3 | - | - | - | - | 52.5 |
| HotpotQA | Gemma 9b | Base | 52.2 | 48.4 | 70.9 | 57.7 | - | - | - | - | 51.0 |
| HotpotQA | Gemma 9b | Few-shot | 54.4 | 50.4 | 66.7 | 57.8 | - | - | - | - | 48.5 |
| HotpotQA | Gemma 9b | Few-shot all | 54.6 | 50.5 | 69.6 | 58.8 | - | - | - | - | 50.0 |
| HotpotQA | Gemma 9b | LeReT-CD | 53.5 | 49.6 | 71.9 | 59.2 | - | - | - | - | 51.4 |
| HotpotQA | Gemma 9b | LeReT | 56.1 | 52.2 | 79.9 | 67.0 | - | - | - | - | 54.3 |
| HoVer | Llama 8b | Base | 37.9 | 34.8 | 45.6 | 37.9 | 48.5 | 38.3 | 50.0 | 39.3 | 61.5 |
| HoVer | Llama 8b | Few-shot | 45.6 | 42.2 | 53.4 | 45.9 | 56.0 | 46.0 | 57.3 | 46.1 | 64.6 |
| HoVer | Llama 8b | Few-shot all | 38.8 | 35.8 | 51.9 | 44.4 | 57.5 | 46.3 | 59.7 | 45.9 | 64.7 |
| HoVer | Llama 8b | LeReT-CD | 42.9 | 39.8 | 56.6 | 48.4 | 63.2 | 52.2 | 66.9 | 54.3 | 67.5 |
| HoVer | Llama 8b | LeReT | 45.8 | 42.5 | 65.4 | 56.1 | 72.8 | 61.4 | 76.9 | 64.3 | 69.8 |
| HoVer | Gemma 9b | Base | 40.8 | 37.7 | 45.5 | 38.1 | 48.8 | 39.6 | 50.1 | 40.4 | 61.7 |
| HoVer | Gemma 9b | Few-shot | 46.3 | 42.9 | 55.4 | 46.8 | 57.9 | 48.4 | 59.3 | 48.5 | 64.3 |
| HoVer | Gemma 9b | Few-shot all | 46.3 | 42.8 | 55.8 | 48.7 | 64.1 | 52.6 | 68.2 | 54.1 | 67.5 |
| HoVer | Gemma 9b | LeReT-CD | 45.2 | 41.7 | 59.5 | 50.7 | 65.3 | 54.1 | 69.0 | 56.2 | 67.2 |
| HoVer | Gemma 9b | LeReT | 47.0 | 43.7 | 67.5 | 57.6 | 75.2 | 63.1 | 79.4 | 66.1 | 71.5 |