Navigating Noisy Feedback: Enhancing Reinforcement Learning with Error-Prone Language Models
Published: October 22
cs.CL
cs.AI
Release date: October 22, 2024
Authors: Muhan Lin¹, Shuyang Shi¹, Yue Guo¹, Behdad Chalaki², Vaishnav Tadiparthi², Ehsan Moradi Pari², Simon Stepputtis¹, Joseph Campbell¹, Katia Sycara¹
Affiliations: ¹Carnegie Mellon University; ²Honda Research Institute USA

| Environment | Method | Ground truth | Llama-3 8B | Mixtral | Llama-3 70B | GPT-4 |
|---|---|---|---|---|---|---|
| No Lock | Rank | 1.0 | 0.69 | 0.76 | 0.93 | 1.0 |
| No Lock | Score | 1.0 | 0.77 | 0.89 | 0.98 | 1.0 |
| Lock | Rank | 1.0 | 0.54 | 0.65 | 0.89 | 0.98 |
| Lock | Score | 1.0 | 0.55 | 0.74 | 0.97 | 0.98 |
| Multi Lock | Rank | 1.0 | 0.58 | 0.60 | 0.90 | 0.99 |
| Multi Lock | Score | 1.0 | 0.66 | 0.66 | 0.96 | 0.99 |
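To compare models at a glance, each model's column in the table can be averaged over the six environment/method rows, or over only the Rank or only the Score rows. The minimal Python sketch below does this with the values copied from the table. The metric behind these numbers is not defined in this excerpt, so the averages are only a descriptive summary. The names `MODELS`, `ROWS`, and `mean_by_model` are hypothetical helpers introduced for illustration and are not from the paper.

```python
from statistics import mean

# Order of the model columns in the table (the ground-truth column, always 1.0, is omitted).
MODELS = ["Llama-3 8B", "Mixtral", "Llama-3 70B", "GPT-4"]

# Hypothetical lookup: (environment, method) -> one value per model, copied from the table.
ROWS = {
    ("No Lock", "Rank"): [0.69, 0.76, 0.93, 1.0],
    ("No Lock", "Score"): [0.77, 0.89, 0.98, 1.0],
    ("Lock", "Rank"): [0.54, 0.65, 0.89, 0.98],
    ("Lock", "Score"): [0.55, 0.74, 0.97, 0.98],
    ("Multi Lock", "Rank"): [0.58, 0.60, 0.90, 0.99],
    ("Multi Lock", "Score"): [0.66, 0.66, 0.96, 0.99],
}


def mean_by_model(rows, method=None):
    """Average each model's column over all rows, or only over rows using `method`."""
    selected = [vals for (_env, m), vals in rows.items() if method is None or m == method]
    # zip(*selected) transposes the selected rows into one tuple of values per model.
    return {model: mean(col) for model, col in zip(MODELS, zip(*selected))}


if __name__ == "__main__":
    for method in (None, "Rank", "Score"):
        label = method or "All"
        averages = mean_by_model(ROWS, method)
        print(label, {m: round(v, 3) for m, v in averages.items()})
```

Averaged this way, the Score rows are never lower than the Rank rows for any model. The same holds row by row in the table.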