Large Language Models as User-Agents for Evaluating Task-Oriented-Dialogue Systems
cs.CL
cs.AI
Release Date: November 15, 2024
Authors: Taaha Kazi¹, Ruiliang Lyu¹, Sizhe Zhou¹, Dilek Hakkani-Tur¹, Gokhan Tur¹
Affiliation: ¹University of Illinois at Urbana-Champaign
| Metric | Value |
|---|---|
| Cohen’s Kappa | 0.796 |
| Accuracy | 90.0% |
| F1 Score | 0.912 |
| Matthews Correlation Coefficient | 0.797 |
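
The table lists four agreement metrics without showing how they are derived. As a minimal sketch, assuming the metrics compare two sets of binary labels (for example, an automatic judgment against a human annotation), all four can be computed from confusion-matrix counts. The function name `agreement_metrics` and the toy labels below are illustrative assumptions, not taken from the paper, and the toy data is not meant to reproduce the values in the table.

```python
from math import sqrt


def agreement_metrics(y_true, y_pred):
    """Compute accuracy, F1, Cohen's kappa, and MCC for binary (0/1) labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    # Confusion-matrix counts, treating 1 as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    # Accuracy: fraction of items on which the two label sets agree.
    accuracy = (tp + tn) / n

    # F1: harmonic mean of precision and recall on the positive class.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    # Matthews correlation coefficient: correlation between the two labelings.
    mcc_denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    return {"accuracy": accuracy, "f1": f1, "kappa": kappa, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical toy labels, for illustration only.
    human = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    model = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
    for name, value in agreement_metrics(human, model).items():
        print(f"{name}: {value:.3f}")
```

Kappa and MCC both correct for agreement that would occur by chance, which is why they typically come out lower than raw accuracy when reported on the same data.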