"Stupid robot, I want to speak to a human!" User Frustration Detection in Task-Oriented Dialog Systems
cs.CL
Release Date: November 26, 2024
Authors: Mireia Hernandez Caralt¹, Ivan Sekulić, Filip Carević, Nghia Khau¹, Diana Nicoleta Popa¹, Bruna Guedes¹, Victor Guimarães¹, Zeyu Yang¹, Andre Manso¹, Meghana Reddy¹, Paolo Rosso¹, Roland Mathis¹
Affiliation: ¹Telepathy Labs GmbH, Zürich, Switzerland

| Model | P (UF = 0, not frustrated) | R (UF = 0) | F1 (UF = 0) | P (UF = 1, frustrated) | R (UF = 1) | F1 (UF = 1) | Macro-F1 |
|---|---|---|---|---|---|---|---|
| Sentiment Analysis Hartmann et al. (2023) | |||||||
| RoBERTa-Sent-FC | 0.73 | 0.11 | 0.18 | 0.34 | 0.92 | 0.50 | 0.34 |
| RoBERTa-Sent-LU | 0.77 | 0.72 | 0.75 | 0.51 | 0.58 | 0.54 | 0.65 |
| Emotion Detection | |||||||
| DistilBERT-EmoWoZ-FC Huang (2024) | 0.70 | 0.77 | 0.73 | 0.42 | 0.34 | 0.38 | 0.56 |
| DistilBERT-EmoWoZ-LU | 0.74 | 0.68 | 0.71 | 0.45 | 0.52 | 0.48 | 0.60 |
| DistilRoBERTa-Emo-FC Hartmann (2022) | 0.67 | 1.00 | 0.80 | 0.00 | 0.00 | 0.00 | 0.40 |
| DistilRoBERTa-Emo-LU | 0.68 | 1.00 | 0.81 | 0.88 | 0.04 | 0.07 | 0.44 |
| Dialog Breakdown Detection | |||||||
| DBD+LogReg Bodigutla et al. (2020) | 0.78 | 0.93 | 0.85 | 0.78 | 0.46 | 0.58 | 0.71 |
| Rule-Based Approach | |||||||
| Keyword Matching | 0.67 | 1.00 | 0.80 | 1.00 | 0.01 | 0.01 | 0.41 |
| LLM-based ICL Approach | |||||||
| GPT-4o-zero-shot | 0.98 | 0.83 | 0.90 | 0.74 | 0.96 | 0.83 | 0.86 |
| GPT-4o-two-shot | 0.85 | 0.97 | 0.91 | 0.92 | 0.66 | 0.77 | 0.84 |
| Llama-3.1-405B-zero-shot | 0.99 | 0.74 | 0.85 | 0.67 | 0.99 | 0.79 | 0.83 |
| Llama-3.1-405B-two-shot | 0.97 | 0.84 | 0.90 | 0.75 | 0.96 | 0.84 | 0.87 |
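
The relationship between the table's columns can be checked with a short sketch. It assumes the third column of each class is the F1 score (the harmonic mean of precision and recall) and that the final column is the unweighted mean of the two per-class F1 scores; the source does not state these definitions, so treat them as an assumption. The function names are illustrative only, and because the table values are already rounded, recomputed scores can differ in the last digit for some rows.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_pr: list[tuple[float, float]]) -> float:
    """Unweighted mean of per-class F1 scores (assumed definition of Macro-F1)."""
    scores = [f1_score(p, r) for p, r in per_class_pr]
    return sum(scores) / len(scores)


# GPT-4o-two-shot row from the table: (P, R) for UF = 0, then for UF = 1.
gpt4o_two_shot = [(0.85, 0.97), (0.92, 0.66)]

f1_not_frustrated = f1_score(*gpt4o_two_shot[0])
f1_frustrated = f1_score(*gpt4o_two_shot[1])
macro = macro_f1(gpt4o_two_shot)

print(round(f1_not_frustrated, 2), round(f1_frustrated, 2), round(macro, 2))
# prints: 0.91 0.77 0.84
```

These values match the GPT-4o-two-shot row. The zero guard in `f1_score` covers rows such as DistilRoBERTa-Emo-FC, where precision and recall for the frustrated class are both 0.00.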