Improving Multi-Domain Task-Oriented Dialogue System with Offline Reinforcement Learning
Categories: cs.CL, cs.AI, cs.HC, cs.IR
Released Date: November 8, 2024
Authors: Dharmendra Prajapat¹, Durga Toshniwal
Affiliation: ¹Dept. of Computer Science and Engineering, Indian Institute of Technology Roorkee, India

Results on MultiWOZ2.1:

| Model | Pre-trained Model | Inform Rate | Success Rate | BLEU | Combined Score |
| --- | --- | --- | --- | --- | --- |
| LABES [24] | - | 74.50 | 63.90 | 16.00 | 85.20 |
| SimpleTOD [12] | GPT2 | 84.40 | 70.10 | 15.01 | 92.26 |
| DoTS [25] | BERT-base | 86.65 | 74.18 | 15.90 | 96.31 |
| MANTOD+ [26] | - | 84.00 | 74.80 | 18.80 | 98.20 |
| PPTOD [27] | T5-base | 87.09 | 79.08 | 19.17 | 102.26 |
| MTTOD [16] | T5-base | 91.00 | 82.10 | 21.00 | 107.50 |
| UBAR* [14] | distil-GPT2 | 93.70 | 82.00 | 17.64 | 105.49 |
| Ours (RL) | distil-GPT2 | 95.20 | 84.60 | 16.80 | 106.70 |
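
The Combined Score column is not defined on this page. A minimal sketch, assuming it follows the usual MultiWOZ convention, Combined = 0.5 × (Inform + Success) + BLEU, shows how the last two rows can be recomputed from the other three metrics. The function name `combined_score` and the `rows` dictionary are illustrative, not taken from the paper.

```python
def combined_score(inform: float, success: float, bleu: float) -> float:
    """Assumed MultiWOZ convention: mean of Inform and Success, plus BLEU."""
    return 0.5 * (inform + success) + bleu


# (Inform Rate, Success Rate, BLEU, reported Combined Score), copied from the table.
rows = {
    "UBAR*": (93.70, 82.00, 17.64, 105.49),
    "Ours (RL)": (95.20, 84.60, 16.80, 106.70),
}

for name, (inform, success, bleu, reported) in rows.items():
    computed = combined_score(inform, success, bleu)
    print(f"{name}: computed={computed:.2f}, reported={reported:.2f}")
```

Under this assumed formula, most rows of the table reproduce their reported Combined Score to within rounding. The MTTOD row computes to 107.55 against a reported 107.50.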