
Published on November 8

Improving Multi-Domain Task-Oriented Dialogue System with Offline Reinforcement Learning

Categories: cs.CL, cs.AI, cs.HC, cs.IR

Release Date: November 8, 2024

Authors: Dharmendra Prajapat¹, Durga Toshniwal

Affiliation: ¹Department of Computer Science and Engineering, Indian Institute of Technology Roorkee, India

arXiv: http://arxiv.org/abs/2411.05340v1


Results on MultiWOZ2.1:
Model	Pre-trained Model	Inform Rate	Success Rate	BLEU	Combined Score
LABES [24]	-	74.50	63.90	16.00	85.20
SimpleTOD [12]	GPT2	84.40	70.10	15.01	92.26
DoTS [25]	BERT-base	86.65	74.18	15.90	96.31
MANTOD+ [26]	-	84.00	74.80	18.80	98.20
PPTOD [27]	T5-base	87.09	79.08	19.17	102.26
MTTOD [16]	T5-base	91.00	82.10	21.00	107.50
UBAR* [14]	distil-GPT2	93.70	82.00	17.64	105.49
Ours (RL)	distil-GPT2	95.20	84.60	16.80	106.70
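
The table does not say how the Combined Score column is derived. A minimal sketch, assuming the usual MultiWOZ convention of Combined = 0.5 × (Inform + Success) + BLEU, recomputes the column from the other three metrics. The function name `combined_score` and the `ROWS` dictionary are illustrative only and do not come from the paper.

```python
# Assumption: the standard MultiWOZ combined score,
# 0.5 * (Inform + Success) + BLEU. The table itself does not state the formula.
def combined_score(inform: float, success: float, bleu: float) -> float:
    return 0.5 * (inform + success) + bleu


# (inform, success, bleu, reported combined score), copied from the table.
ROWS = {
    "LABES": (74.50, 63.90, 16.00, 85.20),
    "SimpleTOD": (84.40, 70.10, 15.01, 92.26),
    "DoTS": (86.65, 74.18, 15.90, 96.31),
    "MANTOD+": (84.00, 74.80, 18.80, 98.20),
    "PPTOD": (87.09, 79.08, 19.17, 102.26),
    "MTTOD": (91.00, 82.10, 21.00, 107.50),
    "UBAR*": (93.70, 82.00, 17.64, 105.49),
    "Ours (RL)": (95.20, 84.60, 16.80, 106.70),
}


def gaps(rows: dict) -> dict:
    """Absolute difference between recomputed and reported combined scores."""
    return {
        name: abs(combined_score(inform, success, bleu) - reported)
        for name, (inform, success, bleu, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, gap in gaps(ROWS).items():
        inform, success, bleu, reported = ROWS[name]
        computed = combined_score(inform, success, bleu)
        print(f"{name:10s} computed={computed:7.2f} reported={reported:7.2f} gap={gap:.3f}")
```

Under this assumption every row is reproduced to within about 0.05 points. The largest gap is the MTTOD row, which may simply reflect that the metrics shown were rounded before the combined score was formed.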