Published on November 8

Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent

Categories: cs.CV, cs.AI, cs.RO

Release Date: November 8, 2024

Authors: Linfeng He¹, Yiming Sun¹, Sihao Wu², Jiaxu Liu², Xiaowei Huang²

Aff.: ¹School of Computer Science, University of Nottingham, United Kingdom; ²Department of Computer Science, University of Liverpool, United Kingdom

arXiv: http://arxiv.org/abs/2411.05898v1

Experiment	Accuracy	ChatGPT	Match	BLEU-1	BLEU-2	BLEU-3	BLEU-4	ROUGE-L	CIDEr	Final score
DriveLM-Agent	-	-	-	-	-	-	-	-	-	-
Our Method (Llama-Adapter)	0.0	65.55	18.59	0.041	0.0002	0.000034	0.000014	0.076	0.082	0.3057
Our Method (Yolos)	$\mathbf{0.2966}$	58.243	$\mathbf{21.1484}$	$\mathbf{0.1078}$	$\mathbf{0.0333}$	$\mathbf{0.0105}$	$\mathbf{0.199}$	0.0093	$\mathbf{0.2632}$	$\mathbf{0.3548}$