LaVida Drive: Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement
cs.CV
cs.AI
Released Date: November 20, 2024
Authors: Siwen Jiao¹, Yangyi Fang²
Affiliations: ¹National University of Singapore, Agency for Science, Technology and Research, Singapore; ²Tsinghua University
| Dataset | Method | Ref. | BLEU-4↑ | METEOR↑ | ROUGE-L↑ | CIDEr↑ |
|---|---|---|---|---|---|---|
| DriveLM Dataset | EM-VLM4AD (Base) | CVPR’24 | 45.4 | 34.5 | 72.0 | 3.20 |
| DriveLM Dataset | EM-VLM4AD (Large) | CVPR’24 | 40.1 | 34.3 | 70.7 | 3.10 |
| DriveLM Dataset | DriveLM-Agent | ECCV’24 | 53.1 | 36.2 | 66.8 | 2.79 |
| DriveLM Dataset | LaVida Drive (Ours) | - | 51.3 | 38.0 | 73.9 | 3.32 |