Published at November 20
MEGL: Multimodal Explanation-Guided Learning
cs.CV
cs.AI
Released Date: November 20, 2024
Authors: Yifei Zhang¹, Tianxu Jiang, Bo Pan, Jingyu Wang¹, Guangji Bai, Liang Zhao
Aff.: ¹Emory University
| Backbone | Method | Object-ME Accuracy | Object-ME Precision | Object-ME Recall | Object-ME F1 Score | Object-ME mIoU | Action-ME Accuracy | Action-ME Precision | Action-ME Recall | Action-ME F1 Score | Action-ME mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA [34] | - | 0.8220 | 0.5702 | 0.5550 | 0.5568 | - | 0.8595 | 0.6436 | 0.6124 | 0.6233 | - |
| LLaVA | Fine-Tune-CoT [23] | 0.8233 | 0.5448 | 0.6003 | 0.5643 | - | 0.8789 | 0.7052 | 0.6182 | 0.6566 | - |
| ResNet18 | - | 0.7265 | 0.5294 | 0.5432 | 0.5192 | 0.3391 | 0.7973 | 0.7807 | 0.8307 | 0.7952 | 0.3961 |
| ResNet18 | CDEP [45] | 0.7119 | 0.5320 | 0.5925 | 0.5400 | 0.3703 | 0.7764 | 0.7514 | 0.8147 | 0.7651 | 0.4168 |
| ResNet18 | HAICS [48] | 0.7203 | 0.5208 | 0.5507 | 0.5181 | 0.3692 | 0.7649 | 0.7379 | 0.8046 | 0.7493 | 0.4142 |
| ResNet18 | RES-G [14] | 0.7171 | 0.5402 | 0.5944 | 0.5439 | 0.3633 | 0.7799 | 0.7579 | 0.8094 | 0.7694 | 0.4213 |
| ResNet18 | RES-L [14] | 0.7307 | 0.5704 | 0.6294 | 0.5677 | 0.3688 | 0.7892 | 0.7696 | 0.8169 | 0.7813 | 0.4045 |
| ResNet18 | MEGL | 0.7413 | 0.5689 | 0.6595 | 0.5800 | 0.3893 | 0.8025 | 0.7855 | 0.8246 | 0.7937 | 0.4195 |
| ViT-B/16 | - | 0.7858 | 0.6460 | 0.6456 | 0.6352 | 0.3323 | 0.8854 | 0.8771 | 0.8981 | 0.8803 | 0.3351 |
| ViT-B/16 | CDEP [45] | 0.8150 | 0.7014 | 0.7013 | 0.6911 | 0.3556 | 0.8836 | 0.8771 | 0.8880 | 0.8770 | 0.3582 |
| ViT-B/16 | HAICS [48] | 0.8178 | 0.6722 | 0.6813 | 0.6642 | 0.3443 | 0.8854 | 0.8784 | 0.8924 | 0.8807 | 0.3557 |
| ViT-B/16 | RES-G [14] | 0.8164 | 0.6864 | 0.7094 | 0.6833 | 0.3441 | 0.8796 | 0.8726 | 0.8851 | 0.8738 | 0.3604 |
| ViT-B/16 | RES-L [14] | 0.8206 | 0.6870 | 0.7296 | 0.6850 | 0.3401 | 0.8761 | 0.8712 | 0.8806 | 0.8677 | 0.3639 |
| ViT-B/16 | MEGL | 0.8317 | 0.7037 | 0.7485 | 0.7036 | 0.3521 | 0.8981 | 0.8897 | 0.9024 | 0.8921 | 0.3681 |