Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies
cs.CL
Release Date: December 6, 2024
Authors: Recep Firat Cekinel, Pinar Karagoz, Cagri Coltekin

| Models | Inputs | MOCHEG Support | MOCHEG Refute | MOCHEG NEI | MOCHEG F1-macro | FACTIFY2 Support | FACTIFY2 Refute | FACTIFY2 NEI | FACTIFY2 F1-macro |
|---|---|---|---|---|---|---|---|---|---|
| Qwen-7B | text | 0.533 | 0.262 | 0.169 | 0.321 | 0.524 | 0.458 | 0.281 | 0.421 |
| Mistral-7B | text | 0.505 | 0.281 | 0.216 | 0.334 | 0.575 | 0.561 | 0.093 | 0.409 |
| Gemma-2b | text | 0.610 | 0.462 | 0.315 | 0.462 | 0.562 | 0.119 | 0.083 | 0.255 |
| Qwen-VL | text + image | 0.168 | 0.472 | 0.186 | 0.275 | 0.463 | 0.460 | 0.369 | 0.431 |
| Idefics2-8b | text + image | 0.619 | 0.547 | 0.385 | 0.517 | 0.586 | 0.644 | 0.303 | 0.511 |
| PaliGemma-3b | text + image | 0.222 | 0.347 | 0.449 | 0.339 | 0.149 | 0.139 | 0.186 | 0.158 |
| LVLM4FV | text | 0.575 | 0.542 | 0.439 | 0.519 | 0.593 | 0.581 | 0.560 | 0.578 |
| LVLM4FV | text + image | 0.578 | 0.569 | 0.457 | 0.535 | 0.678 | 0.605 | 0.508 | 0.597 |
| MOCHEG | text + image | 0.490 | 0.604 | 0.282 | 0.459 | 0.547 | 0.621 | 0.275 | 0.481 |
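The F1-macro columns can be sanity-checked against the per-class columns. The sketch below assumes that the Support, Refute, and NEI columns are per-class F1 scores and that F1-macro is their unweighted mean; the source does not state this outright, but the reported figures are consistent with it up to rounding. The names `macro_f1`, `rows`, and `consistent` are illustrative, and only a few rows from the table are copied in.

```python
def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1 scores (assumed definition of F1-macro)."""
    return sum(per_class_f1) / len(per_class_f1)


# (Support, Refute, NEI, reported F1-macro), copied from the table above.
rows = {
    ("Qwen-7B", "MOCHEG"): (0.533, 0.262, 0.169, 0.321),
    ("Idefics2-8b", "MOCHEG"): (0.619, 0.547, 0.385, 0.517),
    ("PaliGemma-3b", "FACTIFY2"): (0.149, 0.139, 0.186, 0.158),
    ("LVLM4FV (text + image)", "FACTIFY2"): (0.678, 0.605, 0.508, 0.597),
}


def consistent(row, tol=1e-3):
    """True if the reported macro score matches the mean of the class scores.

    The tolerance allows for the three-decimal rounding of the table entries.
    """
    *per_class, reported = row
    return abs(macro_f1(per_class) - reported) <= tol


if __name__ == "__main__":
    for (model, dataset), row in rows.items():
        status = "ok" if consistent(row) else "mismatch"
        print(f"{model} on {dataset}: recomputed {macro_f1(row[:3]):.3f}, "
              f"reported {row[3]:.3f} ({status})")
```

Because the mean is unweighted, a weak class pulls the macro score down regardless of how many examples it has, which is why models with a low NEI score trail on F1-macro even when their Support score is competitive.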