ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection
cs.CV
cs.LG
Released Date: November 29, 2024
Authors: Zhihao Sun¹, Haoran Jiang², Haoran Chen¹, Yixin Cao¹, Xipeng Qiu², Zuxuan Wu¹, Yu-Gang Jiang¹
Affiliations: ¹ Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; ² School of Computer Science, Fudan University
![[Uncaptioned image]](https://arxiv.org/html/2411.19466v1/x1.png)
| Method | Optimal F1: Columbia | Optimal F1: Coverage | Optimal F1: CASIA1 | Optimal F1: NIST16 | Optimal F1: COCOGlide | Fixed F1: Columbia | Fixed F1: Coverage | Fixed F1: CASIA1 | Fixed F1: NIST16 | Fixed F1: COCOGlide |
|---|---|---|---|---|---|---|---|---|---|---|
| Mantra-Net [55] | 0.650 | 0.486 | 0.320 | 0.225 | 0.673 | 0.508 | 0.317 | 0.180 | 0.172 | 0.516 |
| SPAN [20] | 0.873 | 0.428 | 0.169 | 0.363 | 0.350 | 0.759 | 0.235 | 0.112 | 0.228 | 0.298 |
| MVSS-Net [8] | 0.781 | 0.659 | 0.650 | 0.372 | 0.642 | 0.729 | 0.514 | 0.528 | 0.320 | 0.486 |
| PSCC-Net [31] | 0.760 | 0.615 | 0.670 | 0.210 | 0.685 | 0.604 | 0.473 | 0.520 | 0.113 | 0.515 |
| CAT-Net2 [25] | 0.923 | 0.582 | 0.852 | 0.417 | 0.603 | 0.859 | 0.381 | 0.752 | 0.308 | 0.434 |
| TruFor [15] | 0.914 | 0.735 | 0.822 | 0.470 | 0.720 | 0.859 | 0.600 | 0.737 | 0.399 | 0.523 |
| UnionFormer [28] | 0.925 | 0.720 | 0.863 | 0.489 | 0.742 | 0.861 | 0.592 | 0.760 | 0.413 | 0.536 |
| ForgerySleuth | 0.931 | 0.792 | 0.870 | 0.610 | 0.751 | 0.925 | 0.684 | 0.804 | 0.518 | 0.562 |
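
The table reports F1 under two settings, an optimal threshold and a fixed threshold. As a minimal sketch of that distinction, the snippet below assumes pixel-level F1 computed per image from a predicted probability map, a fixed threshold of 0.5, and a grid search over 101 thresholds for the optimal case. None of these details are specified in this excerpt, and the function names are hypothetical.

```python
import numpy as np


def pixel_f1(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """F1 between two binary masks (True = manipulated pixel)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return float(2 * tp / denom) if denom > 0 else 1.0


def fixed_threshold_f1(prob_map: np.ndarray, gt_mask: np.ndarray,
                       threshold: float = 0.5) -> float:
    """F1 after binarizing the probability map at one preset threshold."""
    return pixel_f1(prob_map >= threshold, gt_mask)


def optimal_threshold_f1(prob_map: np.ndarray, gt_mask: np.ndarray,
                         num_thresholds: int = 101) -> float:
    """Best F1 over an evenly spaced grid of thresholds in [0, 1]."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    return max(pixel_f1(prob_map >= t, gt_mask) for t in thresholds)


# Toy 2x2 example: the top row is manipulated, but one of its pixels
# receives a probability below 0.5.
prob_map = np.array([[0.9, 0.4],
                     [0.2, 0.1]])
gt_mask = np.array([[True, True],
                    [False, False]])

print(fixed_threshold_f1(prob_map, gt_mask))
print(optimal_threshold_f1(prob_map, gt_mask))
```

Because the optimal setting picks the best threshold per image using the ground truth, it is an upper bound on the fixed-threshold score for the same prediction, which is consistent with every method in the table scoring at least as high in the optimal columns as in the fixed ones.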