notesum.ai

Published on November 29

ForgerySleuth: Empowering Multimodal Large Language Models for Image Manipulation Detection

cs.CV

cs.LG

Released Date: November 29, 2024

Authors: Zhihao Sun¹, Haoran Jiang², Haoran Chen¹, Yixin Cao¹, Xipeng Qiu², Zuxuan Wu¹, Yu-Gang Jiang¹

Affiliations: ¹Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; ²School of Computer Science, Fudan University

arXiv: http://arxiv.org/pdf/2411.19466v1

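F1 by dataset, under the optimal threshold (opt.) and under a fixed threshold (fixed):
Method	Columbia (opt.)	Coverage (opt.)	CASIA1 (opt.)	NIST16 (opt.)	COCOGlide (opt.)	Columbia (fixed)	Coverage (fixed)	CASIA1 (fixed)	NIST16 (fixed)	COCOGlide (fixed)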
Mantra-Net [55]	0.650	0.486	0.320	0.225	0.673	0.508	0.317	0.180	0.172	0.516
SPAN [20]	0.873	0.428	0.169	0.363	0.350	0.759	0.235	0.112	0.228	0.298
MVSS-Net [8]	0.781	0.659	0.650	0.372	0.642	0.729	0.514	0.528	0.320	0.486
PSCC-Net [31]	0.760	0.615	0.670	0.210	0.685	0.604	0.473	0.520	0.113	0.515
CAT-Net2 [25]	0.923	0.582	0.852	0.417	0.603	0.859	0.381	0.752	0.308	0.434
TruFor [15]	0.914	0.735	0.822	0.470	0.720	0.859	0.600	0.737	0.399	0.523
UnionFormer [28]	0.925	0.720	0.863	0.489	0.742	0.861	0.592	0.760	0.413	0.536
ForgerySleuth	0.931	0.792	0.870	0.610	0.751	0.925	0.684	0.804	0.518	0.562
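
The table reports F1 in two settings, and a small sketch may help show why the optimal-threshold score is never lower than the fixed-threshold one. The code below is an illustration only, not the paper's evaluation code. It assumes each method outputs a per-pixel manipulation probability map, that the fixed threshold is 0.5, and that the optimal threshold is chosen per image by a grid sweep. The names `mask_f1`, `fixed_threshold_f1`, and `optimal_threshold_f1` are hypothetical.

```python
import numpy as np


def mask_f1(pred_mask, gt_mask):
    """F1 between two binary masks (True = manipulated pixel)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    denom = 2 * tp + fp + fn
    # Convention chosen here: an empty prediction on an empty mask scores 0.
    return 0.0 if denom == 0 else float(2 * tp / denom)


def fixed_threshold_f1(prob_map, gt_mask, threshold=0.5):
    """F1 after binarizing at one preset threshold (0.5 is an assumption)."""
    return mask_f1(np.asarray(prob_map) >= threshold, gt_mask)


def optimal_threshold_f1(prob_map, gt_mask, num_thresholds=101):
    """Best F1 over a grid of thresholds in [0, 1] (assumed per-image sweep)."""
    prob = np.asarray(prob_map)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    return max(mask_f1(prob >= t, gt_mask) for t in thresholds)


# Toy 2x2 example: the top row is manipulated.
prob_map = np.array([[0.9, 0.4], [0.3, 0.1]])
gt_mask = np.array([[True, True], [False, False]])

print("fixed:", fixed_threshold_f1(prob_map, gt_mask))
print("optimal:", optimal_threshold_f1(prob_map, gt_mask))
```

In this toy case the fixed threshold misses the pixel scored 0.4, giving an F1 of 2/3, while a threshold between 0.3 and 0.4 recovers both manipulated pixels and gives 1.0. Because the sweep includes 0.5, the optimal-threshold score is always at least the fixed-threshold score, which matches the pattern in every row of the table.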