Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Categories: cs.CV, cs.AI, cs.CL, cs.MM
Released Date: November 6, 2024
Authors: Dingjie Song¹, Sicheng Lai¹, Shunian Chen¹, Lichao Sun², Benyou Wang¹
Aff.: ¹The Chinese University of Hong Kong, Shenzhen; ²Lehigh University

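Columns are grouped by dataset: COCO Validation Set (COCO), NoCaps Validation Set (NoCaps), and Vintage Training Set (Vintage). Δ is PCR - CR, which matches the third value in each group.

| Model | COCO CR | COCO PCR | COCO Δ | COCO IL | NoCaps CR | NoCaps PCR | NoCaps Δ | NoCaps IL | Vintage CR | Vintage PCR | Vintage Δ | Vintage IL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Open-source MLLMs | | | | | | | | | | | | |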
| LLaVA-1.5-7B | 34.6 | 34.0 | -0.6 | 19.0 | 30.9 | 28.5 | -2.4 | – | 10.8 | 10.1 | -0.7 | 9.0 |
| VILA1.5-3B | 19.1 | 20.5 | 1.4 | 13.0 | 19.1 | 20.5 | 1.4 | 13.0 | 1.5 | 2.2 | 0.7 | 1.5 |
| Qwen-VL-Chat | 32.2 | 30.3 | -1.9 | – | 28.7 | 27.3 | -1.4 | – | 15.1 | 15.4 | 0.3 | 12.4 |
| fuyu-8b | 9.6 | 10.6 | 1.0 | 7.8 | 10.0 | 9.8 | -0.2 | 8.3 | 2.4 | 3.3 | 0.9 | 2.3 |
| idefics2-8b | 43.5 | 42.3 | -1.2 | – | 42.6 | 37.5 | -5.1 | – | 18.5 | 17.0 | -1.5 | – |
| Phi-3-vision-128k-instruct | 38.8 | 39.3 | 0.5 | 19.4 | 36.9 | 33.3 | -3.6 | – | 17.4 | 11.7 | -5.7 | – |
| Yi-VL-6B | 43.9 | 43.3 | -0.6 | 19.4 | 37.2 | 36.1 | -1.1 | – | 3.3 | 4.2 | 0.9 | 2.8 |
| InternVL2-8B | 53.3 | 51.9 | -1.4 | – | 48.0 | 46.2 | -1.8 | – | 28.0 | 28.7 | 0.7 | 18.8 |
| Proprietary MLLMs | | | | | | | | | | | | |
| GPT-4o | 58.1 | 54.4 | -3.7 | – | 54.2 | 55.1 | 0.9 | 19.4 | 36.3 | 38.4 | 2.1 | 20.1 |
| Gemini-1.5-Pro | 57.5 | 55.3 | -2.2 | – | 51.2 | 52.0 | 0.8 | 18.7 | – | – | – | – |
| Claude-3.5-Sonnet | 53.7 | 51.0 | -2.7 | – | 50.8 | 51.5 | 0.7 | 20.0 | 35.2 | 33.0 | -2.2 | 21.3 |
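
Each dataset group in the table carries four values per model, and the third appears to be the difference PCR - CR. The following is a minimal sketch that checks this reading against a few rows copied from the table. The names `ROWS`, `delta`, and `mismatches` are hypothetical helpers introduced here for illustration; they are not taken from the paper.

```python
# Rows copied from the table: (model, dataset, CR, PCR, reported third value).
# Assumption: the third value in each dataset group is PCR - CR.
ROWS = [
    ("LLaVA-1.5-7B", "COCO", 34.6, 34.0, -0.6),
    ("Phi-3-vision-128k-instruct", "Vintage", 17.4, 11.7, -5.7),
    ("GPT-4o", "NoCaps", 54.2, 55.1, 0.9),
    ("Claude-3.5-Sonnet", "Vintage", 35.2, 33.0, -2.2),
]


def delta(cr: float, pcr: float) -> float:
    """Return PCR - CR rounded to one decimal, the table's precision."""
    return round(pcr - cr, 1)


def mismatches(rows):
    """Return the rows whose reported value disagrees with PCR - CR."""
    return [row for row in rows if delta(row[2], row[3]) != row[4]]


if __name__ == "__main__":
    for model, dataset, cr, pcr, reported in ROWS:
        print(f"{model:28s} {dataset:8s} "
              f"computed={delta(cr, pcr):+.1f} reported={reported:+.1f}")
    print("mismatches:", mismatches(ROWS))
```

Rounding to one decimal before comparing avoids spurious mismatches from floating-point subtraction, since the table reports values at that precision.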