Published at November 21
MMGenBench: Evaluating the Limits of LMMs from the Text-to-Image Generation Perspective
Categories: cs.CV, cs.AI, cs.CL
Released Date: November 21, 2024
Authors: Hailang Huang1, Yong Wang2, Zixuan Huang1, Huaqiu Li3, Tongwen Huang2, Xiangxiang Chu2, Richong Zhang4
Affiliations: 1 Beihang University, Alibaba Group; 2 Alibaba Group; 3 Alibaba Group, Tsinghua University; 4 Beihang University
![[Uncaptioned image]](https://arxiv.org/html/2411.14062v1/x1.png)
| Model | Test SIM | Test FID | Domain SIM | Domain FID |
|---|---|---|---|---|
| GPT-4o [33] | 0.566 | 1.306 | - | - |
| Qwen-VL-Max [7] | 0.552 | 1.363 | - | - |
| Qwen-VL-Plus [7] | 0.475 | 1.586 | - | - |
| Qwen2-VL-72B [38] | 0.553 | 1.357 | 0.545 | 0.710 |
| Qwen2-VL-7B [38] | 0.532 | 1.437 | 0.524 | 0.775 |
| Qwen2-VL-2B [38] | 0.487 | 1.549 | 0.501 | 0.806 |
| InternVL2-76B [11] | 0.599 | 1.264 | 0.599 | 0.632 |
| InternVL2-40B [11] | 0.566 | 1.350 | 0.566 | 0.696 |
| InternVL2-26B [11] | 0.576 | 1.320 | 0.577 | 0.671 |
| InternVL2-8B [11] | 0.547 | 1.403 | 0.548 | 0.701 |
| InternVL2-4B [11] | 0.556 | 1.345 | 0.556 | 0.689 |
| InternVL2-2B [11] | 0.476 | 1.563 | 0.483 | 0.848 |
| Ovis1.6-Gemma2-9B [31] | 0.582 | 1.316 | 0.579 | 0.667 |
| Ovis1.5-Gemma2-9B [31] | 0.521 | 1.487 | 0.524 | 0.808 |
| Ovis1.5-Llama3-8B [31] | 0.526 | 1.466 | 0.527 | 0.795 |
| LLaVA-OV-72B [25] | 0.494 | 1.561 | 0.491 | 0.872 |
| LLaVA-OV-SI-72B [25] | 0.512 | 1.512 | 0.514 | 0.813 |
| LLaVA-OV-7B [25] | 0.488 | 1.590 | 0.490 | 0.861 |
| LLaVA-OV-SI-7B [25] | 0.492 | 1.554 | 0.497 | 0.834 |
| MiniCPM-V2.6-8B [44] | 0.548 | 1.386 | 0.545 | 0.710 |
| RBDash-72B [3] | 0.525 | 1.413 | 0.527 | 0.740 |
| xGen-MM-4.4B [43] | 0.414 | 1.671 | 0.412 | 0.940 |
| MiniCPM-V2.5-8B [44] | 0.526 | 1.432 | 0.530 | 0.767 |