notesum.ai
ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?
Published on December 3
Categories: cs.AI, cs.CL, cs.CV
Release date: December 3, 2024
Authors: Leixin Zhang¹, Steffen Eger, Yinjie Cheng, Weihe Zhai, Jonas Belouadi, Christoph Leiter, Simone Paolo Ponzetto, Fahimeh Moafian, Zhixue Zhao
Affiliation: ¹University of Twente

| Model | Correctness | Relevance |  |  |
|---|---|---|---|---|
| Automatikz | 2.05 | 2.31 | 3.35 | 0.04 |
| Llama_tikz | 1.78 | 1.94 | 2.61 | 0.29 |
| GPT-4o_tikz | 3.50 | 3.67 | 3.75 | 0.09 |
| Llama_python | 2.10 | 2.54 | 3.18 | 0.28 |
| GPT-4o_python | 3.51 | 3.40 | 3.93 | 0.07 |
| Stable Diffusion | 2.19 | 2.09 | 1.96 | - |
| DALL·E | 2.16 | 2.00 | 1.55 | - |