notesum.ai

Published on December 3

ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?

Categories: cs.AI, cs.CL, cs.CV

Release Date: December 3, 2024

Authors: Leixin Zhang¹, Steffen Eger, Yinjie Cheng, Weihe Zhai, Jonas Belouadi, Christoph Leiter, Simone Paolo Ponzetto, Fahimeh Moafian, Zhixue Zhao

Aff.: ¹University of Twente

Arxiv: http://arxiv.org/pdf/2412.02368v1

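| Model            | Correctness | Relevance | Scientific style | Compile Error Rate |
|------------------|-------------|-----------|------------------|--------------------|
| Automatikz       | 2.05        | 2.31      | 3.35             | 0.04               |
| Llama_tikz       | 1.78        | 1.94      | 2.61             | 0.29               |
| GPT-4o_tikz      | 3.50        | 3.67      | 3.75             | 0.09               |
| Llama_python     | 2.10        | 2.54      | 3.18             | 0.28               |
| GPT-4o_python    | 3.51        | 3.40      | 3.93             | 0.07               |
| Stable Diffusion | 2.19        | 2.09      | 1.96             | -                  |
| DALL·E           | 2.16        | 2.00      | 1.55             | -                  |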