Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping
cs.LG
cs.AI
Released Date: October 21, 2024
Authors: Ryan Li¹, Yanzhe Zhang², Diyi Yang¹
Aff.: ¹Stanford University; ²Georgia Tech

| Model | Prompting | Layout Sim. | Text IoU | Image IoU | Other IoU | Human Sat. |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | Direct | 19.20 | 17.12 | 16.19 | 3.03 | 30.0 |
| GPT-4o | Text-Augmented | 21.33 | 22.08 | 13.23 | 2.75 | - |
| GPT-4o-Mini | Direct | 11.49 | 13.51 | 2.36 | 1.27 | 12.0 |
| GPT-4o-Mini | Text-Augmented | 16.25 | 20.84 | 0.72 | 1.12 | - |
| Claude-3-Opus | Direct | 12.86 | 10.43 | 12.67 | 0.65 | 10.0 |
| Claude-3-Opus | Text-Augmented | 17.11 | 18.09 | 8.32 | 2.97 | - |
| Claude-3.5-Sonnet | Direct | 21.64 | 22.51* | 10.47 | 2.94 | 36.0 |
| Claude-3.5-Sonnet | Text-Augmented | 22.26 | 25.33* | 9.21 | 3.58 | - |
| Claude-3-Sonnet | Direct | 11.97 | 10.61 | 10.09 | 0.73 | 0.0 |
| Claude-3-Sonnet | Text-Augmented | 14.22 | 15.85 | 6.62 | 1.72 | - |
| Claude-3-Haiku | Direct | 10.25 | 12.61 | 3.15 | 1.17 | 6.0 |
| Claude-3-Haiku | Text-Augmented | 17.52 | 20.60 | 2.72 | 2.22 | - |
| Gemini-1.5-Pro | Direct | 18.25 | 16.44 | 14.69 | 1.12 | 22.0 |
| Gemini-1.5-Pro | Text-Augmented | 18.72 | 19.46 | 11.79 | 0.96 | - |
| Gemini-1.5-Flash | Direct | 14.15 | 13.28 | 8.77 | 0.03 | 8.0 |
| Gemini-1.5-Flash | Text-Augmented | 15.22 | 13.25 | 7.81 | 0.16 | - |
| InternVL2-8b | Direct | 10.08 | 11.28 | 6.13 | 0.00 | 2.0 |
| InternVL2-8b | Text-Augmented | 4.01 | 4.89 | 1.41 | 0.60 | - |
| Llava-1.6-8b | Direct | 6.68 | 6.91 | 3.43 | 0.36 | 0.0 |
| Llava-1.6-8b | Text-Augmented | 8.00 | 9.26 | 1.95 | 0.57 | - |