Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generative Models
cs.CV
cs.AI
Released Date: December 3, 2024
Authors: Jungwon Park, Jungmin Ko, Dongnam Byun, Jangwon Suh, Wonjong Rhee
Affiliation: Seoul National University (all authors)

| Method | Image Style: CLIP | Image Style: HP-score | Weather Conditions: CLIP | Weather Conditions: HP-score |
|---|---|---|---|---|
| SDEdit (0.5) | 0.2938 | - | 0.2817 | - |
| SDEdit (0.7) | 0.3217 | 15.8 | 0.2908 | 39.5 |
| P2P | 0.3120 | 30.6 | 0.2788 | 33.9 |
| PnP | 0.3286 | 41.9 | 0.3046 | 35.0 |
| MasaCtrl | 0.2722 | - | 0.2524 | - |
| FPE | 0.3236 | 25.7 | 0.2962 | 35.5 |
| P2P-HRV (Ours) | 0.3424 | 100 | 0.3348 | 100 |