notesum.ai
QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain
cs.CV
cs.LG
Released Date: November 29, 2024
Authors: Wenfang Sun¹, Yingjun Du², Gaowen Liu³, Cees G. M. Snoek²
Affiliations: ¹University of Science and Technology of China; ²University of Amsterdam; ³Cisco Research

| Method | Photo MAE (YOLO) | Photo RMSE (YOLO) | Photo CLIP-S | Painting MAE (YOLO) | Painting RMSE (YOLO) | Painting CLIP-S | Cartoon MAE (YOLO) | Cartoon RMSE (YOLO) | Cartoon CLIP-S | Sketch MAE (YOLO) | Sketch RMSE (YOLO) | Sketch CLIP-S | Average MAE (YOLO) | Average RMSE (YOLO) | Average CLIP-S |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SDXL [41] | 14.77 | 20.26 | 73.2 | 12.27 | 17.17 | 73.5 | 14.10 | 17.97 | 72.5 | 11.92 | 16.51 | 71.4 | 13.26 | 17.98 | 72.7 |
| w/ | 14.41 | 19.26 | 73.3 | 11.31 | 15.28 | 73.7 | 12.73 | 15.07 | 72.9 | 9.91 | 13.25 | 72.7 | 12.28 | 16.15 | 73.2 |
| w/ | 14.12 | 19.60 | 73.9 | 10.20 | 14.44 | 74.1 | 10.66 | 13.17 | 73.7 | 8.75 | 11.89 | 73.3 | 10.94 | 14.76 | 73.8 |
| Ours (w/ & ) | 10.09 | 13.98 | 74.3 | 8.04 | 11.04 | 75.9 | 9.48 | 11.92 | 74.8 | 9.15 | 11.69 | 74.1 | 9.19 | 12.16 | 74.8 |
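
The MAE and RMSE columns above are counting errors. As a rough illustration of how such numbers are typically computed, the sketch below compares the object count requested in each prompt with the count a detector reports for the generated image. The function names and the example counts are hypothetical and are not taken from the paper, and the detector step (YOLO in the table) is assumed to have already produced the `detected` list.

```python
import math


def mae(predicted, target):
    """Mean absolute error between predicted and target counts."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def rmse(predicted, target):
    """Root mean squared error between predicted and target counts."""
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


# Hypothetical data: object counts requested in four prompts, and the
# counts a detector found in the corresponding generated images.
requested = [3, 5, 10, 20]
detected = [3, 7, 8, 14]

print(f"MAE:  {mae(detected, requested):.2f}")
print(f"RMSE: {rmse(detected, requested):.2f}")
```

Because RMSE squares each error before averaging, a single image that misses the requested count by a wide margin raises RMSE more than MAE, which is one reason the two metrics are reported side by side.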