AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
cs.CV
Released Date: December 4, 2024
Authors: Shouwei Ruan¹, Hanqin Liu, Yao Huang, Xiaoqi Wang, Caixin Kang, Hang Su, Yinpeng Dong, Xingxing Wei
Aff.: ¹Institute of Artificial Intelligence, Beihang University

| Target Models | #Params | ImageNet [17] | | | | Synthesis | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | Clean | Random | | | Clean | Random | | |
| OpenCLIP ViT-B/16 [24] | 149M | 98.0 | 62.6 | 54.0 | 18.0 | 98.0 | 89.9 | 86.0 | 62.3 |
| OpenCLIP ViT-L/14 [24] | 428M | 94.4 | 61.7 | 50.9 | 15.3 | 98.4 | 89.2 | 83.7 | 62.3 |
| OpenCLIP ViT-G/14 [24] | 2.5B | 96.4 | 63.5 | 53.5 | 18.7 | 98.4 | 89.4 | 86.0 | 62.7 |
| BLIP ViT-B/16 [28] | 583M | 83.0 | 56.0 | 51.3 | 17.3 | 92.7 | 80.4 | 78.6 | 54.7 |