
Published on December 9

Visual Lexicon: Rich Image Features in Language Space

cs.CV

cs.AI

cs.LG

Released Date: December 9, 2024

Authors: XuDong Wang¹, Xingyi Zhou¹, Alireza Fathi¹, Trevor Darrell², Cordelia Schmid¹

Aff.: ¹Google DeepMind; ²UC Berkeley

arXiv: http://arxiv.org/pdf/2412.06774v1


                 DeDiffusion                     DALL·E 3
                 layout   semantic   style       layout   semantic   style
vs. ViLex (↑)    98%      95%        98%         91%      76%        90%