notesum.ai

Published on November 7

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

cs.CV
cs.AI

Release Date: November 7, 2024

Authors: Jonathan Fhima¹, Elad Ben Avraham², Oren Nuriel², Yair Kittenplon², Roy Ganz², Aviad Aberdam², Ron Litman²

Affiliations: ¹Technion, Israel; ²AWS AI Labs

arXiv: http://arxiv.org/abs/2411.04642v1