Published on December 6
Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model
cs.CV
Released Date: December 6, 2024
Authors: Keunwoo Peter Yu¹, Achal Dave², Rares Ambrus², Jean Mercat²
Affiliations: ¹University of Michigan; ²Toyota Research Institute

| | 128 frames | 64 frames | 32 frames | 16 frames |
|---|---|---|---|---|
| 128 | 34.25 | N/A | N/A | N/A |
| 64 | 39.22 | 30.59 | N/A | N/A |
| 32 | 35.58 | 34.17 | 21.23 | N/A |
| 16 | 33.39 | 33.85 | 21.47 | 33.83 |
| 8 | 45.28 | 33.09 | 32.10 | 43.43 |
| 4 | 37.05 | 36.97 | 24.61 | 38.38 |
| | .37 | .81 | .53 | .47 |