VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Categories: cs.CV, cs.AI, cs.CL, cs.MM
Released Date: November 20, 2024
Authors: Ziyang Luo¹, Haoning Wu¹, Dongxu Li¹, Jing Ma², Mohan Kankanhalli³, Junnan Li¹
Aff.: ¹Rhymes AI; ²Hong Kong Baptist University; ³National University of Singapore
![[Uncaptioned image]](https://arxiv.org/html/2411.13281v1/x1.png)
| Models | Size | ELO | Win Rates | (, ] | (, ] | (, ] | (, ] |
|---|---|---|---|---|---|---|---|
| Proprietary Models | | | | | | | |
| | - | 1505.69 | 89.19 | 1447.86 | 1449.59 | 1575.34 | 1552.23 |
| | - | 1323.25 | 76.90 | 1293.27 | 1343.28 | 1327.75 | 1349.29 |
| | - | 1187.01 | 65.11 | 1247.65 | 1171.82 | 1263.58 | 1291.64 |
| | - | 1149.52 | 62.07 | 1081.58 | 1131.27 | 1140.07 | 1260.36 |
| Open-Source Models | | | | | | | |
| | 83.5B | 1119.99 | 59.54 | 1147.45 | 1273.77 | 1110.67 | 1111.40 |
| | 72B | 886.52 | 35.61 | 985.46 | 928.23 | 829.65 | 826.56 |
| | 7B | 875.56 | 34.90 | 969.28 | 859.33 | 850.30 | 829.21 |
| | 72B | 836.62 | 30.25 | 796.90 | 850.12 | 827.88 | 782.55 |
| | 7B | 765.61 | 23.52 | 672.35 | 736.14 | 759.15 | 721.78 |
| | 72B | 763.71 | 23.11 | 731.50 | 710.64 | 759.29 | 741.80 |
| | 7B | 586.52 | 9.86 | 626.70 | 545.82 | 556.31 | 533.18 |
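
A gap between two values in the ELO column can be read as an expected head-to-head win probability. The sketch below assumes the standard Elo logistic formula (base 10, scale 400); the table does not say which Elo variant was used to fit the ratings, so treat the result as an illustration only. The function name `expected_score` is hypothetical, and the two ratings are taken from the first two proprietary rows of the table.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    Assumption: base-10 logistic with a 400-point scale. The source table
    does not state the variant used, so this is illustrative only.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Overall ELO of the first two proprietary rows in the table.
top_elo = 1505.69
second_elo = 1323.25

p_top_beats_second = expected_score(top_elo, second_elo)
print(f"Expected score of the top row against the second row: {p_top_beats_second:.3f}")
```

This pairwise estimate concerns a single matchup, so it should not be compared directly with the Win Rates column, which the table does not define.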