Published on November 25
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
cs.CV
Released Date: November 25, 2024
Authors: Zhichao Zhang¹, Wei Sun¹, Xinyue Li¹, Yunhao Li¹, Qihang Ge¹, Jun Jia¹, Zicheng Zhang¹, Zhongpeng Ji², Fengyu Sun², Shangling Jui², Xiongkuo Min¹, Guangtao Zhai¹
Affiliations: ¹Shanghai Jiao Tong University, Shanghai, China; ²Huawei Technologies, Shanghai, China

Each cell reports Occurrences / Distortions.

| Models | Face | Arms | Torso | Legs | Feet | Avg |
| --- | --- | --- | --- | --- | --- | --- |
| LLaMAVID (7B) [47] | 91.9 / 56.6 | 82.9 / 62.1 | 81.8 / 46.5 | 37.2 / 30.7 | 25.1 / 41.5 | 63.8 / 47.7 |
| VideoChatGPT (7B) [56] | 73.9 / 71.7 | 79.7 / 76.2 | 75.2 / 37.6 | 38.8 / 23.0 | 29.2 / 19.4 | 59.4 / 45.6 |
| VideoLLaMA2 (7B) [11] | 22.5 / 24.9 | 14.1 / 14.9 | 41.2 / 61.2 | 61.9 / 75.8 | 70.4 / 78.3 | 42.2 / 51.0 |
| VILA1.5 (7B) [51] | 22.4 / 36.6 | 20.1 / 22.5 | 16.5 / 60.7 | 60.8 / 74.7 | 71.3 / 76.8 | 38.2 / 54.3 |
| NeXT-Video (7B) [44] | 62.5 / 37.6 | 56.8 / 22.5 | 78.6 / 60.9 | 58.2 / 75.0 | 65.3 / 77.6 | 64.3 / 54.7 |
| OneVision (7B) [42] | 61.9 / 29.8 | 71.9 / 19.8 | 64.6 / 62.2 | 52.2 / 77.0 | 67.2 / 80.3 | 63.6 / 53.8 |
| Qwen2-VL (7B) [83] | 63.1 / 44.3 | 57.9 / 25.7 | 71.0 / 60.4 | 62.6 / 76.1 | 74.1 / 79.9 | 70.3 / 57.3 |
| GPT-4o [61] | 92.4 / 29.7 | 89.7 / 19.3 | 85.1 / 61.6 | 62.1 / 77.4 | 76.1 / 80.1 | 80.0 / 53.6 |
| GPT-4o-mini [61] | 90.6 / 28.4 | 86.5 / 20.1 | 82.7 / 60.4 | 62.6 / 75.3 | 74.0 / 77.3 | 78.6 / 52.3 |
| GHVQ (proposed) | 92.6 / 72.3 | 87.1 / 80.6 | 87.1 / 63.4 | 63.5 / 78.1 | 77.2 / 81.5 | 81.5 / 75.2 |
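
The Avg column can be sanity-checked with a few lines of Python. The sketch below assumes that Avg is the unweighted mean of the five body-part columns, which the table itself does not state; the helper names `parse_cell` and `row_average` are hypothetical and introduced only for illustration.

```python
def parse_cell(cell: str) -> tuple[float, float]:
    """Split an 'occurrence / distortion' cell into two floats."""
    occ, dist = (float(part) for part in cell.split("/"))
    return occ, dist


def row_average(cells: list[str]) -> tuple[float, float]:
    """Unweighted mean over the body-part cells, rounded to one decimal.

    Assumption: the table's Avg column is a plain mean of the five parts.
    """
    pairs = [parse_cell(c) for c in cells]
    n = len(pairs)
    occ_mean = sum(p[0] for p in pairs) / n
    dist_mean = sum(p[1] for p in pairs) / n
    return round(occ_mean, 1), round(dist_mean, 1)


# Face, Arms, Torso, Legs, Feet cells copied from the GHVQ row.
ghvq = ["92.6 / 72.3", "87.1 / 80.6", "87.1 / 63.4", "63.5 / 78.1", "77.2 / 81.5"]
print(row_average(ghvq))
```

For the GHVQ row this reproduces the listed 81.5 / 75.2. Not every row matches under this assumption (for example, the Qwen2-VL occurrence values average to 65.7 while the table lists 70.3), so the Avg column may be computed or weighted differently in the source.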