FoPru: Focal Pruning for Efficient Large Vision-Language Models
cs.CV
cs.AI
Released Date: November 21, 2024
Authors: Lei Jiang¹, Weizhe Huang¹, Tongxuan Liu¹, Yuting Zeng¹, Jing Li¹, Lechao Cheng², Xiaohua Xu¹
Affiliations: ¹University of Science and Technology of China; ²Hefei University of Technology

Accuracy performance (Ai2D through Ocrbench) and inference efficiency (TTFT, TPOT, GPU) by ratio:

| Model | Ratio | Ai2D | GQA | MMMU | SQA | POPE | TextVQA | Ocrbench | TTFT (ms) | TTFT speedup | TPOT (ms/tok.) | TPOT speedup | GPU (GB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA-NeXT-8B | 100% | 71.66 | 65.38 | 40.22 | 79.44 | 87.84 | 65.43 | 54.90 | 94 | - | 25.61 | - | 17.88 |
| LLaVA-NeXT-8B | 75% | 70.69 | 65.21 | 39.78 | 79.91 | 87.87 | 64.14 | 53.20 | 88 | 1.18x | 25.40 | 1.01x | 17.33 |
| LLaVA-NeXT-8B | 50% | 70.02 | 64.82 | 39.67 | 79.39 | 87.13 | 62.86 | 49.50 | 57 | 1.66x | 24.80 | 1.03x | 16.98 |
| LLaVA-NeXT-8B | 25% | 68.01 | 63.00 | 39.22 | 79.27 | 86.88 | 61.24 | 45.90 | 52 | 1.83x | 24.05 | 1.07x | 16.98 |
| LLaVA-1.6-7B | 100% | 66.58 | 64.24 | 35.10 | 73.21 | 87.61 | 64.90 | 52.20 | 88 | - | 23.70 | - | 16.15 |
| LLaVA-1.6-7B | 75% | 65.54 | 64.13 | 37.00 | 73.19 | 87.93 | 63.00 | 51.30 | 81 | 1.09x | 23.55 | 1.01x | 15.44 |
| LLaVA-1.6-7B | 50% | 64.83 | 63.83 | 37.33 | 72.91 | 87.93 | 63.01 | 47.70 | 60 | 1.47x | 23.05 | 1.03x | 14.81 |
| LLaVA-1.6-7B | 25% | 64.35 | 62.26 | 36.67 | 72.41 | 86.83 | 60.81 | 44.60 | 50 | 1.78x | 21.97 | 1.08x | 14.57 |
| LLaVA-1.6-13B | 100% | 70.30 | 65.37 | 35.90 | 75.85 | 87.56 | 67.10 | 55.10 | 198 | - | 38.30 | - | 29.53 |
| LLaVA-1.6-13B | 75% | 69.56 | 65.43 | 36.44 | 75.95 | 87.78 | 66.03 | 53.90 | 153 | 1.29x | 36.06 | 1.06x | 28.53 |
| LLaVA-1.6-13B | 50% | 68.98 | 65.15 | 37.11 | 76.23 | 87.84 | 64.33 | 50.10 | 116 | 1.70x | 33.35 | 1.15x | 27.53 |
| LLaVA-1.6-13B | 25% | 67.81 | 63.41 | 37.56 | 76.00 | 86.71 | 62.58 | 46.30 | 79 | 2.52x | 30.94 | 1.24x | 26.54 |
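
The multipliers listed next to the TTFT and TPOT values (for example 1.29x) are not defined in the table itself. A minimal sketch follows, assuming they are the 100% baseline latency divided by the pruned latency. The `speedup` helper is hypothetical and written only for illustration. It reproduces the LLaVA-1.6-13B entries at the 75% ratio. Some other rows may differ in the last digit, presumably because the reported latencies are rounded.

```python
def speedup(baseline: float, pruned: float) -> float:
    """Return baseline latency divided by pruned latency (assumed definition)."""
    if pruned <= 0:
        raise ValueError("pruned latency must be positive")
    return baseline / pruned


# LLaVA-1.6-13B figures copied from the table: 100% baseline vs. 75% ratio.
ttft_baseline_ms, ttft_pruned_ms = 198, 153
tpot_baseline_ms, tpot_pruned_ms = 38.30, 36.06

ttft_speedup = speedup(ttft_baseline_ms, ttft_pruned_ms)
tpot_speedup = speedup(tpot_baseline_ms, tpot_pruned_ms)

print(f"TTFT speedup: {ttft_speedup:.2f}x")  # prints "TTFT speedup: 1.29x"
print(f"TPOT speedup: {tpot_speedup:.2f}x")  # prints "TPOT speedup: 1.06x"
```

Under this reading, a value above 1 means the pruned configuration is faster than the unpruned baseline of the same model.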