Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
cs.CV
Released Date: December 9, 2024
Authors: Qifan Yu¹, Zhebei Shen¹, Zhongqi Yue², Yang Wu³, Wenqiao Zhang¹, Yunfei Li³, Juncheng Li¹, Siliang Tang¹, Yueting Zhuang¹
Affiliations: ¹Zhejiang University; ²Nanyang Technological University; ³Alibaba Group

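Column groups: MLLM Benchmarks (MME-P, MME-C, SEED-Bench-I, POPE); VQA Benchmarks (VizWiz, ScienceQA, GQA, VQA-v2, TextVQA); Captioning (NoCaps val).

| Methods | Valid Data | MME-P | MME-C | SEED-Bench-I | POPE | VizWiz | ScienceQA | GQA | VQA-v2 | TextVQA | NoCaps (val) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset: MiniGPT4-Instruction | Model: MiniGPT4-7B | | | | | | | | | | |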
| MiniGPT4-7B | 3.4k | 717.37 | 259.55 | 23.8 | 68.3 | 36.0 | 36.3 | 32.2 | 32.1 | 21.4 | 111.5 |
| Random | 0.2k | 698.43 | 227.66 | 25.1 | 69.7 | 18.3 | 34.0 | 19.2 | 33.2 | 17.2 | 105.1 |
| Length | 0.2k | 683.39 | 209.55 | 26.7 | 69.8 | 29.9 | 35.6 | 32.5 | 33.7 | 17.4 | 106.5 |
| E2LN [42] | 0.2k | 668.55 | 207.64 | 26.5 | 72.0 | 41.9 | 36.1 | 32.9 | 36.3 | 23.7 | 108.3 |
| IFD [30] | 0.2k | 678.61 | 213.75 | 29.1 | 47.4 | 42.7 | 38.1 | 28.3 | 36.0 | 23.4 | 106.6 |
| InsTag [37] | 0.2k | 715.64 | 237.86 | 26.8 | 70.4 | 40.0 | 38.1 | 30.1 | 34.5 | 22.2 | 105.9 |
| LESS [54] | 0.2k | 698.47 | 191.36 | 22.4 | 71.8 | 38.4 | 35.4 | 26.0 | 34.4 | 16.6 | 109.7 |
| InstructionGPT-4 [51] | 0.2k | 716.94 | 229.64 | 17.4 | 71.6 | 29.9 | 35.1 | 26.8 | 34.8 | 22.1 | 106.8 |
| SELF-FILTER [53] | 0.5k | 438.73 | 128.57 | 21.7 | 71.4 | 41.3 | 35.7 | 30.4 | 35.0 | 22.0 | 105.6 |
| TIVE [36] | 0.2k | 707.02 | 200.86 | 23.6 | 72.3 | 31.4 | 33.8 | 26.4 | 35.1 | 17.5 | 108.9 |
| DataTailor (Ours) | 0.2k | 720.63 | 263.93 | 27.3 | 69.8 | 40.8 | 37.7 | 30.7 | 34.7 | 21.0 | 106.9 |
| Dataset: LLaVA-1.5-mix-665k | Model: LLaVA-7B | | | | | | | | | | |
| LLaVA-v1.5-7B (LoRA) | 665k | 1476.90 | 267.90 | 67.4 | 86.4 | 47.8 | 70.0 | 63.0 | 79.1 | 58.2 | 106.5 |
| Random | 50k | 1387.45 | 287.50 | 59.7 | 85.7 | 42.3 | 70.0 | 55.0 | 73.7 | 53.1 | 107.7 |
| Length | 50k | 1356.96 | 265.71 | 47.0 | 82.6 | 49.2 | 60.9 | 55.5 | 70.7 | 45.2 | 88.2 |
| E2LN [42] | 50k | 1077.31 | 252.50 | 59.3 | 80.8 | 44.4 | 71.0 | 41.7 | 61.0 | 41.7 | 86.9 |
| GradN [42] | 50k | 1275.44 | 303.57 | 58.3 | 75.7 | 37.8 | 70.9 | 44.9 | 64.0 | 46.0 | 101.9 |
| IFD [30] | 50k | 1113.44 | 301.79 | 55.1 | 76.7 | 48.7 | 48.2 | 41.9 | 64.2 | 43.6 | 106.8 |
| InsTag [37] | 50k | 1317.14 | 345.00 | 57.4 | 82.1 | 47.4 | 69.3 | 52.5 | 63.2 | 53.3 | 108.3 |
| LESS [54] | 50k | 1344.80 | 281.80 | 61.2 | 79.4 | 44.4 | 71.0 | 53.4 | 71.8 | 52.0 | 106.2 |
| SELF-FILTER [53] | 25k | 955.65 | 262.50 | 47.5 | 76.0 | 40.8 | 59.4 | 3.6 | 2.1 | 5.6 | 82.3 |
| TIVE [36] | 50k | 1334.80 | 248.57 | 62.2 | 85.9 | 45.1 | 71.4 | 56.2 | 73.8 | 51.1 | 96.0 |
| DataTailor (Ours) | 50k | 1461.23 | 362.50 | 61.7 | 82.1 | 46.3 | 70.9 | 57.7 | 75.0 | 53.1 | 107.2 |
| DataTailor w/ Increased Ratio (Ours) | 100k | 1476.15 | 319.29 | 63.6 | 85.3 | 49.5 | 71.0 | 60.5 | 76.7 | 55.7 | 108.7 |