Influential Language Data Selection via Gradient Trajectory Pursuit
cs.RO
cs.AI
Released Date: October 22, 2024
Authors: Zhiwei Deng¹, Tao Li¹, Yang Li¹
Aff.: ¹Google DeepMind

| Method | 5% | 10% | 15% | 20% | Full data |
|---|---|---|---|---|---|
| Samples | 500 | 1000 | 1500 | 2000 | 10000 |
| Random | 42.7 (2.8) | 61.3 (3.5) | 71.7 (3.1) | 80.5 (3.1) | 85.6 (2.6) |
| LESS (top-k) | 38.9 (1.7) | 41.6 (1.2) | 42.4 (2.2) | 43.2 (0.9) | |
| G-DIG | 48.8 (4.2) | 58.4 (3.5) | 70.1 (1.9) | 75.6 (2.1) | |
| RDS (representation-based) | 37.3 (1.8) | 63.0 (2.1) | 74.9 (2.4) | 80.8 (1.2) | |
| GTP - full (ours) | 52.7 (3.0) | 72.1 (2.8) | 77.2 (2.4) | 82.1 (2.9) | |
| (improvement) over random | 10.0% | 10.8% | 5.5% | 1.6% | |
| (improvement) over top-k | 13.8% | 30.5% | 34.8% | 38.9% | |
| GTP - distributed (5 machines) | 50.4 (2.9) | 71.6 (2.8) | 76.2 (1.9) | 81.6 (1.7) | |
| (improvement) over random | 6.6% | 10.3% | 4.5% | 0.8% | |
| (improvement) over top-k | 10.4% | 30.0% | 33.8% | 38.1% | |
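
The improvement rows under "GTP - full (ours)" appear to be absolute differences in percentage points between the GTP score and the corresponding baseline score, not relative gains. A minimal Python sketch, assuming that reading, recomputes them from the mean scores in the table. The variable and function names are illustrative only and do not come from the paper.

```python
# Mean scores copied from the table above (5%, 10%, 15%, 20% columns).
FRACTIONS = ["5%", "10%", "15%", "20%"]
random_sel = [42.7, 61.3, 71.7, 80.5]   # "Random" row
less = [38.9, 41.6, 42.4, 43.2]         # "LESS" row
gtp_full = [52.7, 72.1, 77.2, 82.1]     # "GTP - full (ours)" row


def improvement(method, baseline):
    """Absolute gain in percentage points, rounded to one decimal place.

    Assumption: the table reports method minus baseline, not a ratio.
    """
    return [round(m - b, 1) for m, b in zip(method, baseline)]


over_random = improvement(gtp_full, random_sel)
over_less = improvement(gtp_full, less)

for frac, r, l in zip(FRACTIONS, over_random, over_less):
    print(f"{frac}: +{r} over random, +{l} over LESS")
```

Under this reading, the gain over random selection is largest at the 10% budget and shrinks as the subset grows, while the gain over the LESS row keeps widening.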