TRIM: Token Reduction and Inference Modeling for Cost-Effective Language Generation
cs.CL
Published: December 10, 2024
Authors: Alfredo Garrachón Ruiz¹, Tomás de la Rosa¹, Daniel Borrajo¹
Affiliation: ¹AI Research, JPMorganChase

| Metric | Bart | T5 | Qwen2.5-0.5B | Qwen2.5-1.5B | Distilled Text |
|---|---|---|---|---|---|
| Precision (%) | 91.50 (6.98) | 93.35 (6.10) | 88.28 (8.27) | 90.94 (7.21) | 96.85 (14.92) |
| Recall (%) | 86.18 (8.67) | 87.14 (8.60) | 88.93 (7.67) | 92.04 (6.53) | 8.61 (10.72) |
| F1 (%) | 88.55 (6.92) | 89.97 (6.67) | 88.44 (7.13) | 91.36 (6.11) | 14.25 (15.76) |
| SacreBLEU (%) | 85.09 (8.10) | 87.95 (7.57) | 86.76 (7.80) | 88.96 (7.04) | 49.16 (11.59) |
| METEOR (%) | 93.69 (4.33) | 94.65 (4.00) | 94.66 (4.24) | 95.59 (3.62) | 75.42 (6.85) |
| ROUGE-1 (%) | 95.99 (2.51) | 96.66 (2.31) | 96.00 (2.81) | 96.64 (2.44) | 85.86 (4.10) |
| ROUGE-L (%) | 94.95 (3.18) | 95.85 (2.88) | 94.95 (3.47) | 95.94 (2.94) | 85.86 (4.10) |
| Cosine Similarity (%) | 99.99 (0.01) | 100.00 (0.01) | 99.99 (0.02) | 99.99 (0.02) | 99.92 (0.04) |
| Perplexity | 19.35 (6.83) | 17.04 (6.10) | 18.42 (6.68) | 17.59 (6.23) | 99.77 (61.36) |
| Perplexity (original text) | 14.58 (4.83) | 14.58 (4.83) | 14.58 (4.83) | 14.58 (4.83) | 14.58 (4.83) |
| Saved Tokens (%) | 20.58 (5.60) | 20.58 (5.60) | 20.58 (5.60) | 20.58 (5.60) | — |
| Training Time (h) | 1.5 | 3 | 7.5 | 18 | — |
| Parameters (B) | 0.139 | 0.223 | 0.5 | 1.5 | — |

Values are reported as mean (standard deviation).
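The table's Precision, Recall, and F1 rows score generated text against a reference at the token level. As a minimal sketch of how such overlap metrics are typically computed (the paper's exact tokenization and matching scheme are assumptions here, not confirmed by the source), treating tokens as whitespace-split multisets gives:

```python
from collections import Counter


def token_prf(reference: str, candidate: str) -> tuple[float, float, float]:
    """Token-level precision/recall/F1 over whitespace-token multisets.

    precision = shared tokens / candidate tokens
    recall    = shared tokens / reference tokens
    f1        = harmonic mean of the two
    """
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    overlap = sum((ref & cand).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = token_prf("the cat sat on the mat", "the cat sat on a mat")
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # → P=0.83 R=0.83 F1=0.83
```

Under this reading, the Distilled Text column's pattern (high precision, very low recall) would mean its tokens mostly appear in the reference, but it recovers only a small fraction of the reference's tokens.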