Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy
cs.CL
Release Date: December 9, 2024
Authors: Min Zeng¹, Caiquan Liu, Shiqi Zhang, Li Xie, Chen Sang, Xiaoxin Chen
Aff.: ¹vivo AI Lab

| Dataset | Baseline | Base-Model | Full-Data | Greedy | DQE |
|---|---|---|---|---|---|
| MR | 93.30 | 76.83 | 92.68 | 92.96 | 93.81 |
| CR | 93.54 | 80.85 | 94.95 | 94.95 | 95.48 |
| IMDb | 96.68 | 90.26 | 97.32 | 97.38 | 97.86 |
| SST-2 | 97.50 | 83.69 | 97.47 | 97.75 | 98.35 |
| SST-5 | 59.80 | 43.80 | 61.62 | 60.90 | 61.95 |
| AG News | 85.00 | 69.64 | 95.38 | 95.04 | 95.70 |