Can Large Language Models Serve as Effective Classifiers for Hierarchical Multi-Label Classification of Scientific Documents at Industrial Scale?
cs.AI
Released Date: December 6, 2024
Authors: Seyed Amin Tabatabaei¹, Sarah Fancher², Michael Parsons², Arian Askari³
Affiliations: ¹Elsevier; ²SSRN; ³Leiden University

| Method | Accuracy (%) | S-5 (%) ↑ | S-4 (%) ↑ | S-3 (%) ↑ | S-2 (%) ↓ | S-1 (%) ↓ |
|---|---|---|---|---|---|---|
| Machine learning based method | | | | | | |
| Previous SOTA, Singh et al. (2022) | 61.5 | 0.0 | 11.5 | 50.0 | 30.7 | 7.8 |
| Only LLM-based method | | | | | | |
| Trav-Select (ours) | 50.0 | 4.3 | 14.3 | 25.7 | 22.9 | 32.9 |
| Bi-encoder followed by LLM-based methods | | | | | | |
| LLM-Rerank (ours) | 70.0 | 0.0 | 4.3 | 60.0 | 31.4 | 4.3 |
| LLM-SelectO (ours) | 58.6 | 4.3 | 24.3 | 25.7 | 28.6 | 17.1 |
| LLM-SelectP (ours) | 94.3 | 32.9 | 38.6 | 22.9 | 4.3 | 1.4 |
| Ablation analysis (ours) | | | | | | |
| LLM-SelectP w/o decreasing (random selection) | 62.9 | 0.0 | 4.3 | 50.0 | 37.1 | 8.6 |
| LLM-SelectP w/o description | 85.7 | 2.9 | 15.7 | 60.0 | 18.6 | 2.9 |
| LLM-SelectP w/o contextualization | 85.7 | 2.9 | 28.6 | 57.1 | 7.1 | 4.3 |