A Comprehensive Evaluation of Large Language Models on Aspect-Based Sentiment Analysis
Categories: cs.CL, cs.AI
Release Date: December 3, 2024
Authors: Changzhi Zhou¹, Dandan Song¹, Yuhang Tian¹, Zhijing Wu¹, Hao Wang¹, Xinyu Zhang¹, Jun Yang¹, Ziyi Yang¹, Shuhao Zhang²
Aff.: ¹Beijing Institute of Technology; ²Nanyang Technological University

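| Methods | AE (L14) | AE (R14) | AE (R15) | OE (L14) | OE (R14) | OE (R15) | ALSC (L14) | ALSC (R14) | ALSC (R15) | AOE (L14) | AOE (R14) | AOE (R15) | AOE (R16) | AVG |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |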
| Full Fine-tuned SLMs with full dataset | ||||||||||||||
| Dual-MRC | 82.51 | 86.60 | 75.08 | / | / | / | 75.97 | 82.04 | 73.59 | 79.90 | 83.73 | 74.50 | 83.33 | / |
| DCRAN | 85.61 | 89.67 | 79.68 | 79.77 | 87.59 | 79.90 | 80.78 | 84.22 | 77.99 | / | / | / | / | / |
| BARTABSA | 83.52 | 87.07 | 75.48 | 77.86 | 87.29 | 76.49 | 76.76 | 75.56 | 73.91 | 80.55 | 85.38 | 80.52 | 87.92 | 80.64 |
| T5-Instruct | 84.05 | 87.51 | 77.22 | 80.91 | 86.55 | 78.64 | 78.29 | 82.82 | 86.27 | 81.04 | 85.55 | 84.92 | 89.12 | 83.30 |
| Efficiently Fine-tuned LLMs with full dataset | ||||||||||||||
| ChatGLM3-6B | 83.28 | 87.28 | 76.79 | 82.80 | 85.49 | 77.42 | 81.04 | 87.35 | 87.27 | 81.79 | 87.29 | 82.57 | 88.08 | 83.73 |
| QWen1.5-7B | 87.87 | 87.95 | 80.54 | 81.48 | 87.26 | 79.29 | 81.19 | 88.14 | 88.75 | 83.73 | 88.14 | 84.84 | 91.67 | 85.68 |
| Mistral-7B-v0.2 | 85.53 | 88.87 | 79.66 | 82.26 | 87.60 | 79.59 | 81.50 | 88.27 | 88.75 | 84.72 | 87.86 | 85.63 | 91.18 | 85.49 |
| LLaMA3-8B | 87.62 | 88.97 | 79.67 | 83.33 | 87.40 | 82.15 | 80.73 | 88.63 | 89.11 | 84.11 | 88.90 | 85.71 | 91.34 | 86.17 |
| API-based LLMs without fine-tuning (zero-/three-shot) | ||||||||||||||
| LLaMA3-70B | 53.96 | 72.28 | 56.11 | 65.88 | 74.35 | 58.34 | 77.37 | 84.12 | 85.71 | 64.52 | 74.43 | 67.43 | 76.16 | 70.05 |
| +Random | 54.33 | 73.71 | 55.94 | 66.44 | 77.44 | 56.82 | 79.20 | 85.58 | 86.16 | 63.76 | 75.11 | 69.53 | 78.03 | 70.93 |
| +BM25 | 59.87 | 78.03 | 64.66 | 68.55 | 80.56 | 62.24 | 78.13 | 85.22 | 85.79 | 67.29 | 74.28 | 68.39 | 77.32 | 73.10 |
| +SimCSE | 60.41 | 78.88 | 63.99 | 66.71 | 73.59 | 59.00 | 78.75 | 84.78 | 85.79 | 68.10 | 74.63 | 69.05 | 78.14 | 72.45 |
| ChatGPT | 50.91 | 68.38 | 50.92 | 55.89 | 71.34 | 51.83 | 74.48 | 81.23 | 82.56 | 59.91 | 67.73 | 66.15 | 70.50 | 65.53 |
| +Random | 51.37 | 70.53 | 54.06 | 61.05 | 74.18 | 56.10 | 76.14 | 83.47 | 84.06 | 62.91 | 72.09 | 67.77 | 74.52 | 68.33 |
| +BM25 | 54.60 | 72.31 | 55.98 | 61.13 | 75.35 | 55.02 | 76.30 | 81.87 | 83.81 | 63.29 | 70.38 | 68.17 | 73.64 | 68.60 |
| +SimCSE | 54.90 | 72.10 | 57.77 | 59.56 | 74.95 | 55.19 | 75.33 | 81.76 | 82.81 | 64.74 | 71.88 | 69.33 | 77.08 | 69.03 |
| GPT4+BM25 | 66.11 | 80.67 | 69.60 | 66.55 | 79.43 | 61.20 | 79.66 | 84.90 | 85.50 | 73.12 | 79.62 | 76.46 | 83.05 | 75.84 |
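
The table does not say how the AVG column is computed. The sketch below tests one plausible reading: AVG is the unweighted mean of the 13 per-dataset scores (three each for AE, OE and ALSC, four for AOE), rounded to two decimals. This is an inference from the numbers, not a definition stated in the source. The helper name `mean_score` and the choice of rows are illustrative. Under this assumption the sketch reproduces the reported AVG for the four rows it includes. Some other rows, such as QWen1.5-7B and LLaMA3-8B, come out a few tenths lower, so treat the reading as approximate.

```python
# Scores copied from the table, in column order:
# AE (L14, R14, R15), OE (L14, R14, R15), ALSC (L14, R14, R15), AOE (L14, R14, R15, R16).
# Each entry maps a method to (scores, AVG as reported in the table).
ROWS = {
    "BARTABSA": (
        [83.52, 87.07, 75.48, 77.86, 87.29, 76.49,
         76.76, 75.56, 73.91, 80.55, 85.38, 80.52, 87.92],
        80.64,
    ),
    "T5-Instruct": (
        [84.05, 87.51, 77.22, 80.91, 86.55, 78.64,
         78.29, 82.82, 86.27, 81.04, 85.55, 84.92, 89.12],
        83.30,
    ),
    "Mistral-7B-v0.2": (
        [85.53, 88.87, 79.66, 82.26, 87.60, 79.59,
         81.50, 88.27, 88.75, 84.72, 87.86, 85.63, 91.18],
        85.49,
    ),
    "LLaMA3-70B": (
        [53.96, 72.28, 56.11, 65.88, 74.35, 58.34,
         77.37, 84.12, 85.71, 64.52, 74.43, 67.43, 76.16],
        70.05,
    ),
}


def mean_score(scores):
    """Unweighted mean rounded to two decimals (assumed definition of AVG)."""
    return round(sum(scores) / len(scores), 2)


if __name__ == "__main__":
    for name, (scores, reported) in ROWS.items():
        print(f"{name}: computed {mean_score(scores):.2f}, reported {reported:.2f}")
```

Rows containing "/" (Dual-MRC, DCRAN) have no reported AVG, which is consistent with the average being defined only when all 13 scores are present.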