ER2Score: LLM-based Explainable and Customizable Metric for Assessing Radiology Reports with Reward-Control Loss
cs.CL
cs.AI
Released Date: November 26, 2024
Authors: Yunyi Liu¹, Yingshu Li¹, Zhanyu Wang¹, Xinyu Liang², Lingqiao Liu³, Lei Wang⁴, Luping Zhou¹
Affiliations: ¹The University of Sydney; ²Guangzhou University of Chinese Medicine; ³The University of Adelaide; ⁴The University of Wollongong

| Metric | Kendall’s Tau (P-Value) | Spearman (P-Value) |
|---|---|---|
| BLEU-4 [3] | 0.345 (2.2e-12) | 0.475 (1.2e-12) |
| ROUGE-L [17] | 0.491 (2.9e-23) | 0.663 (1.2e-26) |
| METEOR [18] | 0.464 (8.4e-21) | 0.627 (2.8e-23) |
| CIDEr [24] | 0.499 (4.5e-24) | 0.664 (8.9e-27) |
| BertScore [19] | 0.507 (4.5e-25) | 0.677 (3.9e-28) |
| RadGraphF1 [7] | 0.516 (4.3e-25) | 0.702 (4.4e-31) |
| semb_score [9] | 0.494 (1.0e-23) | 0.665 (6.2e-27) |
| RadCliQ-v1 [9] | 0.631 (6.9e-38) | 0.816 (6.6e-49) |
| GREEN [16] | 0.640 | - |
| ER2Score (Ours) | 0.751 (4.0e-52) | 0.910 (5.0e-76) |
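
The two columns above are rank correlations between each metric's scores and a set of reference judgments. As a minimal sketch of how such coefficients are computed, the dependency-free Python below implements Kendall's tau-b and Spearman's rho with average ranks for ties. The function names and the sample data are hypothetical and are not taken from the paper. P-values are omitted here; in practice `scipy.stats.kendalltau` and `scipy.stats.spearmanr` return both the coefficient and its p-value.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, corrected for ties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one input is constant")
    return (concordant - discordant) / denom


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: scores from a metric and matching reference judgments.
metric_scores = [1, 2, 3, 4, 5]
reference_scores = [2, 1, 4, 3, 5]

print(f"tau-b = {kendall_tau_b(metric_scores, reference_scores):.3f}")  # 0.600
print(f"rho   = {spearman_rho(metric_scores, reference_scores):.3f}")   # 0.800
```

In this toy example two of the ten pairs are swapped, so tau-b is (8 - 2) / 10 = 0.6, while rho is 0.8. Spearman's rho is typically larger in magnitude than Kendall's tau on the same data, which matches the pattern across the rows of the table.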