Correct after Answer: Enhancing Multi-Span Question Answering with Post-Processing Method
cs.SI
cs.AI
cs.DS
Release Date: October 22, 2024
Authors: Jiayi Lin¹, Chenyang Zhang¹, Haibo Tong¹, Dongyu Zhang¹, Qingqing Hong¹, Bingxuan Hou¹, Junli Wang¹
Affiliation: ¹ Tongji University, Key Laboratory of Embedded System and Service Computing, Shanghai 201804, China

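| Model | MultiSpanQA EM P | MultiSpanQA EM R | MultiSpanQA EM F1 | MultiSpanQA-Expand EM P | MultiSpanQA-Expand EM R | MultiSpanQA-Expand EM F1 | MAMRC EM P | MAMRC EM R | MAMRC EM F1 | MAMRC-Multi EM P | MAMRC-Multi EM R | MAMRC-Multi EM F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Discriminative Models (BERT-base) | | | | | | | | | | | | |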
| MTMSN | 51.76 | 41.69 | 46.18 | 60.88 | 51.46 | 55.78 | 72.65 | 77.41 | 74.96 | 71.50 | 76.71 | 74.01 |
| +ACC | 67.75 | 49.52 | 57.22 | 67.77 | 54.91 | 60.66 | 81.60 | 77.40 | 79.44 | 85.55 | 79.32 | 82.32 |
| MUSST | 61.44 | 53.74 | 57.33 | 67.48 | 59.71 | 63.36 | 76.28 | 79.00 | 77.62 | 75.68 | 78.12 | 76.88 |
| +ACC | 68.84 | 54.39 | 60.76 | 69.62 | 60.05 | 64.48 | 81.94 | 77.10 | 79.45 | 85.87 | 78.38 | 81.95 |
| Tagger | 56.66 | 65.46 | 60.74 | 52.81 | 55.92 | 54.30 | 77.15 | 81.83 | 79.42 | 74.71 | 76.74 | 75.70 |
| +ACC | 68.52 | 67.05 | 67.78 | 62.74 | 58.83 | 60.71 | 82.56 | 79.67 | 81.10 | 85.80 | 77.58 | 81.48 |
| SpanQualifier | 67.99 | 69.44 | 68.70 | 62.83 | 67.88 | 65.25 | 77.51 | 84.51 | 80.86 | 76.10 | 85.39 | 80.47 |
| +ACC | 72.04 | 67.82 | 69.86 | 65.78 | 67.13 | 66.45 | 82.40 | 80.76 | 81.57 | 85.67 | 83.37 | 84.51 |
| Discriminative Models (RoBERTa-base) | | | | | | | | | | | | |
| MTMSN | 59.86 | 49.97 | 54.47 | 63.39 | 56.00 | 59.47 | 73.94 | 78.36 | 76.08 | 71.69 | 77.47 | 74.46 |
| +ACC | 71.75 | 55.87 | 62.82 | 68.95 | 58.81 | 63.48 | 81.84 | 77.70 | 79.72 | 85.13 | 79.82 | 82.39 |
| MUSST | 69.82 | 61.94 | 65.64 | 69.29 | 63.16 | 66.08 | 78.01 | 79.71 | 78.85 | 76.69 | 77.16 | 76.92 |
| +ACC | 73.07 | 61.78 | 66.96 | 70.54 | 62.60 | 66.33 | 82.75 | 77.57 | 80.08 | 86.10 | 77.48 | 81.56 |
| Tagger | 66.22 | 72.14 | 69.05 | 64.35 | 65.66 | 64.99 | 79.47 | 83.59 | 81.48 | 75.85 | 78.19 | 77.00 |
| +ACC | 72.39 | 72.12 | 72.26 | 68.70 | 66.21 | 67.43 | 83.62 | 81.80 | 82.70 | 85.77 | 78.36 | 81.90 |
| SpanQualifier | 70.40 | 72.82 | 71.58 | 64.65 | 69.65 | 66.99 | 83.40 | 80.83 | 82.10 | 75.63 | 85.77 | 80.37 |
| +ACC | 73.69 | 71.32 | 72.47 | 67.68 | 68.53 | 68.09 | 82.83 | 81.88 | 82.35 | 85.14 | 83.77 | 84.45 |
| Generative Models | | | | | | | | | | | | |
| BART-base | 69.10 | 62.38 | 65.57 | 60.42 | 55.95 | 58.10 | 77.53 | 74.33 | 75.89 | 75.96 | 73.21 | 74.56 |
| +ACC | 73.90 | 61.80 | 67.31 | 63.68 | 55.70 | 59.43 | 80.47 | 72.47 | 76.26 | 81.26 | 71.22 | 75.91 |
| T5-base | 70.56 | 67.97 | 69.24 | 64.63 | 64.59 | 64.61 | 77.01 | 79.88 | 78.41 | 75.27 | 77.14 | 76.19 |
| +ACC | 73.93 | 66.20 | 69.85 | 67.43 | 63.32 | 65.31 | 80.79 | 77.43 | 79.07 | 80.65 | 74.73 | 77.58 |
| GPT-3.5 (zero-shot) | 64.83 | 60.86 | 62.78 | 39.60 | 53.68 | 45.58 | 45.45 | 57.34 | 50.71 | 57.00 | 63.27 | 59.97 |
| +ACC | 73.04 | 61.96 | 67.04 | 48.64 | 53.96 | 51.16 | 57.10 | 57.71 | 57.40 | 69.54 | 64.06 | 66.69 |
| GPT-3.5 (few-shot) | 68.94 | 68.18 | 68.56 | 42.44 | 58.13 | 49.06 | 58.42 | 73.79 | 65.21 | 65.38 | 76.68 | 70.58 |
| +ACC | 74.88 | 66.61 | 70.51 | 51.65 | 57.91 | 54.60 | 68.02 | 70.94 | 69.45 | 75.39 | 74.97 | 75.18 |
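
The EM F1 columns in the table appear to be the harmonic mean of the corresponding EM P and EM R columns. The sketch below checks this for two rows, assuming that relationship holds and that the reported values are rounded to two decimals. The function name `f1_from_pr` and the tolerance are illustrative choices, not taken from the paper.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (EM P, EM R, reported EM F1) copied from the MultiSpanQA columns,
# BERT-base block of the table above.
rows = {
    "MTMSN": (51.76, 41.69, 46.18),
    "MTMSN +ACC": (67.75, 49.52, 57.22),
}

for name, (p, r, reported) in rows.items():
    computed = f1_from_pr(p, r)
    # Allow a small tolerance because the inputs are already rounded.
    status = "consistent" if abs(computed - reported) < 0.01 else "differs"
    print(f"{name}: computed F1 = {computed:.2f}, reported = {reported:.2f} ({status})")
```

For these two rows, the gain from +ACC comes mostly from precision (51.76 to 67.75), with a smaller gain in recall (41.69 to 49.52). In many other rows recall changes little or drops slightly while precision rises.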