Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking
Categories: cs.CL, cs.AI, cs.IR, cs.LG
Release Date: November 8, 2024
Authors: Mubashara Akhtar (1), Michael Schlichtkrull (2), Andreas Vlachos (3)
Affiliations: (1) Department of Informatics, King's College London; (2) School of Electronic Engineering and Computer Science, Queen Mary University of London; (3) Department of Computer Science and Technology, University of Cambridge

| Scorer | Coverage | | Coherence | | Repetition | | Consistency | | Relevance | | Verdict Agree | | Avg | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Prompt Scorer GPT | ||||||||||||||
| Ref-less | .237 | .261 | .201 | .254 | -.006 | -.009 | .112 | .118 | .292 | .360 | .240 | .243 | .179 | .205 |
| Ref-based (prec) | .277 | .270 | .185 | .191 | -.089 | -.090 | .218 | .242 | .316 | .313 | .278 | .256 | .198 | .197 |
| Ref-based (recall) | .339 | .334 | .199 | .197 | -.140 | -.105 | .187 | .190 | .277 | .269 | .314 | .284 | .196 | .195 |
| Proxy-ref | .253 | .306 | .142 | .201 | .007 | -.011 | .133 | .140 | .182 | .221 | .523 | .483 | .207 | .223 |
| Prompt Scorer Gemini-Pro | ||||||||||||||
| Ref-less | .287 | .296 | .208 | .246 | .012 | .010 | .092 | .110 | .203 | .263 | .262 | .273 | .177 | .200 |
| Ref-based (prec) | .304 | .254 | .186 | .163 | -.094 | -.115 | .193 | .197 | .270 | .264 | .237 | .177 | .183 | .157 |
| Ref-based (recall) | .398 | .380 | .282 | .268 | -.168 | -.164 | .199 | .205 | .248 | .248 | .285 | .278 | .207 | .203 |
| Prompt Scorer Gemini-Flash | ||||||||||||||
| Ref-less | .275 | .287 | .256 | .269 | .027 | .019 | .147 | .146 | .287 | .333 | .230 | .247 | .204 | .217 |
| Ref-based (prec) | .086 | .232 | .105 | -.004 | -.481 | -.450 | .337 | .181 | .490 | .368 | -.290 | -.205 | .041 | .020 |
| Ref-based (recall) | .132 | .270 | .103 | .106 | -.577 | -.619 | .330 | .291 | .527 | .531 | -.190 | -.039 | .054 | .090 |
| Prompt Scorer Llama 3.1 | ||||||||||||||
| Ref-less | .297 | .286 | .222 | .210 | -.011 | -.008 | .134 | .116 | .227 | .250 | .215 | .208 | .150 | .197 |
| Ref-based (prec) | .341 | .336 | .178 | .181 | -.120 | -.122 | .080 | .111 | .131 | .172 | .216 | .210 | .138 | .148 |
| Ref-based (recall) | .299 | .313 | .174 | .179 | -.060 | -.048 | .148 | .143 | .175 | .187 | .128 | .121 | .144 | .149 |
| Trained Scorer | ||||||||||||||
| Ref-based | .116 | .033 | .091 | .063 | -.133 | .024 | .057 | .059 | .145 | .110 | .151 | .098 | .055 | .063 |
| Proxy-ref | .338 | .348 | .230 | .286 | -.011 | -.057 | .293 | .329 | .298 | .374 | .487 | .459 | .265 | .275 |
| Baselines | ||||||||||||||
| RougeL | .150 | .169 | .180 | .190 | .057 | .040 | .124 | .131 | .086 | .099 | .070 | .097 | .103 | .119 |
| BLEU | .236 | .184 | .180 | .166 | -.144 | -.039 | .040 | .038 | .107 | .079 | .166 | .137 | .094 | .072 |
| Meteor | .229 | .240 | .192 | .191 | -.150 | -.132 | .061 | .064 | .062 | .076 | .152 | .159 | .085 | .100 |
| H-METEOR | .005 | -.024 | .076 | .057 | .117 | .025 | .039 | .024 | .008 | .003 | -.029 | -.019 | .034 | .003 |
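
The Avg column can be sanity-checked with a little arithmetic. The sketch below assumes that Avg is the unweighted mean of the six dimension values in the same sub-column; the table does not state this, so it is an assumption. The helper names (`mean_pair`, `check_rows`) are hypothetical, and the numbers are copied from the "Prompt Scorer GPT" rows above. Under that assumption the four GPT rows are reproduced to three decimals. Not every row follows the pattern: the first Llama 3.1 Ref-less values average to about .181 against a listed .150. Treat this as a reading aid rather than the authors' definition.

```python
# Values copied from the "Prompt Scorer GPT" rows of the table above.
# Each row holds six (first, second) dimension pairs followed by the listed Avg pair.
GPT_ROWS = {
    "Ref-less": [(.237, .261), (.201, .254), (-.006, -.009),
                 (.112, .118), (.292, .360), (.240, .243), (.179, .205)],
    "Ref-based (prec)": [(.277, .270), (.185, .191), (-.089, -.090),
                         (.218, .242), (.316, .313), (.278, .256), (.198, .197)],
    "Ref-based (recall)": [(.339, .334), (.199, .197), (-.140, -.105),
                           (.187, .190), (.277, .269), (.314, .284), (.196, .195)],
    "Proxy-ref": [(.253, .306), (.142, .201), (.007, -.011),
                  (.133, .140), (.182, .221), (.523, .483), (.207, .223)],
}


def mean_pair(pairs):
    """Unweighted mean of each sub-column over a list of (first, second) pairs."""
    n = len(pairs)
    first = sum(p[0] for p in pairs) / n
    second = sum(p[1] for p in pairs) / n
    return first, second


def check_rows(rows, tol=0.001):
    """Map each row name to whether the recomputed mean is within tol of the listed Avg.

    ASSUMPTION: Avg is the plain mean of the six dimension columns.
    The tolerance absorbs rounding, since the inputs are three-decimal values.
    """
    result = {}
    for name, pairs in rows.items():
        *dims, listed = pairs  # the last pair is the listed Avg
        recomputed = mean_pair(dims)
        result[name] = all(abs(r - l) <= tol for r, l in zip(recomputed, listed))
    return result


if __name__ == "__main__":
    for name, ok in check_rows(GPT_ROWS).items():
        print(name, "matches listed Avg" if ok else "differs from listed Avg")
```

Rows from other scorer groups can be added to the dictionary in the same shape to see where the listed Avg departs from a plain mean.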