| Model | Context | F1 (RQ1) | F1 (RQ3) | MCC (RQ1) | MCC (RQ3) | Recall (RQ1) | Recall (RQ2) | Total Obs. (RQ1) | Total Obs. (RQ2&3) |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o | | 0.8443 | 0.5839 | 0.6971 | 0.5928 | 0.7746 | 0.7955 | 142 | 44 |
| | | | 0.4316 | | 0.3823 | | 0.6290 | 142 | 62 |
| | | 0.8731 | 0.5562 | 0.7477 | 0.5689 | 0.8451 | 0.8636 | 142 | 44 |
| | | | 0.4919 | | 0.4805 | | 0.7580 | 142 | 62 |
| LLaMA-405B-Instruct | | 0.8163 | 0.5219 | 0.6379 | 0.5457 | 0.7606 | 0.7727 | 142 | 44 |
| | | | 0.4271 | | 0.4252 | | 0.5806 | 142 | 62 |
| | | 0.8519 | 0.5251 | 0.7060 | 0.5515 | 0.8169 | 0.7727 | 142 | 44 |
| | | | 0.4729 | | 0.4855 | | 0.6935 | 142 | 62 |
| GPT-4o-mini | | 0.8229 | 0.5219 | 0.6558 | 0.5606 | 0.7465 | 0.6136 | 142 | 44 |
| | | | 0.4729 | | 0.4874 | | 0.5000 | 142 | 62 |
| | | 0.8871 | 0.5306 | 0.7774 | 0.5659 | 0.8451 | 0.6818 | 142 | 44 |
| | | | 0.4754 | | 0.4980 | | 0.6452 | 142 | 62 |
| LLaMA-70B-Instruct | | 0.8351 | 0.5158 | 0.7016 | 0.5230 | 0.7042 | 0.5227 | 142 | 44 |
| | | | 0.4430 | | 0.4368 | | 0.5806 | 142 | 62 |
| | | 0.7980 | 0.5333 | 0.6370 | 0.5505 | 0.6479 | 0.5682 | 142 | 44 |
| | | | 0.4760 | | 0.4895 | | 0.6452 | 142 | 62 |
Note: The contexts are not used for RQ1; hence the RQ1 values are identical for both contexts.
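
For reference, the reported metrics follow the standard binary-classification definitions over the confusion-matrix counts TP, FP, TN, and FN (a binary task formulation is assumed here for illustration; the exact labeling scheme per RQ may differ):

```latex
\begin{align}
\text{Recall} &= \frac{TP}{TP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \\
\text{MCC} &= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}.
\end{align}
```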