notesum.ai
Published at November 13
Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle
cs.CL
cs.AI
cs.LG
Released Date: November 13, 2024
Authors: Hui Dai¹, Ryan Teehan¹, Mengye Ren¹
Aff.: ¹New York University

Columns 2020 to 2024 report Average Yearly Accuracy (%). Columns Pre-Cutoff, Post-Cutoff, and Avg report Average YoY Accuracy Change (%).

| Type | LLM | K-Cutoff | 2020 | 2021 | 2022 | 2023 | 2024 | Pre-Cutoff | Post-Cutoff | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| TF | Claude-3.5-Sonnet | Apr 2024 | 81.21 | 79.88 | 78.05 | 74.38 | 66.25 | -4.77 | -11.97 | -4.47 |
| TF | GPT-4 | Apr 2023 | 69.68 | 66.41 | 60.36 | 60.54 | 57.88 | -5.83 | -1.96 | -4.21 |
| TF | GPT-3.5 | Sept 2021 | 62.86 | 60.12 | 59.36 | 57.11 | 57.80 | -4.33 | -3.43 | -2.11 |
| TF | Mixtral-8x7B | Unknown | 57.83 | 52.69 | 43.09 | 39.34 | 36.29 | – | – | -10.93 |
| TF | Mistral-7B | Unknown | 57.57 | 54.65 | 48.22 | 41.35 | 41.89 | – | – | -7.67 |
| TF | Llama-3-8B | Mar 2023 | 65.06 | 64.24 | 62.35 | 58.68 | 56.44 | -1.95 | -6.5 | -3.23 |
| TF | Qwen-2-7B | Unknown | 62.39 | 60.15 | 57.67 | 53.39 | 53.14 | – | – | -3.86 |
| TF | Gemma-2-2B | Jul 2024 | 58.71 | 59.31 | 57.64 | 56.61 | 55.87 | -1.41 | -5.28 | -0.97 |
| MC | Claude-3.5-Sonnet | Apr 2024 | 76.86 | 77.67 | 74.32 | 69.37 | 61.79 | -6.26 | -12.82 | -4.83 |
| MC | GPT-4 | Apr 2023 | 70.59 | 70.62 | 66.75 | 56.40 | 50.96 | -4.23 | -18.47 | -7.48 |
| MC | GPT-3.5 | Sept 2021 | 50.27 | 50.36 | 44.43 | 41.43 | 42.32 | 0.14 | -0.46 | -4.25 |
| MC | Mixtral-8x7B | Unknown | 57.38 | 56.97 | 50.76 | 47.10 | 45.09 | – | – | -5.37 |
| MC | Mistral-7B | Unknown | 50.07 | 52.36 | 48.06 | 44.40 | 44.08 | – | – | -2.56 |
| MC | Llama-3-8B | Mar 2023 | 52.44 | 54.18 | 50.66 | 47.94 | 45.97 | -2.21 | -1.25 | -3.01 |
| MC | Qwen-2-7B | Unknown | 55.28 | 55.93 | 53.44 | 49.77 | 47.94 | – | – | -3.04 |
| MC | Gemma-2-2B | Jul 2024 | 47.87 | 50.71 | 46.81 | 45.20 | 43.65 | -4.46 | -2.33 | -1.73 |
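
The page does not say how the year-over-year (YoY) change figures are computed. As a rough illustration only, the sketch below assumes YoY change means the relative percentage change between consecutive yearly accuracies, averaged over the year pairs. The helper names `yoy_changes` and `mean` are hypothetical. Applied to the Claude-3.5-Sonnet TF row it gives about -4.9, not the reported -4.47, so the authors presumably use a different definition or finer-grained (for example monthly) data.

```python
from typing import Sequence


def yoy_changes(yearly_acc: Sequence[float]) -> list[float]:
    """Relative change in percent between each pair of consecutive years.

    Assumption: YoY change = (current - previous) / previous * 100.
    This definition is not confirmed by the source.
    """
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(yearly_acc, yearly_acc[1:])
    ]


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Yearly TF accuracies for Claude-3.5-Sonnet (2020 to 2024), copied from the table.
claude_tf = [81.21, 79.88, 78.05, 74.38, 66.25]

changes = yoy_changes(claude_tf)
print([round(c, 2) for c in changes])  # one value per consecutive year pair
print(round(mean(changes), 2))         # simple average over the four pairs
```

Splitting the list of changes at a model's knowledge cutoff year would give rough pre-cutoff and post-cutoff averages in the same way, again only under the assumed definition.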