Context is Key: A Benchmark for Forecasting with Essential Textual Information
Published: October 24
Categories: cs.HC, cs.AI
Release date: October 24, 2024
Authors: Andrew Robert Williams, Arjun Ashok¹, Étienne Marcotte, Valentina Zantedeschi², Jithendaraa Subramanian³, Roland Riachi⁴, James Requeima⁵, Alexandre Lacoste⁶, Irina Rish⁷, Nicolas Chapados⁸, Alexandre Drouin⁹
Affiliations: ¹ServiceNow Research, Mila - Quebec AI Institute, Université de Montréal; ²ServiceNow Research, Université Laval; ³ServiceNow Research, Mila - Quebec AI Institute, McGill University; ⁴Mila - Quebec AI Institute; ⁵University of Toronto; ⁶ServiceNow Research; ⁷Mila - Quebec AI Institute, Université de Montréal; ⁸ServiceNow Research, Mila - Quebec AI Institute, Polytechnique Montréal; ⁹ServiceNow Research, Mila - Quebec AI Institute, Université Laval

RCRPS averaged over all tasks, rank averaged over all tasks, and average RCRPS stratified by task category (Instruction Following, Retrieval, Reasoning). Each cell reports the mean ± standard error; lower is better in every column.

| Model | Average RCRPS | Average Rank | Instruction Following | Retrieval: From Context | Retrieval: From Memory | Reasoning: Deductive | Reasoning: Analogical | Reasoning: Mathematical | Reasoning: Causal |
|---|---|---|---|---|---|---|---|---|---|
| Direct Prompt (ours) | | | | | | | | | |
| Llama-3.1-405B-Inst | 0.159 ± 0.008 | 4.469 ± 0.192 | 0.140 ± 0.013 | 0.109 ± 0.002 | 0.191 ± 0.006 | 0.133 ± 0.001 | 0.167 ± 0.008 | 0.316 ± 0.028 | 0.376 ± 0.039 |
| Llama-3-70B-Inst | 0.518 ± 0.030 | 10.406 ± 0.190 | 0.504 ± 0.038 | 0.371 ± 0.071 | 0.523 ± 0.048 | 0.461 ± 0.048 | 0.694 ± 0.117 | 0.573 ± 0.044 | 0.643 ± 0.049 |
| Llama-3-8B-Inst | 1.647 ± 0.069 | 15.130 ± 0.171 | 1.604 ± 0.131 | 0.199 ± 0.010 | 1.568 ± 0.067 | 2.133 ± 0.082 | 1.555 ± 0.008 | 1.589 ± 0.177 | 1.840 ± 0.238 |
| Mixtral-8x7B-Inst | 1.061 ± 0.058 | 13.381 ± 0.230 | 0.857 ± 0.077 | 0.296 ± 0.049 | 1.077 ± 0.078 | 1.352 ± 0.117 | 1.145 ± 0.144 | 1.000 ± 0.086 | 1.096 ± 0.106 |
| GPT-4o | 0.276 ± 0.010 | 4.368 ± 0.149 | 0.180 ± 0.004 | 0.087 ± 0.003 | 0.519 ± 0.029 | 0.113 ± 0.006 | 0.447 ± 0.029 | 0.590 ± 0.033 | 0.769 ± 0.046 |
| GPT-4o-mini | 0.353 ± 0.022 | 8.930 ± 0.177 | 0.296 ± 0.043 | 0.419 ± 0.014 | 0.471 ± 0.012 | 0.218 ± 0.005 | 1.024 ± 0.033 | 0.475 ± 0.080 | 0.578 ± 0.112 |
| LLMP | | | | | | | | | |
| Llama-3-70B-Inst | 0.550 ± 0.013 | 8.038 ± 0.205 | 0.645 ± 0.018 | 0.284 ± 0.015 | 0.392 ± 0.014 | 0.519 ± 0.026 | 0.312 ± 0.019 | 0.453 ± 0.020 | 0.495 ± 0.028 |
| Llama-3-70B | 0.237 ± 0.006 | 6.560 ± 0.254 | 0.310 ± 0.011 | 0.126 ± 0.009 | 0.217 ± 0.007 | 0.134 ± 0.003 | 0.241 ± 0.019 | 0.294 ± 0.008 | 0.329 ± 0.010 |
| Llama-3-8B-Inst | 0.484 ± 0.010 | 9.457 ± 0.166 | 0.345 ± 0.002 | 0.138 ± 0.004 | 0.910 ± 0.030 | 0.242 ± 0.008 | 1.278 ± 0.069 | 0.617 ± 0.022 | 0.787 ± 0.030 |
| Llama-3-8B | 0.313 ± 0.023 | 9.499 ± 0.323 | 0.404 ± 0.043 | 0.124 ± 0.003 | 0.280 ± 0.026 | 0.179 ± 0.014 | 0.267 ± 0.015 | 0.530 ± 0.084 | 0.661 ± 0.117 |
| Mixtral-8x7B-Inst | 0.264 ± 0.004 | 8.496 ± 0.256 | 0.344 ± 0.004 | 0.127 ± 0.003 | 0.224 ± 0.005 | 0.179 ± 0.010 | 0.173 ± 0.009 | 0.348 ± 0.005 | 0.405 ± 0.007 |
| Mixtral-8x7B | 0.262 ± 0.008 | 8.619 ± 0.208 | 0.348 ± 0.012 | 0.146 ± 0.022 | 0.230 ± 0.016 | 0.153 ± 0.002 | 0.230 ± 0.041 | 0.354 ± 0.007 | 0.414 ± 0.009 |
| Multimodal Models | | | | | | | | | |
| UniTime | 0.371 ± 0.002 | 13.495 ± 0.091 | 0.271 ± 0.003 | 0.179 ± 0.001 | 0.318 ± 0.001 | 0.510 ± 0.003 | 0.333 ± 0.001 | 0.332 ± 0.001 | 0.384 ± 0.001 |
| Time-LLM (ETTh1) | 0.476 ± 0.001 | 16.662 ± 0.075 | 0.448 ± 0.002 | 0.192 ± 0.000 | 0.373 ± 0.000 | 0.538 ± 0.003 | 0.397 ± 0.001 | 0.382 ± 0.001 | 0.440 ± 0.001 |
| TS Foundation Models* | | | | | | | | | |
| Lag-Llama | 0.329 ± 0.004 | 13.157 ± 0.235 | 0.355 ± 0.007 | 0.181 ± 0.003 | 0.324 ± 0.003 | 0.272 ± 0.006 | 0.342 ± 0.006 | 0.386 ± 0.009 | 0.449 ± 0.012 |
| Chronos | 0.326 ± 0.002 | 11.962 ± 0.139 | 0.385 ± 0.002 | 0.138 ± 0.002 | 0.288 ± 0.002 | 0.249 ± 0.002 | 0.295 ± 0.003 | 0.362 ± 0.003 | 0.417 ± 0.004 |
| TimeGEN | 0.354 ± 0.000 | 14.345 ± 0.090 | 0.402 ± 0.000 | 0.176 ± 0.000 | 0.308 ± 0.000 | 0.279 ± 0.000 | 0.324 ± 0.000 | 0.377 ± 0.000 | 0.431 ± 0.000 |
| Moirai | 0.520 ± 0.006 | 12.458 ± 0.266 | 0.414 ± 0.004 | 0.155 ± 0.004 | 0.260 ± 0.003 | 0.751 ± 0.015 | 0.276 ± 0.008 | 0.337 ± 0.007 | 0.397 ± 0.010 |
| Statistical Models* | | | | | | | | | |
| ARIMA | 0.480 ± 0.006 | 12.320 ± 0.180 | 0.399 ± 0.006 | 0.160 ± 0.002 | 0.517 ± 0.012 | 0.522 ± 0.013 | 0.706 ± 0.026 | 0.354 ± 0.007 | 0.403 ± 0.010 |
| ETS | 0.522 ± 0.009 | 14.310 ± 0.203 | 0.407 ± 0.009 | 0.228 ± 0.010 | 0.682 ± 0.018 | 0.571 ± 0.019 | 0.855 ± 0.035 | 0.453 ± 0.012 | 0.479 ± 0.015 |
| Exp-Smoothing | 0.603 ± 0.013 | 14.936 ± 0.132 | 0.571 ± 0.021 | 0.334 ± 0.013 | 0.743 ± 0.018 | 0.557 ± 0.019 | 0.899 ± 0.035 | 0.673 ± 0.038 | 0.782 ± 0.053 |
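
The two summary columns can be made concrete with a small sketch. It assumes the plain sample-based CRPS in its energy form, CRPS = E|X - y| - 0.5 E|X - X'|, averaged over forecast steps; the benchmark's RCRPS modifies this score in ways the table does not describe, so the function below is not that metric. It also assumes that "average rank" means ranking models within each task (lowest score first, tied models sharing the mean rank) and then averaging those ranks over tasks. The names `crps_from_samples` and `average_rank` and all toy inputs are hypothetical.

```python
import numpy as np


def crps_from_samples(samples, target):
    """Plain sample-based CRPS (assumption: NOT the benchmark's RCRPS), averaged over steps.

    samples: array of shape (n_samples, horizon) drawn from a forecast.
    target:  array of shape (horizon,) holding the observed values.
    Energy form: CRPS = E|X - y| - 0.5 * E|X - X'|.
    """
    samples = np.asarray(samples, dtype=float)
    target = np.asarray(target, dtype=float)
    # E|X - y|: mean absolute error of the samples at each step.
    term1 = np.abs(samples - target[None, :]).mean(axis=0)
    # E|X - X'|: mean absolute difference over all sample pairs (self-pairs included).
    term2 = np.abs(samples[:, None, :] - samples[None, :, :]).mean(axis=(0, 1))
    return float((term1 - 0.5 * term2).mean())


def average_rank(scores):
    """Hypothetical average rank: 1 = best, lower score wins within each task.

    scores: array of shape (n_models, n_tasks); tied models share the mean rank.
    """
    scores = np.asarray(scores, dtype=float)
    n_models, n_tasks = scores.shape
    ranks = np.empty_like(scores)
    for t in range(n_tasks):
        col = scores[:, t]
        for m in range(n_models):
            better = np.sum(col < col[m])     # models strictly ahead of m
            ties = np.sum(col == col[m]) - 1  # other models tied with m
            ranks[m, t] = 1 + better + 0.5 * ties
    # Average each model's per-task ranks over all tasks.
    return ranks.mean(axis=1)


if __name__ == "__main__":
    # Toy data, not taken from the table: 2 samples over a 2-step horizon.
    print(crps_from_samples([[1.0, 2.0], [3.0, 2.0]], [2.0, 2.0]))  # → 0.25
    # Toy scores for 3 models on 2 tasks.
    toy = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.3]]
    print(average_rank(toy))  # mean ranks 1.5, 2.0, 2.5
```

Because ranks are computed per task before averaging, a few tasks with very large scores cannot dominate the rank column the way they can dominate an averaged score. The table shows that the two summaries need not agree: GPT-4o has the best average rank (4.368), while Llama-3.1-405B-Inst has the lowest average RCRPS (0.159).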