notesum.ai
Can Large Language Models Reason about the Region Connection Calculus?
cs.CL
Released Date: November 29, 2024
Authors: Anthony G Cohn (1), Robert E Blackwell (2)
Affiliations: (1) School of Computing, University of Leeds, UK; (2) The Alan Turing Institute, UK

| Model | 1epon | 1anon | 2epon | 2anon | 3epon | 3anon | Overall |
|---|---|---|---|---|---|---|---|
| claude-3-5-sonnet-20240620 | 0.69 ± 0.008 | 0.58 ± 0.012 | 0.56 ± 0.011 | 0.54 ± 0.008 | 0.61 ± 0.023 | 0.61 ± 0.026 | 0.60 ± 0.016 |
| azure-gpt-4-turbo-2024-04-09 | 0.50 ± 0.014 | 0.45 ± 0.013 | 0.50 ± 0.014 | 0.49 ± 0.014 | 0.67 ± 0.029 | 0.57 ± 0.036 | 0.53 ± 0.022 |
| gemini-15-pro | 0.51 ± 0.013 | 0.49 ± 0.011 | 0.51 ± 0.010 | 0.46 ± 0.007 | 0.52 ± 0.021 | 0.42 ± 0.021 | 0.49 ± 0.015 |
| azure-gpt-4o-2024-05-13 | 0.47 ± 0.013 | 0.43 ± 0.012 | 0.50 ± 0.012 | 0.50 ± 0.014 | 0.57 ± 0.039 | 0.41 ± 0.033 | 0.48 ± 0.024 |
| azureai-llama-3-70b-instruct | 0.43 ± 0.006 | 0.40 ± 0.010 | 0.40 ± 0.004 | 0.40 ± 0.005 | 0.43 ± 0.018 | 0.18 ± 0.021 | 0.37 ± 0.013 |
| azure-gpt-35-turbo-0125 | 0.33 ± 0.009 | 0.25 ± 0.011 | 0.44 ± 0.015 | 0.36 ± 0.011 | 0.39 ± 0.034 | 0.30 ± 0.030 | 0.35 ± 0.021 |
| Overall | 0.49 ± 0.011 | 0.43 ± 0.012 | 0.49 ± 0.012 | 0.46 ± 0.011 | 0.53 ± 0.028 | 0.42 ± 0.028 | 0.47 ± 0.057 |
| Guess rate | 0.31 ± 0.012 | 0.31 ± 0.012 | 0.13 ± 0.017 | 0.13 ± 0.017 | 0.26 ± 0.025 | 0.26 ± 0.025 | 0.23 ± 0.033 |
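
One way to read the table is to compare each model's mean score against the "Guess rate" row for the same condition. The Python sketch below is only an illustration, not the authors' analysis. It copies the mean values from the table, leaves out the dispersion figure that accompanies each mean, and assumes the "Guess rate" row is the score expected from random guessing. The names `CONDITIONS`, `ACCURACY`, `GUESS_RATE`, `margin_over_guess`, and `below_guess` are hypothetical helpers introduced here.

```python
# Mean scores copied from the table above; the dispersion values are omitted.
CONDITIONS = ["1epon", "1anon", "2epon", "2anon", "3epon", "3anon"]

ACCURACY = {
    "claude-3-5-sonnet-20240620": [0.69, 0.58, 0.56, 0.54, 0.61, 0.61],
    "azure-gpt-4-turbo-2024-04-09": [0.50, 0.45, 0.50, 0.49, 0.67, 0.57],
    "gemini-15-pro": [0.51, 0.49, 0.51, 0.46, 0.52, 0.42],
    "azure-gpt-4o-2024-05-13": [0.47, 0.43, 0.50, 0.50, 0.57, 0.41],
    "azureai-llama-3-70b-instruct": [0.43, 0.40, 0.40, 0.40, 0.43, 0.18],
    "azure-gpt-35-turbo-0125": [0.33, 0.25, 0.44, 0.36, 0.39, 0.30],
}

# Assumption: the "Guess rate" row is the baseline to beat in each condition.
GUESS_RATE = [0.31, 0.31, 0.13, 0.13, 0.26, 0.26]


def margin_over_guess(scores, guess=GUESS_RATE):
    """Per-condition difference between a model's mean score and the guess rate."""
    return [round(s - g, 2) for s, g in zip(scores, guess)]


def below_guess(accuracy=ACCURACY, guess=GUESS_RATE):
    """List (model, condition) pairs whose mean score is under the guess rate."""
    return [
        (model, cond)
        for model, scores in accuracy.items()
        for cond, s, g in zip(CONDITIONS, scores, guess)
        if s < g
    ]


if __name__ == "__main__":
    for model, scores in ACCURACY.items():
        print(model, dict(zip(CONDITIONS, margin_over_guess(scores))))
    print("Below guess rate:", below_guess())
```

On the means alone, two cells fall under the guess rate: azureai-llama-3-70b-instruct on 3anon (0.18 against 0.26) and azure-gpt-35-turbo-0125 on 1anon (0.25 against 0.31). Because the comparison ignores the reported dispersion, small margins should not be read as significant differences.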