Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
cs.LG
cs.AI
Release Date: October 22, 2024
Authors: Jerry Huang¹, Prasanna Parthasarathi², Mehdi Rezagholizadeh², Boxing Chen², Sarath Chandar³
Affiliations: ¹Mila - Quebec AI Institute, Université de Montréal; ²Huawei Noah's Ark Lab; ³Mila - Quebec AI Institute, Polytechnique Montréal

| Model | Hallucination Detection | Instruction Following | Closed-Book QA | Reading Comprehension | Summarization | Fact-Checking | Faithfulness | Factuality |
|---|---|---|---|---|---|---|---|---|
| Attention-Only Models | | | | | | | | |
| Gemma-2B | 49.65 (+11.51) | 30.25 (+13.84) | 38.20 (-6.92) | 38.96 (-13.96) | 24.93 (-4.53) | 54.14 (+1.60) | 35.85 (+1.92) | 32.56 (-4.56) |
| Gemma-7B | 53.81 (-5.67) | 31.94 (+6.79) | 27.43 (-4.13) | 31.35 (-5.83) | 20.63 (+1.04) | 43.28 (-5.22) | 36.48 (+2.71) | 44.50 (-6.04) |
| Gemma2-2B | 58.33 (+0.64) | 26.73 (+10.90) | 29.59 (+4.83) | 32.95 (-5.26) | 16.04 (-0.47) | 39.57 (+18.96) | 35.61 (+0.65) | 34.96 (-0.98) |
| Gemma2-9B | 65.64 (+6.31) | 23.76 (+30.07) | 39.36 (+3.13) | 39.74 (-0.70) | 21.38 (-3.35) | 62.06 (+6.47) | 40.56 (+7.55) | 46.85 (+3.13) |
| Gemma2-27B | 62.03 (+12.43) | 26.62 (+33.58) | 47.19 (+2.86) | 43.04 (+5.02) | 28.92 (-0.74) | 68.28 (+1.00) | 41.54 (+13.02) | 53.92 (+2.36) |
| LLaMA2-7B | 53.63 (+4.30) | 28.94 (+8.99) | 37.64 (-1.89) | 27.58 (+3.74) | 25.19 (+0.69) | 51.38 (+5.88) | 35.14 (+4.01) | 42.51 (-0.12) |
| LLaMA2-13B | 67.80 (-3.68) | 26.57 (+9.72) | 39.65 (-0.65) | 32.17 (-3.64) | 27.33 (-0.23) | 62.35 (-1.64) | 40.83 (+0.88) | 46.27 (-0.49) |
| LLaMA2-70B | 62.34 (+10.78) | 30.24 (+17.11) | 47.85 (-2.53) | 39.48 (-8.21) | 28.00 (-0.55) | 66.63 (-1.45) | 41.74 (+5.82) | 53.69 (-2.34) |
| LLaMA3-8B | 60.76 (+10.12) | 22.06 (+26.76) | 41.55 (+1.30) | 33.52 (+3.62) | 26.62 (-1.60) | 60.84 (+4.82) | 37.60 (+10.09) | 48.14 (+1.60) |
| LLaMA3-70B | 71.78 (+8.69) | 21.59 (+25.60) | 48.07 (+3.43) | 46.38 (-7.73) | 28.91 (-1.68) | 69.57 (+0.99) | 45.92 (+6.23) | 54.80 (+2.81) |
| Mistral-7B | 60.48 (+3.36) | 28.39 (+16.85) | 41.24 (+5.21) | 32.03 (+1.92) | 26.77 (-0.77) | 58.59 (+6.78) | 38.47 (+4.67) | 47.42 (+4.17) |
| Mixtral-8x7B | 73.51 (-0.15) | 26.84 (+17.60) | 48.99 (+4.33) | 40.66 (-3.90) | 27.91 (-1.07) | 68.35 (+0.49) | 46.16 (+3.89) | 55.15 (+3.50) |
| Falcon-7B | 52.40 (-2.90) | 25.54 (+12.62) | 33.03 (-4.42) | 29.18 (-3.64) | 20.70 (-1.00) | 46.91 (-7.71) | 34.91 (-0.71) | 36.92 (-3.75) |
| Recurrent and Hybrid Models | | | | | | | | |
| RecurrentGemma-2B | 52.88 (+2.33) | 33.43 (+3.12) | 27.36 (-1.85) | 30.66 (-1.77) | 18.41 (+2.86) | 42.97 (+3.63) | 36.55 (-0.14) | 31.46 (+1.39) |
| RecurrentGemma-9B | 55.75 (-1.67) | 31.67 (+12.96) | 36.79 (+2.03) | 36.57 (-6.66) | 22.99 (+1.84) | 51.25 (+13.36) | 37.61 (-0.17) | 42.96 (+3.28) |
| Jamba | 57.66 (-2.90) | 43.36 (-5.76) | 39.50 (-0.98) | 33.00 (-5.88) | 23.72 (-15.48) | 59.88 (-2.60) | 39.80 (-6.91) | 45.95 (-0.91) |
| FalconMamba-7B | 55.80 (+0.73) | 42.97 (-1.19) | 39.94 (-0.43) | 23.76 (-0.85) | 23.89 (-0.09) | 61.85 (-1.80) | 35.92 (-0.26) | 47.02 (-0.38) |
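You can compare the two architecture groups column by column. Below is a minimal Python sketch. It assumes only that each cell has the form `score (delta)` as printed in the table. The names `parse_cell`, `group_means`, and `HALLU_DETECTION` are hypothetical, and the data are copied from the Hallucination Detection column. An unweighted mean over models of very different sizes is a rough illustration, not an analysis reported in the paper.

```python
import re
from statistics import mean

# A cell is a score followed by a parenthesized signed delta, e.g.
# "49.65 (+11.51)", "49.65 ( 11.51)" (sign lost in extraction) or "38.20 (-6.92)".
CELL = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*\(\s*([+-]?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_cell(cell: str) -> tuple[float, float]:
    """Split a 'score (delta)' cell into (score, delta) floats (hypothetical helper)."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


# Hallucination Detection cells copied from the table, grouped by architecture.
HALLU_DETECTION = {
    "Attention-Only Models": [
        "49.65 (+11.51)",  # Gemma-2B
        "53.81 (-5.67)",   # Gemma-7B
        "58.33 (+0.64)",   # Gemma2-2B
        "65.64 (+6.31)",   # Gemma2-9B
        "62.03 (+12.43)",  # Gemma2-27B
        "53.63 (+4.30)",   # LLaMA2-7B
        "67.80 (-3.68)",   # LLaMA2-13B
        "62.34 (+10.78)",  # LLaMA2-70B
        "60.76 (+10.12)",  # LLaMA3-8B
        "71.78 (+8.69)",   # LLaMA3-70B
        "60.48 (+3.36)",   # Mistral-7B
        "73.51 (-0.15)",   # Mixtral-8x7B
        "52.40 (-2.90)",   # Falcon-7B
    ],
    "Recurrent and Hybrid Models": [
        "52.88 (+2.33)",   # RecurrentGemma-2B
        "55.75 (-1.67)",   # RecurrentGemma-9B
        "57.66 (-2.90)",   # Jamba
        "55.80 (+0.73)",   # FalconMamba-7B
    ],
}


def group_means(cells_by_group: dict[str, list[str]]) -> dict[str, tuple[float, float]]:
    """Return {group: (mean score, mean delta)}, weighting every model equally."""
    result = {}
    for group, cells in cells_by_group.items():
        pairs = [parse_cell(cell) for cell in cells]
        result[group] = (mean(s for s, _ in pairs), mean(d for _, d in pairs))
    return result


if __name__ == "__main__":
    for group, (score, delta) in group_means(HALLU_DETECTION).items():
        print(f"{group}: mean score {score:.2f}, mean delta {delta:+.2f}")
```

To use another column, replace the cell lists. A fairer comparison would weight by parameter count or pair models of similar size.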