Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models
cs.CL
cs.AI
Released Date: November 4, 2024
Authors: Guangzhi Xiong, Eric Xie, Amir Hassan Shariatmadari, Sikun Guo, Stefan Bekiranov, Aidong Zhang
Affiliation: University of Virginia

| LLM | Method | Knowledge | Greedy Search Accuracy | Greedy Search F1 | Greedy Search Confidence | Self Consistency Accuracy | Self Consistency F1 | Self Consistency Confidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Llama-3.1-8B | Direct | No | 47.00 | 48.71 | 00.00 | 65.00 | 70.86 | 00.00 |
| Llama-3.1-8B | CoT | No | 56.67 | 56.61 | 40.22 | 56.67 | 64.92 | 41.36 |
| Llama-3.1-8B | RAG | Yes | 68.67 | 69.83 | 37.44 | 65.00 | 70.86 | 38.35 |
| Llama-3.1-8B | KG-CoI | Yes | 70.33 | 70.42 | 43.46 | 66.67 | 72.63 | 40.46 |
| Llama-3.1-70B | Direct | No | 72.00 | 71.94 | 00.00 | 71.67 | 72.00 | 00.00 |
| Llama-3.1-70B | CoT | No | 73.67 | 73.77 | 35.23 | 71.33 | 71.58 | 34.97 |
| Llama-3.1-70B | RAG | Yes | 73.33 | 73.50 | 35.02 | 72.33 | 73.12 | 34.58 |
| Llama-3.1-70B | KG-CoI | Yes | 79.33 | 79.52 | 30.78 | 81.00 | 81.92 | 26.24 |
| GPT-4o-mini | Direct | No | 70.00 | 69.45 | 00.00 | 69.33 | 69.71 | 00.00 |
| GPT-4o-mini | CoT | No | 73.00 | 72.61 | 39.28 | 73.33 | 73.59 | 39.21 |
| GPT-4o-mini | RAG | Yes | 76.67 | 76.55 | 40.61 | 76.67 | 76.70 | 40.04 |
| GPT-4o-mini | KG-CoI | Yes | 82.67 | 82.56 | 43.87 | 84.00 | 84.27 | 44.24 |
| GPT-4o | Direct | No | 73.33 | 73.40 | 00.00 | 74.00 | 74.37 | 00.00 |
| GPT-4o | CoT | No | 74.33 | 74.26 | 34.41 | 75.67 | 75.68 | 34.93 |
| GPT-4o | RAG | Yes | 75.67 | 75.97 | 37.74 | 74.33 | 74.74 | 36.21 |
| GPT-4o | KG-CoI | Yes | 86.00 | 85.83 | 44.24 | 86.33 | 86.17 | 41.66 |