notesum.ai
Published on November 6
MEG: Medical Knowledge-Augmented Large Language Models for Question Answering
cs.CL
cs.AI
cs.LG
Released Date: November 6, 2024
Authors: Laura Cabello¹, Carmen Martin-Turrero¹, Uchenna Akujuobi¹, Anders Søgaard², Carlos Bobed³
Affiliations: ¹Sony AI, Barcelona, Spain; ²University of Copenhagen, Denmark; ³University of Zaragoza, Spain

| Model | MedQA | PubMedQA | MedMCQA | MMLU-Medical | Avg | MMLU-Medical: Clinical K. | MMLU-Medical: Genetics | MMLU-Medical: Anatomy | MMLU-Medical: P. Medicine | MMLU-Medical: C. Biology | MMLU-Medical: C. Medicine |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human (pass) | 60.0 | 60.0 | | | | | | | | | |
| Human (expert) | 87.0 | 78.0 | 90.0 | | | | | | | | |
| Models based on Llama | | | | | | | | | | | |
| MedAlpaca (7B)† | 40.1±0.4 | 73.6±0.3 | 37.0±0.3 | 55.1±1.1 | 51.4 | 53.1±0.9 | 58.0±2.2 | 54.1±1.6 | 58.8±0.3 | 58.1±1.3 | 48.6±0.5 |
| MEDITRON-7B∗‡ | 52.0±- | 74.4±- | 59.2±- | 54.2±- | 60.0 | 57.2±- | 64.6±- | 49.3±- | 55.4±- | 53.8±- | 44.8±- |
| Meerkat-8B∗‡ | 74.2±- | - | 62.7±- | 75.2±- | 70.7 | 74.3±- | 76.7±- | 74.8±- | 75.3±- | 76.1±- | 74.3±- |
| MEG-Llama (8B) | 66.0±0.2 | 78.0±0.3 | 60.6±0.3 | 74.9±0.7 | 69.9 | 72.3±0.5 | 83.0±1.5 | 64.5±0.7 | 79.4±0.3 | 80.6±0.4 | 69.4±0.9 |
| Models based on Mistral-Instruct 7B | | | | | | | | | | | |
| Mistral-Instruct-v0.1† | 42.0±0.2 | 73.8 | 46.1±0.1 | 59.1±1.0 | 55.3 | 62.9±0.2 | 57.0±0.8 | 55.6±1.0 | 59.4±0.6 | 62.5±1.0 | 57.2±2.1 |
| BioMistral† | 50.6±0.3 | 77.5±0.1 | 48.1±0.2 | 59.1±1.3 | 58.8 | 59.9±1.2 | 64.0±1.6 | 56.5±1.8 | 60.4±0.5 | 59.0±1.5 | 54.7±1.0 |
| BioMistral DARE† | 51.1±0.3 | 77.7±0.1 | 48.7±0.1 | 61.9±1.2 | 59.9 | 62.3±1.3 | 67.0±1.6 | 55.8±0.9 | 61.4±0.3 | 66.9±2.3 | 58.0±0.5 |
| Meerkat-7B‡ | 70.3±- | - | 60.6±- | 70.5±- | 67.1 | 71.6±- | 74.8±- | 63.2±- | 77.3±- | 70.8±- | 65.2±- |
| MEG-Mistral1 | 54.6±0.2 | 74.6±0.6 | 56.4±0.4 | 60.3±0.9 | 61.5 | 58.1±0.8 | 68.7±0.2 | 54.4±0.5 | 62.9±0.9 | 61.1±2.2 | 56.6±1.0 |
| MEG-Mistral3 | 60.8±0.2 | 74.4±0.5 | 58.4±0.6 | 68.2±0.4 | 65.5 | 64.9±0.2 | 69.6±0.8 | 63.0±1.0 | 72.8±0.4 | 73.6±0.0 | 65.2±0.2 |
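
The Avg column in the table above appears to be the unweighted mean of the MedQA, PubMedQA, MedMCQA, and MMLU-Medical scores, taken over whichever of those are reported for a model. The page does not state this, so it is an inference from the numbers. The sketch below checks that reading on two rows; the helper name `average_reported` and the use of `None` for a missing score are illustrative choices, not anything from the paper.

```python
from statistics import mean

# Per-dataset accuracies copied from the table, in the order
# MedQA, PubMedQA, MedMCQA, MMLU-Medical.
# None marks a score that is not reported ("-" in the table).
scores = {
    "MEG-Llama (8B)": [66.0, 78.0, 60.6, 74.9],
    "Meerkat-8B": [74.2, None, 62.7, 75.2],
}

# Values of the Avg column for the same rows.
reported_avg = {"MEG-Llama (8B)": 69.9, "Meerkat-8B": 70.7}


def average_reported(values):
    """Unweighted mean over the datasets that have a score (hypothetical helper)."""
    present = [v for v in values if v is not None]
    return mean(present)


for model, values in scores.items():
    avg = average_reported(values)
    print(f"{model}: computed {avg:.2f}, reported {reported_avg[model]}")
```

Because Meerkat-8B has no PubMedQA score, its average under this reading covers three datasets rather than four, so its Avg is not directly comparable with rows that report all four.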