SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks
cs.CV
cs.LG
Released Date: November 29, 2024
Authors: Kim-Celine Kahl (1), Selen Erkan (1), Jeremias Traub (1), Carsten T. Lüth, Klaus Maier-Hein (2), Lena Maier-Hein (3), Paul F. Jaeger (1)
Affiliations: (1) German Cancer Research Center (DKFZ) Heidelberg, Interactive Machine Learning Group, Germany; (2) German Cancer Research Center (DKFZ) Heidelberg, Division of Medical Image Computing, Germany; (3) German Cancer Research Center (DKFZ) Heidelberg, Division of Intelligent Medical Systems (IMSY), Germany

| Number of Tokens | Learning Rate | Closed-Ended (Mistral Accuracy) | Open-Ended (Mistral Score) |
|---|---|---|---|
| 40 | 3e-2 | 0.85 ± 0.00 | 2.96 ± 0.06 |
| 40 | 3e-1 | 0.85 ± 0.01 | 2.95 ± 0.05 |
| 60 | 3e-2 | 0.85 ± 0.01 | 2.98 ± 0.04 |
| 60 | 3e-1 | 0.85 ± 0.00 | 2.97 ± 0.04 |
| 80 | 3e-2 | 0.81 ± 0.04 | 2.96 ± 0.04 |
| 80 | 3e-1 | 0.84 ± 0.01 | 2.99 ± 0.02 |
| 100 | 3e-2 | 0.83 ± 0.02 | 2.99 ± 0.04 |
| 100 | 3e-1 | 0.85 ± 0.00 | 3.00 ± 0.03 |