RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models
cs.CV
cs.AI
Released Date: November 6, 2024
Authors: Maya Varma¹, Jean-Benoit Delbrouck², Zhihong Chen¹, Akshay Chaudhari¹, Curtis Langlotz¹
Aff.: ¹Stanford University; ²Stanford University; Hugging Face

| Method | Stage 1 Discovery Precision@10 | | | | Stage 1 Discovery Precision@10 | | | |
|---|---|---|---|---|---|---|---|---|
| | Img. Overall | Img. WG | Reg. Overall | Reg. WG | Img. Overall | Img. WG | Reg. Overall | Reg. WG |
| Standard FT | 64.0 | 31.4 | 72.0 | 46.9 | 64.6 | 31.0 | 72.9 | 47.4 |
| Upsampled FT | 66.6 | 37.8 | 74.3 | 52.2 | 66.7 | 37.7 | 74.7 | 52.8 |
| VL-ERM | 68.8 | 32.2 | 75.6 | 50.3 | 68.7 | 30.9 | 75.9 | 50.6 |
| VL-GDRO | 69.1 | 33.7 | 75.6 | 50.4 | 68.8 | 31.1 | 76.0 | 51.0 |
| Spurious-Aware | 69.8 | 33.6 | 76.5 | 50.6 | 69.2 | 30.7 | 76.8 | 50.5 |
| RaVL (Ours) | 69.8 | 39.1 | 78.9 | 57.8 | 70.2 | 40.8 | 79.5 | 58.5 |
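
The table pairs each "Overall" column with a "WG" column. As a rough illustration of how such a pair of numbers can relate, the sketch below computes an overall score and a worst-group score from per-example correctness flags. It assumes that "WG" abbreviates worst-group and that each score is a per-example correctness rate; neither is stated in this excerpt. The function name `overall_and_worst_group`, the group labels, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def overall_and_worst_group(correct, groups):
    """Return (overall %, worst-group %) from per-example 0/1 correctness flags.

    Hypothetical helper for illustration only; not the paper's evaluation code.
    """
    if len(correct) != len(groups) or not correct:
        raise ValueError("correct and groups must be non-empty and of equal length")

    hits = defaultdict(int)    # number of correct examples per group
    totals = defaultdict(int)  # number of examples per group
    for c, g in zip(correct, groups):
        hits[g] += int(c)
        totals[g] += 1

    # Overall score pools every example; worst-group takes the minimum per-group rate.
    overall = 100.0 * sum(hits.values()) / len(correct)
    worst = min(100.0 * hits[g] / totals[g] for g in totals)
    return overall, worst


# Toy data invented for illustration (not results from the paper).
correct = [1, 1, 1, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
overall, worst = overall_and_worst_group(correct, groups)
print(f"overall={overall:.1f}  worst-group={worst:.1f}")
```

Under this reading, a method can raise the overall number while leaving the weakest group unchanged, which is why a table like the one above would report both columns.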