RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering
cs.CV
cs.AI
Released Date: November 3, 2024
Authors: Hui Lin, Danfeng Hong, Shuhang Ge, Chuyao Luo, Kai Jiang, Hao Jin, Congcong Wen

| Method | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE_L | CIDEr |
|---|---|---|---|---|---|---|---|
| BLIP2-13B [11] | 54.51 | 42.42 | 34.64 | 28.17 | 27.54 | 24.38 | 105.51 |
| MiniGPT4-13B [12] | 68.49 | 59.31 | 43.22 | 39.78 | 31.70 | 30.95 | 120.33 |
| InstructBLIP-13B [13] | 63.71 | 50.76 | 49.35 | 40.50 | 30.58 | 27.15 | 121.91 |
| RSGPT-13B [15] | 77.05 | 62.18 | 48.25 | 40.34 | 37.41 | 33.26 | 149.32 |
| Qwen2-VL-7B [25] | 78.66 | 60.98 | 47.07 | 39.24 | 40.05 | 32.63 | 124.36 |
| LLaVA-NeXT-7B [26] | 81.72 | 63.99 | 49.31 | 40.38 | 39.59 | 34.62 | 128.27 |
| RS-MoE-1B | 57.36 | 42.25 | 22.14 | 20.54 | 32.36 | 25.98 | 109.37 |
| RS-MoE-3B | 65.11 | 52.44 | 28.78 | 26.86 | 40.04 | 27.62 | 120.20 |
| RS-MoE-7B | 82.13 | 65.44 | 51.93 | 42.55 | 40.28 | 35.72 | 158.36 |
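One way to read the table is to rank the methods per metric and check how far the top entry sits above the runner-up. The Python sketch below does that with the scores copied from the rows above. The names `METRICS`, `SCORES`, and `best_per_metric` are illustrative choices made for this example and do not come from the paper or its code.

```python
# Scores copied from the table above; column order follows the table header.
METRICS = ["BLEU-1", "BLEU-2", "BLEU-3", "BLEU-4", "METEOR", "ROUGE_L", "CIDEr"]

SCORES = {
    "BLIP2-13B":        [54.51, 42.42, 34.64, 28.17, 27.54, 24.38, 105.51],
    "MiniGPT4-13B":     [68.49, 59.31, 43.22, 39.78, 31.70, 30.95, 120.33],
    "InstructBLIP-13B": [63.71, 50.76, 49.35, 40.50, 30.58, 27.15, 121.91],
    "RSGPT-13B":        [77.05, 62.18, 48.25, 40.34, 37.41, 33.26, 149.32],
    "Qwen2-VL-7B":      [78.66, 60.98, 47.07, 39.24, 40.05, 32.63, 124.36],
    "LLaVA-NeXT-7B":    [81.72, 63.99, 49.31, 40.38, 39.59, 34.62, 128.27],
    "RS-MoE-1B":        [57.36, 42.25, 22.14, 20.54, 32.36, 25.98, 109.37],
    "RS-MoE-3B":        [65.11, 52.44, 28.78, 26.86, 40.04, 27.62, 120.20],
    "RS-MoE-7B":        [82.13, 65.44, 51.93, 42.55, 40.28, 35.72, 158.36],
}


def best_per_metric(scores, metrics):
    """Return {metric: (best_method, best_score, margin_over_runner_up)}."""
    result = {}
    for i, metric in enumerate(metrics):
        # Sort methods by their score on column i, highest first.
        ranked = sorted(scores.items(), key=lambda kv: kv[1][i], reverse=True)
        (top_name, top_vals), (_, second_vals) = ranked[0], ranked[1]
        margin = round(top_vals[i] - second_vals[i], 2)
        result[metric] = (top_name, top_vals[i], margin)
    return result


if __name__ == "__main__":
    for metric, (name, score, margin) in best_per_metric(SCORES, METRICS).items():
        print(f"{metric:8s} {name:12s} {score:7.2f}  (+{margin:.2f})")
```

With the values as listed, RS-MoE-7B ranks first on all seven metrics. The margin is narrowest on METEOR and widest on CIDEr.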