COREval: A Comprehensive and Objective Benchmark for Evaluating the Remote Sensing Capabilities of Large Vision-Language Models
Published: November 27
cs.CV
Released Date: November 27, 2024
Authors: Xiao An¹, Jiaxing Sun¹, Zihan Gui¹, Wei He¹
Aff.: ¹The State Key Lab. LIESMARS, Wuhan University

| Model | Backbone | ILC | SII | CID | AttR | AssR | CSR |
|---|---|---|---|---|---|---|---|
| General-domain Large Vision-Language Models | | | | | | | |
| Qwen2-VL-7B[41] | - | 0.830 | 0.682 | 0.703 | 0.243 | 0.233 | 0.910 |
| InternVL2-8B[4] | - | 0.806 | 0.625 | 0.666 | 0.480 | 0.278 | 0.822 |
| LLaVA-1.6-7B[23] | - | 0.746 | 0.543 | 0.630 | 0.230 | 0.155 | 0.718 |
| Llama3.2-11B[9] | - | 0.749 | 0.524 | 0.614 | 0.290 | 0.158 | 0.808 |
| GLM-4V-9B[10] | - | 0.788 | 0.572 | 0.655 | 0.187 | 0.010 | 0.846 |
| DeepSeek-VL-7B[26] | - | 0.796 | 0.638 | 0.628 | 0.270 | 0.060 | 0.860 |
| MiniCPM-V-2.5[46] | - | 0.785 | 0.603 | 0.653 | 0.247 | 0.208 | 0.882 |
| Phi3-Vision[1] | - | 0.750 | 0.547 | 0.584 | 0.280 | 0.075 | 0.714 |
| Remote Sensing Large Vision-Language Models | | | | | | | |
| GeoChat[15] | - | 0.726 | 0.508 | 0.251 | 0.327 | 0.138 | 0.696 |
| LHRS-Bot[29] | - | 0.708 | 0.317 | 0.181 | 0.267 | 0.230 | 0.574 |
| LHRS-Bot-nova[29] | - | 0.768 | 0.526 | 0.262 | 0.327 | 0.143 | 0.578 |
| RemoteCLIP[22] | RN50 | 0.717 | 0.328 | 0.504 | 0.307 | 0.225 | 0.416 |
| RemoteCLIP[22] | ViT-B | 0.753 | 0.348 | 0.338 | 0.287 | 0.185 | 0.500 |
| RemoteCLIP[22] | ViT-L | 0.709 | 0.355 | 0.514 | 0.293 | 0.140 | 0.736 |
| GeoRSCLIP[49] | ViT-B | 0.757 | 0.311 | 0.541 | 0.327 | 0.243 | 0.828 |
| GeoRSCLIP[49] | ViT-B_RET-2 | 0.772 | 0.255 | 0.447 | 0.337 | 0.188 | 0.766 |
| GeoRSCLIP[49] | ViT-L | 0.749 | 0.299 | 0.571 | 0.327 | 0.230 | 0.904 |
| GeoRSCLIP[49] | ViT-L-336 | 0.762 | 0.333 | 0.598 | 0.347 | 0.170 | 0.906 |
| GeoRSCLIP[49] | ViT-H | 0.763 | 0.331 | 0.404 | 0.370 | 0.285 | 0.928 |
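As an illustrative sketch, the scores above can be transcribed into a small Python dictionary to find the top-scoring model in each column. The dictionary keys (for example `"GeoRSCLIP ViT-H"`) and the helper `best_per_dimension` are hypothetical names introduced here. They are not part of the COREval release. Ties are not treated specially: `max` returns the first model it encounters with the highest value.

```python
# Scores transcribed from the table above.
# Column order: ILC, SII, CID, AttR, AssR, CSR.
DIMENSIONS = ["ILC", "SII", "CID", "AttR", "AssR", "CSR"]

# Hypothetical keys: model name, plus backbone where the table lists one.
SCORES = {
    "Qwen2-VL-7B":           [0.830, 0.682, 0.703, 0.243, 0.233, 0.910],
    "InternVL2-8B":          [0.806, 0.625, 0.666, 0.480, 0.278, 0.822],
    "LLaVA-1.6-7B":          [0.746, 0.543, 0.630, 0.230, 0.155, 0.718],
    "Llama3.2-11B":          [0.749, 0.524, 0.614, 0.290, 0.158, 0.808],
    "GLM-4V-9B":             [0.788, 0.572, 0.655, 0.187, 0.010, 0.846],
    "DeepSeek-VL-7B":        [0.796, 0.638, 0.628, 0.270, 0.060, 0.860],
    "MiniCPM-V-2.5":         [0.785, 0.603, 0.653, 0.247, 0.208, 0.882],
    "Phi3-Vision":           [0.750, 0.547, 0.584, 0.280, 0.075, 0.714],
    "GeoChat":               [0.726, 0.508, 0.251, 0.327, 0.138, 0.696],
    "LHRS-Bot":              [0.708, 0.317, 0.181, 0.267, 0.230, 0.574],
    "LHRS-Bot-nova":         [0.768, 0.526, 0.262, 0.327, 0.143, 0.578],
    "RemoteCLIP RN50":       [0.717, 0.328, 0.504, 0.307, 0.225, 0.416],
    "RemoteCLIP ViT-B":      [0.753, 0.348, 0.338, 0.287, 0.185, 0.500],
    "RemoteCLIP ViT-L":      [0.709, 0.355, 0.514, 0.293, 0.140, 0.736],
    "GeoRSCLIP ViT-B":       [0.757, 0.311, 0.541, 0.327, 0.243, 0.828],
    "GeoRSCLIP ViT-B_RET-2": [0.772, 0.255, 0.447, 0.337, 0.188, 0.766],
    "GeoRSCLIP ViT-L":       [0.749, 0.299, 0.571, 0.327, 0.230, 0.904],
    "GeoRSCLIP ViT-L-336":   [0.762, 0.333, 0.598, 0.347, 0.170, 0.906],
    "GeoRSCLIP ViT-H":       [0.763, 0.331, 0.404, 0.370, 0.285, 0.928],
}


def best_per_dimension(scores):
    """Return {dimension: (model, score)} holding the highest score in each column."""
    best = {}
    for i, dim in enumerate(DIMENSIONS):
        # max() over model names, ranked by that model's score in column i.
        model = max(scores, key=lambda m: scores[m][i])
        best[dim] = (model, scores[model][i])
    return best


if __name__ == "__main__":
    for dim, (model, score) in best_per_dimension(SCORES).items():
        print(f"{dim}: {model} ({score:.3f})")
```

On the values above, this picks Qwen2-VL-7B for ILC, SII and CID, and InternVL2-8B for AttR. It picks the GeoRSCLIP ViT-H variant for AssR and CSR. These are per-column maxima only. The sketch does not attempt the paper's own aggregate scoring.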