Cross-Modal Pre-Aligned Method with Global and Local Information for Remote-Sensing Image and Text Retrieval
Categories: cs.CV, cs.IR
Release date: November 22, 2024
Authors: Zengbao Sun, Ming Zhao, Gaorui Liu, André Kaup

Retrieval results on the RSICD and RSITMD datasets. TR = text retrieval, IR = image retrieval, R@K = recall at K (%), mR = mean recall over the six R@K values.

| Approach | RSICD TR R@1 | RSICD TR R@5 | RSICD TR R@10 | RSICD IR R@1 | RSICD IR R@5 | RSICD IR R@10 | RSICD mR | RSITMD TR R@1 | RSITMD TR R@5 | RSITMD TR R@10 | RSITMD IR R@1 | RSITMD IR R@5 | RSITMD IR R@10 | RSITMD mR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMFMN-soft | 5.05 | 14.53 | 21.57 | 5.05 | 19.74 | 31.04 | 16.02 | 11.06 | 25.88 | 39.82 | 9.82 | 33.84 | 51.90 | 28.74 |
| AMFMN-fusion | 5.39 | 15.08 | 23.40 | 4.90 | 18.28 | 31.44 | 16.42 | 11.06 | 29.20 | 38.72 | 9.96 | 34.04 | 52.96 | 29.32 |
| AMFMN-sim | 5.21 | 14.72 | 21.57 | 4.08 | 17.00 | 30.60 | 15.53 | 10.63 | 24.78 | 41.81 | 11.51 | 34.69 | 54.87 | 29.72 |
| LW-MCR-b | 4.57 | 13.71 | 20.11 | 4.02 | 16.47 | 28.23 | 14.52 | 9.07 | 22.79 | 38.05 | 6.11 | 27.74 | 49.56 | 25.55 |
| LW-MCR-d | 3.29 | 12.52 | 19.93 | 4.66 | 17.51 | 30.02 | 14.66 | 10.18 | 28.98 | 39.82 | 7.79 | 30.18 | 49.78 | 27.79 |
| GaLR | 6.59 | 19.85 | 31.04 | 4.69 | 18.48 | 32.13 | 18.96 | 14.82 | 31.64 | 42.48 | 11.15 | 36.68 | 51.68 | 31.41 |
| CLIP | 8.02 | 22.76 | 35.07 | 5.62 | 21.13 | 35.31 | 21.32 | 14.77 | 34.85 | 46.38 | 10.23 | 34.02 | 47.79 | 31.34 |
| IEFT | 8.43 | 27.15 | 40.74 | 7.65 | 27.35 | 42.19 | 25.86 | 15.34 | 36.67 | 50.34 | 11.02 | 37.21 | 57.31 | 34.65 |
| HVSA | 7.47 | 20.62 | 32.11 | 5.51 | 21.13 | 34.13 | 20.16 | 13.20 | 32.08 | 45.58 | 11.43 | 39.20 | 57.45 | 33.16 |
| KAMCL | 12.08 | 27.26 | 38.70 | 8.65 | 27.43 | 42.51 | 26.10 | 16.51 | 36.28 | 49.12 | 13.50 | 42.15 | 59.32 | 36.14 |
| CMPAGL w/o SMR | 11.71 | 27.08 | 41.17 | 8.86 | 29.03 | 44.72 | 27.10 | 18.81 | 36.72 | 51.11 | 14.48 | 41.65 | 60.35 | 37.18 |
| CMPAGL with SMR | 12.71 | 29.55 | 42.54 | 9.08 | 31.11 | 46.81 | 28.63 | 19.13 | 42.36 | 54.74 | 15.67 | 44.32 | 60.51 | 39.46 |
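
The mR column can be read as the arithmetic mean of the six recall values in a row (R@1, R@5 and R@10 for both retrieval directions). The minimal Python sketch below assumes exactly that definition and checks it against the CMPAGL with SMR row for RSICD; `mean_recall` is a hypothetical helper name, not part of the paper's code.

```python
from statistics import mean


def mean_recall(text_recalls, image_recalls):
    """Assumption: mR is the plain average of the six R@K values (in %)."""
    values = list(text_recalls) + list(image_recalls)
    if len(values) != 6:
        raise ValueError("expected R@1, R@5 and R@10 for both directions")
    return mean(values)


# CMPAGL with SMR on RSICD, copied from the table above
tr = (12.71, 29.55, 42.54)  # text retrieval R@1, R@5, R@10
ir = (9.08, 31.11, 46.81)   # image retrieval R@1, R@5, R@10
print(round(mean_recall(tr, ir), 2))
```

This reproduces the listed 28.63. The same formula applied to the AMFMN-soft RSICD row gives about 16.16 rather than the listed 16.02, so not every baseline mR entry equals the exact mean of its row.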

Retrieval results on the UCM-Captions (UCM) and Sydney-Captions (Sydney) datasets. TR = text retrieval, IR = image retrieval, R@K = recall at K (%), mR = mean recall over the six R@K values.

| Approach | UCM TR R@1 | UCM TR R@5 | UCM TR R@10 | UCM IR R@1 | UCM IR R@5 | UCM IR R@10 | UCM mR | Sydney TR R@1 | Sydney TR R@5 | Sydney TR R@10 | Sydney IR R@1 | Sydney IR R@5 | Sydney IR R@10 | Sydney mR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMFMN-soft | 12.86 | 51.90 | 66.67 | 14.19 | 51.71 | 78.48 | 45.97 | 20.69 | 51.72 | 74.14 | 15.17 | 58.62 | 80.00 | 50.06 |
| AMFMN-fusion | 16.67 | 45.71 | 68.57 | 12.86 | 53.24 | 79.43 | 46.08 | 24.14 | 51.72 | 75.86 | 14.83 | 56.55 | 77.89 | 50.17 |
| AMFMN-sim | 14.76 | 49.52 | 68.10 | 13.43 | 51.81 | 76.48 | 45.68 | 29.31 | 58.62 | 67.24 | 13.35 | 60.00 | 81.72 | 51.72 |
| LW-MCR-b | 12.38 | 43.81 | 59.52 | 12.00 | 46.38 | 72.48 | 41.10 | 17.24 | 48.28 | 72.41 | 14.13 | 56.90 | 77.24 | 47.70 |
| LW-MCR-d | 15.24 | 51.90 | 62.86 | 11.90 | 50.95 | 75.24 | 44.68 | 18.97 | 58.63 | 75.86 | 13.45 | 57.59 | 78.97 | 50.57 |
| GaLR | 12.87 | 43.65 | 60.55 | 11.69 | 47.84 | 73.55 | 41.69 | 16.44 | 52.43 | 71.32 | 14.79 | 57.16 | 77.65 | 48.30 |
| CLIP | 13.57 | 46.35 | 62.43 | 10.08 | 46.34 | 73.66 | 42.07 | 24.35 | 54.37 | 73.21 | 17.57 | 55.72 | 74.39 | 49.94 |
| IEFT | 13.26 | 49.43 | 67.89 | 13.43 | 52.56 | 82.79 | 46.56 | 26.78 | 56.25 | 76.57 | 14.89 | 57.19 | 80.00 | 51.94 |
| HVSA | 12.85 | 46.03 | 68.25 | 11.90 | 47.14 | 72.48 | 43.11 | 12.64 | 40.63 | 66.32 | 13.79 | 48.39 | 70.00 | 41.96 |
| KAMCL | 13.17 | 47.25 | 68.18 | 12.35 | 49.25 | 76.33 | 44.42 | 14.32 | 44.88 | 68.57 | 15.27 | 55.18 | 73.86 | 45.34 |
| CMPAGL w/o SMR | 11.90 | 46.67 | 71.43 | 12.86 | 56.95 | 92.47 | 48.71 | 25.86 | 47.82 | 68.51 | 20.34 | 60.34 | 79.31 | 50.36 |
| CMPAGL with SMR | 12.86 | 47.62 | 72.86 | 13.52 | 58.95 | 93.05 | 49.81 | 27.32 | 49.34 | 73.32 | 21.03 | 61.04 | 81.05 | 52.22 |
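
The R@K values in these tables are ranking metrics. As a hedged sketch of how such numbers are commonly computed (not the authors' evaluation code), the hypothetical `recall_at_k` function below takes an image-by-caption similarity matrix and assumes that each image has `captions_per_image` ground-truth captions stored consecutively; five per image is used as the default, which is an assumption about the benchmark layout. For text retrieval, each image queries all captions and counts as a hit at K when any of its ground-truth captions is in the top K; for image retrieval, each caption queries all images and counts as a hit when its single ground-truth image is in the top K.

```python
import numpy as np


def recall_at_k(sim, captions_per_image=5, ks=(1, 5, 10)):
    """Return (text-retrieval R@K list, image-retrieval R@K list, mR), all in %.

    sim[i, j] is the similarity between image i and caption j. Assumption:
    captions are stored consecutively, so caption j describes image
    j // captions_per_image.
    """
    num_images, num_captions = sim.shape
    caption_owner = np.arange(num_captions) // captions_per_image

    # Text retrieval: each image queries all captions; keep the 0-based rank
    # of the highest-ranked ground-truth caption.
    tr_ranks = np.empty(num_images, dtype=int)
    for i in range(num_images):
        order = np.argsort(-sim[i])  # caption indices, most similar first
        tr_ranks[i] = np.where(caption_owner[order] == i)[0][0]

    # Image retrieval: each caption queries all images; keep the 0-based rank
    # of its single ground-truth image.
    ir_ranks = np.empty(num_captions, dtype=int)
    for j in range(num_captions):
        order = np.argsort(-sim[:, j])  # image indices, most similar first
        ir_ranks[j] = np.where(order == caption_owner[j])[0][0]

    # A query is a hit at K when its ground truth lands in the top K (rank < K).
    tr = [float(100.0 * np.mean(tr_ranks < k)) for k in ks]
    ir = [float(100.0 * np.mean(ir_ranks < k)) for k in ks]
    m_r = float(np.mean(tr + ir))  # mean over the six recall values
    return tr, ir, m_r


# Toy check: 4 images x 20 captions where every ground-truth pair is boosted,
# so every query should retrieve its match at rank 1.
rng = np.random.default_rng(0)
sim = rng.random((4, 20))
sim[np.arange(20) // 5, np.arange(20)] += 10.0
print(recall_at_k(sim))
```

Taking the best-ranked of an image's ground-truth captions is what makes text retrieval a one-of-several match, while image retrieval always has exactly one correct answer per query.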