Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts
cs.CV
cs.AI
Released Date: December 5, 2024
Authors: Chenyang Zhu¹, Bin Xiao¹, Lin Shi¹, Shoukun Xu¹, Xu Zheng²
Affiliations: ¹University; ²HKUST(GZ), Guangdong, China

| Method | Modality | Backbone | mIoU (%) | Δ mIoU vs. CMNeXt |
|---|---|---|---|---|
| CMNeXt [13] | Frame | MiT-B0 | 43.37 | - |
| CWSAM [48] | Frame | ViT-B | 55.41 | 12.04 |
| SAM-LoRA | Frame | ViT-B | 65.91 | 22.54 |
| MLE-SAM | Frame | Hiera-B+ | 73.95 | 30.58 |
| CMNeXt [13] | Frame-Event | MiT-B0 | 43.39 | - |
| CWSAM [48] | Frame-Event | ViT-B | 41.77 | -1.62 |
| SAM-LoRA | Frame-Event | ViT-B | 67.96 | 24.57 |
| MLE-SAM | Frame-Event | Hiera-B+ | 74.73 | 31.34 |
| CMNeXt [13] | Frame-LiDAR | MiT-B0 | 47.03 | - |
| CWSAM [48] | Frame-LiDAR | ViT-B | 40.69 | -6.34 |
| SAM-LoRA | Frame-LiDAR | ViT-B | 70.34 | 23.31 |
| MLE-SAM | Frame-LiDAR | Hiera-B+ | 75.42 | 28.39 |
| CMNeXt [13] | Frame-E-LiDAR | MiT-B0 | 46.66 | - |
| CWSAM [48] | Frame-E-LiDAR | ViT-B | 49.98 | 3.32 |
| SAM-LoRA | Frame-E-LiDAR | ViT-B | 70.08 | 23.42 |
| MLE-SAM | Frame-E-LiDAR | Hiera-B+ | 74.80 | 28.14 |
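
The last column of the table appears to report each method's mIoU gain over the CMNeXt baseline in the same modality (for example, 55.41 - 43.37 = 12.04 for CWSAM on Frame). That reading is inferred from the numbers, not stated in the source. The sketch below recomputes the column under that assumption. The names `MIOU` and `gains_over_baseline` are hypothetical and used only for illustration.

```python
# mIoU values copied from the table, grouped by modality.
MIOU = {
    "Frame": {"CMNeXt": 43.37, "CWSAM": 55.41, "SAM-LoRA": 65.91, "MLE-SAM": 73.95},
    "Frame-Event": {"CMNeXt": 43.39, "CWSAM": 41.77, "SAM-LoRA": 67.96, "MLE-SAM": 74.73},
    "Frame-LiDAR": {"CMNeXt": 47.03, "CWSAM": 40.69, "SAM-LoRA": 70.34, "MLE-SAM": 75.42},
    "Frame-E-LiDAR": {"CMNeXt": 46.66, "CWSAM": 49.98, "SAM-LoRA": 70.08, "MLE-SAM": 74.80},
}


def gains_over_baseline(scores, baseline="CMNeXt"):
    """Return each non-baseline method's mIoU minus the baseline's mIoU.

    Assumption: the table's last column is this difference, taken within
    a single modality and rounded to two decimals.
    """
    base = scores[baseline]
    return {
        method: round(value - base, 2)
        for method, value in scores.items()
        if method != baseline
    }


if __name__ == "__main__":
    for modality, scores in MIOU.items():
        print(modality, gains_over_baseline(scores))
```

Under this reading, a negative value (CWSAM on Frame-Event and Frame-LiDAR) means the method scored below CMNeXt when the extra modality was added.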