Vision Transformers for Weakly-Supervised Microorganism Enumeration
cs.CV
Released Date: December 3, 2024
Authors: Javier Ureña Santiago¹, Thomas Ströhle², Antonio Rodríguez-Sánchez¹, Ruth Breu²
Affiliations: ¹Intelligent and Interactive Systems, University of Innsbruck, Austria; ²Quality Engineering, University of Innsbruck, Austria

| Model name | Variant | Depth | Heads | Dimension | MLP dim. | Parameters (M) |
| --- | --- | --- | --- | --- | --- | --- |
| CNN | Base/Medium/Deep | 1 / 2 / 3 | - | - | 16 / 64 / 256 | 0.59 / 0.61 / 0.96 |
| ResNet | 50/101 | 16 / 33 | - | - | 2048 | 23.53 / 42.54 |
| ViT | Vanilla | 12 | 12 | 768 | 3072 | 87.50 |
| XCiT | S24 | 24 | 8 | 384 | 1536 | 49.82 |
| CrossViT | Ti | 4 | 3 | 96 / 192 | 384 / 768 | 3.07 |
| Parallel ViT | Ti | 12 | 3 | 192 | 192 | 5.50 |
| Deep ViT | S | 16 | 12 | 396 | 1188 | 34.91 |
| TransCrowd-G | Vanilla | 12 | 12 | 768 | 3072 | |
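
The Depth, Dimension, and MLP dim. columns determine most of a plain ViT's parameter count. The sketch below shows that arithmetic for the ViT Vanilla row. It assumes a standard encoder block (fused QKV projection, output projection, two-layer MLP, two LayerNorms, all with biases), which may not match the exact implementation used by the authors. The function names are hypothetical. The number of heads does not appear because it splits the embedding dimension without adding weights.

```python
def vit_block_params(dim: int, mlp_dim: int) -> int:
    """Parameter count of one standard ViT encoder block (assumed layout, with biases)."""
    # QKV projection (3*dim*dim + 3*dim) plus output projection (dim*dim + dim).
    attention = 4 * dim * dim + 4 * dim
    # Two linear layers: dim -> mlp_dim and mlp_dim -> dim, each with a bias.
    mlp = 2 * dim * mlp_dim + mlp_dim + dim
    # Two LayerNorms, each with a scale and a shift vector of size dim.
    layer_norms = 2 * 2 * dim
    return attention + mlp + layer_norms


def vit_encoder_params(depth: int, dim: int, mlp_dim: int) -> int:
    """Parameter count of `depth` stacked encoder blocks, excluding embeddings and heads."""
    return depth * vit_block_params(dim, mlp_dim)


if __name__ == "__main__":
    # ViT Vanilla row: depth 12, dimension 768, MLP dim. 3072.
    total = vit_encoder_params(depth=12, dim=768, mlp_dim=3072)
    print(f"{total / 1e6:.2f} M")  # prints "85.05 M"
```

The result covers the encoder blocks only, so it comes out below the 87.50 listed in the table. Patch embedding, position embeddings, the class token, the final normalization, and the output head are not counted, since their sizes depend on settings (patch size, input resolution, head design) that the table does not give. The formula also does not apply to rows such as XCiT or CrossViT, whose blocks differ from the plain ViT layout.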