Published at November 19
Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning
cs.CV
cs.AI
Released Date: November 19, 2024
Authors: Pengkun Jiao, Bin Zhu, Jingjing Chen, Chong-Wah Ngo, Yu-Gang Jiang

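| Method | Ingredient Recognition: IoU | Ingredient Recognition: F1 | Recipe Generation: SacreBLEU | Recipe Generation: Rouge-L | Nutrition Estimation (pMAE): mass | Nutrition Estimation (pMAE): cal | Nutrition Estimation (pMAE): fat | Nutrition Estimation (pMAE): protein | Nutrition Estimation (pMAE): carb | Nutrition Estimation (pMAE): avg |
|---|---|---|---|---|---|---|---|---|---|---|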
| vanilla LoRA [8] | 23.2 | 34.1 | 12.4 | 40.1 | 46.2 | 45.5 | 57.1 | 53.4 | 48.7 | 50.2 |
| MoE-LoRA (top-2) [27] | 22.9 | 33.8 | 12.7 | 40.2 | 45.56 | 45.8 | 56.9 | 54.4 | 48.0 | 50.1 |
| MoE-LoRA (softmax) [27] | 22.7 | 33.5 | 12.5 | 40.0 | 45.3 | 45.5 | 58.1 | 53.7 | 47.5 | 50.0 |
| RoDE [10] | 23.6 | 34.6 | 13.8 | 41.4 | 45.8 | 47.6 | 58.5 | 54.4 | 50.4 | 51.3 |
| Dual-LoRA | 24.2 | 35.2 | 14.8 | 42.1 | 46.1 | 46.2 | 56.8 | 52.2 | 48.7 | 49.9 |
| Dual-LoRA + VCE | 24.5 | 35.5 | 14.7 | 42.2 | 44.9 | 45.0 | 56.8 | 51.2 | 47.5 | 49.1 |
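
The table does not say how the `avg` column is derived. As a rough sanity check, the sketch below assumes it is the unweighted mean of the five nutrient pMAE columns (mass, cal, fat, protein, carb) and recomputes it from the values printed above. The dictionary `rows` and the helper `mean_pmae` are illustrative names, not part of the paper.

```python
# Nutrition pMAE values copied from the table above:
# (mass, cal, fat, protein, carb) followed by the reported avg.
rows = {
    "vanilla LoRA":       ((46.2, 45.5, 57.1, 53.4, 48.7), 50.2),
    "MoE-LoRA (top-2)":   ((45.56, 45.8, 56.9, 54.4, 48.0), 50.1),
    "MoE-LoRA (softmax)": ((45.3, 45.5, 58.1, 53.7, 47.5), 50.0),
    "RoDE":               ((45.8, 47.6, 58.5, 54.4, 50.4), 51.3),
    "Dual-LoRA":          ((46.1, 46.2, 56.8, 52.2, 48.7), 49.9),
    "Dual-LoRA + VCE":    ((44.9, 45.0, 56.8, 51.2, 47.5), 49.1),
}


def mean_pmae(values):
    """Unweighted mean of the per-nutrient pMAE values (an assumption)."""
    return sum(values) / len(values)


def recompute(table):
    """Return {method: (recomputed avg, reported avg)} for every row."""
    return {name: (mean_pmae(vals), reported)
            for name, (vals, reported) in table.items()}


if __name__ == "__main__":
    for name, (recomputed, reported) in recompute(rows).items():
        gap = abs(recomputed - reported)
        print(f"{name:20s} recomputed={recomputed:6.2f} "
              f"reported={reported:5.1f} gap={gap:.2f}")
```

Under this assumption, every row lands within about 0.1 of the reported `avg`. The Dual-LoRA row recomputes to 50.0 against a reported 49.9, which may simply reflect averaging over unrounded per-nutrient values; the source does not say.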