VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
Categories: cs.CV, cs.AI
Released Date: December 3, 2024
Authors: Kangsan Kim, Geon Park, Youngwan Lee, Woongyeong Yeo, Sung Ju Hwang
Affiliation: KAIST (all authors)

| Model | Size | Method | | | Multiple Choice QA: Animal Kingdom | Open-ended QA: Sports-QA | Open-ended QA: PitVQA | Video Classification: UCF-Crime | Video Classification: Drive&Act | Video Captioning: CapERA BLEU-1 | CapERA BLEU-2 | CapERA BLEU-3 | CapERA BLEU-4 | CapERA METEOR | CapERA ROUGE-L |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o [46] | | | - | 0 | 58.2 | - | 6.9 | 58.0 | - | 0.143 | 0.065 | 0.037 | 0.023 | 0.142 | 0.173 |
| Gemini-1.5 Pro [45] | | | - | 0 | 72.9 | - | 14.7 | 55.1 | - | 0.126 | 0.057 | 0.031 | 0.019 | 0.134 | 0.176 |
| Otter-7B [27] | | | 1 | 8 | 19.4 | - | 21.8 | 6.8 | - | 0.241 | 0.135 | 0.088 | 0.059 | 0.169 | 0.167 |
| LLaVA-Video [57] | 72B | Zero-shot | - | 0 | 69.7 | 25.7 | 5.7 | 35.6 | 14.6 | 0.133 | 0.060 | 0.034 | 0.020 | 0.129 | 0.170 |
| LLaVA-Video [57] | 7B | LoRA FT | - | 0 | 70.2 | - | 40.5 | 51.9 | - | 0.528 | 0.393 | 0.302 | 0.227 | 0.271 | 0.181 |
| LLaVA-Video [57] | 7B | Zero-shot | - | 0 | 68.0 | 25.5 | 6.7 | 39.3 | 20.2 | 0.162 | 0.077 | 0.045 | 0.027 | 0.149 | 0.181 |
| LLaVA-Video [57] | 7B | MMICES [8] | 1 | 2 | 69.3 | 43.0 | 46.4 | 50.7 | 51.3 | 0.462 | 0.312 | 0.224 | 0.160 | 0.245 | 0.178 |
| LLaVA-Video [57] | 7B | SimRankOnce | 1 | 2 | 69.3 | 41.8 | 54.0 | 50.7 | 52.0 | 0.462 | 0.312 | 0.224 | 0.160 | 0.245 | 0.178 |
| LLaVA-Video [57] | 7B | RandExVote | 4 | 8 | 69.6 | 21.5 | 11.5 | 36.6 | 19.9 | 0.418 | 0.256 | 0.170 | 0.116 | 0.189 | 0.153 |
| LLaVA-Video [57] | 7B | SimRankVote | 4 | 8 | 70.9 | 36.3 | 57.6 | 50.6 | 50.6 | 0.464 | 0.314 | 0.228 | 0.165 | 0.242 | 0.175 |
| LLaVA-Video [57] | 7B | VideoICL (Ours) | 4 | 8 | 72.3 | 47.6 | 61.3 | 53.3 | 53.4 | 0.465 | 0.320 | 0.235 | 0.170 | 0.252 | 0.178 |
| LLaVA-Video [57] | 7B | Δ vs. Zero-shot | | | +4.3 | +22.1 | +54.6 | +14.0 | +33.2 | +0.302 | +0.242 | +0.190 | +0.143 | +0.104 | -0.003 |
| Qwen2-VL [47] | 7B | Zero-shot | - | 0 | 58.6 | 26.8 | 5.8 | 36.1 | 10.6 | 0.286 | 0.159 | 0.101 | 0.066 | 0.149 | 0.138 |
| Qwen2-VL [47] | 7B | SimRankOnce | 1 | 2 | 63.8 | 43.2 | 55.3 | 46.3 | 45.4 | 0.457 | 0.310 | 0.223 | 0.158 | 0.249 | 0.189 |
| Qwen2-VL [47] | 7B | RandExVote | 4 | 8 | 62.3 | 21.0 | 14.0 | 36.6 | 14.3 | 0.397 | 0.240 | 0.157 | 0.104 | 0.188 | 0.170 |
| Qwen2-VL [47] | 7B | SimRankVote | 4 | 8 | 64.0 | 50.9 | 59.4 | 46.7 | 45.8 | 0.448 | 0.300 | 0.213 | 0.151 | 0.239 | 0.187 |
| Qwen2-VL [47] | 7B | VideoICL (Ours) | 4 | 8 | 66.3 | 51.5 | 59.6 | 48.7 | 49.3 | 0.471 | 0.329 | 0.244 | 0.176 | 0.265 | 0.189 |
| Qwen2-VL [47] | 7B | Δ vs. Zero-shot | | | +7.7 | +24.7 | +53.8 | +12.6 | +38.7 | +0.185 | +0.170 | +0.143 | +0.110 | +0.116 | +0.051 |
| Oryx-1.5 [34] | 7B | Zero-shot | - | 0 | 58.6 | 28.3 | 3.8 | 11.9 | 10.7 | 0.242 | 0.126 | 0.077 | 0.049 | 0.140 | 0.151 |
| Oryx-1.5 [34] | 7B | VideoICL (Ours) | 4 | 8 | 58.5 | 52.0 | 58.4 | 44.0 | 57.3 | 0.327 | 0.195 | 0.128 | 0.086 | 0.188 | 0.179 |
| Oryx-1.5 [34] | 7B | Δ vs. Zero-shot | | | -0.1 | +23.7 | +54.6 | +32.1 | +46.6 | +0.085 | +0.069 | +0.052 | +0.038 | +0.047 | +0.028 |
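
The rows of signed values in the table (for example +4.3, +22.1, +54.6, +14.0, +33.2 under LLaVA-Video 7B) match the VideoICL score minus the Zero-shot score of the same model. The source does not state this explicitly, so treat it as an inference from the numbers. A minimal Python sketch of that arithmetic, using a hypothetical helper named `gains` and the five accuracy columns copied from the LLaVA-Video 7B rows:

```python
# Accuracy columns copied from the LLaVA-Video 7B rows of the table.
METRICS = ["Animal Kingdom", "Sports-QA", "PitVQA", "UCF-Crime", "Drive&Act"]
zero_shot = [68.0, 25.5, 6.7, 39.3, 20.2]
videoicl = [72.3, 47.6, 61.3, 53.3, 53.4]


def gains(method_scores, baseline_scores, ndigits=1):
    """Per-metric difference (method minus baseline), rounded for display.

    `gains` is an illustrative helper, not something defined by the paper.
    """
    return [round(m - b, ndigits) for m, b in zip(method_scores, baseline_scores)]


deltas = gains(videoicl, zero_shot)
for name, delta in zip(METRICS, deltas):
    # The "+" format flag prints an explicit sign, as in the table.
    print(f"{name}: {delta:+.1f}")
```

Run as written, this prints +4.3, +22.1, +54.6, +14.0 and +33.2, which are the values in the corresponding row. A few captioning entries differ from a recomputed difference by 0.001 (for instance BLEU-1: 0.465 - 0.162 = 0.303 against the reported +0.302), presumably because the published gains were computed before rounding.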