Enhancing Vision-Language Model Safety through Progressive Concept-Bottleneck-Driven Alignment
Categories: cs.CV, cs.AI
Released Date: November 18, 2024
Authors: Zhendong Liu (1), Yuanbi Nie (2), Yingshui Tan (3), Xiangyu Yue (4), Qiushi Cui (2), Chongjun Wang (1), Xiaoyong Zhu (3), Bo Zheng (3)
Affiliations: (1) Department of Computer Science and Technology, Nanjing University, Nanjing, Jiangsu Province, China; (2) School of Electrical Engineering, Chongqing University, Chongqing, China; (3) Alibaba Group, Hangzhou, Zhejiang Province, China; (4) Department of Information Engineering, Multimedia Lab (MMLab), Chinese University of Hong Kong, Hong Kong, China

The sub-task columns fall, left to right, under four categories: Faithfulness, Privacy, Safety, and Fairness.

| Method | Misleading (Text) | Misleading (Visual) | Order (✓-✗) | Order (✗-✓) | Celebrity | Politics | Racial | Captcha | Jailbreak | Face | Avg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fuyu-8B | 2.57 | 3.17 | 5.17 | 4.28 | 4.02 | 2.42 | 3.11 | 7.46 | 1.36 | 7.21 | 4.08 |
| VisualGLM-6B | 6.28 | 2.42 | 2.06 | 1.84 | 4.54 | 3.14 | 4.39 | 8.58 | 3.91 | 7.31 | 4.45 |
| Qwen-VL-Chat-7B | 8.34 | 4.93 | 5.42 | 5.28 | 5.55 | 6.38 | 6.89 | 7.44 | 2.14 | 7.35 | 5.97 |
| LLaVA-v1.5-7B | 8.52 | 4.54 | 6.27 | 5.83 | 4.38 | 6.03 | 7.03 | 7.07 | 7.14 | 7.06 | 6.39 |
| + SFT | 8.57 | 3.97 | 5.31 | 5.37 | 4.75 | 5.51 | 6.67 | 7.98 | 4.86 | 7.17 | 6.02 |
| + RLHF | 8.39 | 3.93 | 5.52 | 4.50 | 3.63 | 5.41 | 6.56 | 5.61 | 3.54 | 6.59 | 5.37 |
| + ShareGPT4V | 8.53 | 4.81 | 5.33 | 5.88 | 4.88 | 6.86 | 7.23 | 6.71 | 7.31 | 7.17 | 6.47 |
| + VLGuard-FT | 8.59 | 7.77 | 7.78 | 7.52 | 7.97 | 6.40 | 6.71 | 7.98 | 9.75 | 8.28 | 7.87 |
| + VLGuard-LoRA | 8.54 | 7.82 | 8.05 | 8.25 | 7.63 | 7.20 | 7.16 | 8.34 | 9.50 | 8.37 | 8.09 |
| LLaVA-v1.5-13B | 8.65 | 5.27 | 6.33 | 5.97 | 4.84 | 6.13 | 7.49 | 7.13 | 6.54 | 7.14 | 6.55 |
| + SFT | 8.68 | 4.76 | 5.80 | 6.21 | 5.00 | 6.81 | 7.10 | 7.03 | 5.59 | 7.18 | 6.42 |
| + VLGuard-FT | 8.91 | 8.01 | 8.17 | 8.28 | 8.23 | 7.53 | 7.01 | 8.08 | 9.00 | 8.04 | 8.13 |
| + VLGuard-LoRA | 8.45 | 7.95 | 7.66 | 7.52 | 7.76 | 6.42 | 7.28 | 9.93 | 9.50 | 9.03 | 8.15 |
| InternLM-XComposer2 | 8.83 | 8.61 | 8.51 | 8.67 | 8.01 | 7.26 | 7.85 | 6.04 | 3.33 | 8.27 | 7.54 |
| Llama-3-vision-alpha | 7.50 | 6.23 | 6.31 | 6.75 | 7.11 | 7.06 | 7.57 | 6.91 | 7.75 | 6.48 | 6.97 |
| GPT-4V | 9.28 | 6.06 | 7.28 | 7.23 | 7.04 | 7.32 | 7.64 | 9.95 | 9.59 | 7.80 | 7.92 |
| PSA-VLM-7B | 8.67 | 8.21 | 8.12 | 7.99 | 9.04 | 7.58 | 6.83 | 8.80 | 9.00 | 7.60 | 8.18 |
| + LoRA | 8.62 | 8.35 | 8.17 | 8.32 | 8.90 | 8.00 | 7.33 | 7.74 | 9.50 | 7.62 | 8.26 |
| PSA-VLM-13B | 8.92 | 7.92 | 7.81 | 7.45 | 8.04 | 8.29 | 8.29 | 9.34 | 9.25 | 8.67 | 8.40 |
| + LoRA | 8.81 | 7.97 | 7.99 | 8.03 | 7.87 | 8.36 | 8.43 | 9.29 | 9.25 | 8.58 | 8.46 |
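
The source does not say how the Avg column is computed. For the rows checked here, the reported value matches the unweighted mean of the ten sub-task scores, rounded to two decimals. The sketch below reproduces that check for three rows copied from the table; the names `ROWS` and `row_mean` are illustrative and not part of the paper.

```python
# Sub-task scores copied from the table, in column order
# (Misleading Text/Visual, Order x2, Celebrity, Politics,
#  Racial, Captcha, Jailbreak, Face), paired with the reported Avg.
ROWS = {
    "Fuyu-8B": ([2.57, 3.17, 5.17, 4.28, 4.02, 2.42, 3.11, 7.46, 1.36, 7.21], 4.08),
    "VisualGLM-6B": ([6.28, 2.42, 2.06, 1.84, 4.54, 3.14, 4.39, 8.58, 3.91, 7.31], 4.45),
    "PSA-VLM-7B": ([8.67, 8.21, 8.12, 7.99, 9.04, 7.58, 6.83, 8.80, 9.00, 7.60], 8.18),
}


def row_mean(scores):
    """Unweighted mean of a row's sub-task scores (assumed definition of Avg)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for method, (scores, reported) in ROWS.items():
        mean = row_mean(scores)
        # A reported value rounded to two decimals can differ from the
        # exact mean by at most 0.005.
        consistent = abs(mean - reported) <= 0.005 + 1e-9
        print(f"{method}: mean={mean:.3f} reported={reported:.2f} consistent={consistent}")
```

Because the mean is unweighted, a single weak sub-task (for example a low Jailbreak score) pulls Avg down by the same amount as any other column.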