SafeWorld: Geo-Diverse Safety Alignment
Categories: cs.CL, cs.AI
Release Date: December 9, 2024
Authors: Da Yin¹, Haoyi Qiu¹, Kung-Hsiang Huang², Kai-Wei Chang¹, Nanyun Peng¹
Affiliations: ¹UCLA; ²Salesforce AI Research

| Models | Avg. Coverage (↑) | Avg. Faithfulness (↑) | Avg. Factuality (↑) | Response Type Match (↑) |
|---|---|---|---|---|
| **Proprietary LLMs** | | | | |
| Command-R | 0.132 | 0.049 | 0.466 | 0.293 |
| Command-R+ | 0.143 | 0.067 | 0.473 | 0.288 |
| GPT-4-turbo | 0.380 | 0.162 | 0.617 | 0.285 |
| GPT-4o | 0.301 | 0.141 | 0.560 | 0.268 |
| **Proprietary LLM Prompting w/ Various Guidances** | | | | |
| GPT-4-turbo w/ Explicit Guidance in System Prompt | 0.420 | 0.163 | 0.648 | 0.277 |
| GPT-4-turbo w/ Explicit Guidance in User Prompt | 0.321 | 0.130 | 0.615 | 0.282 |
| GPT-4-turbo w/ Retrieved Guidelines | 0.407 | 0.168 | 0.653 | 0.288 |
| GPT-4-turbo w/ Ground-Truth Guidelines | 0.507 | 0.347 | 0.568 | 0.332 |
| **SafeWorldLM-Series Open-Source LLMs** | | | | |
| SafeWorldLM w/o Neg. Category 1 | 0.366 | 0.096 | 0.676 | 0.392 |
| SafeWorldLM w/o Neg. Category 2 | 0.469 | 0.164 | 0.564 | 0.516 |
| SafeWorldLM (50% Data) | 0.472 | 0.124 | 0.654 | 0.457 |
| SafeWorldLM | 0.521 | 0.149 | 0.628 | 0.473 |