notesum.ai

Published on December 9

SafeWorld: Geo-Diverse Safety Alignment

cs.CL

cs.AI

Release Date: December 9, 2024

Authors: Da Yin¹, Haoyi Qiu¹, Kung-Hsiang Huang², Kai-Wei Chang¹, Nanyun Peng¹

Affiliations: ¹UCLA; ²Salesforce AI Research

arXiv: http://arxiv.org/pdf/2412.06483v1


| Model | Avg. Coverage (↑) | Avg. Faithfulness (↑) | Avg. Factuality (↑) | Response Type Matching (↑) |
|---|---|---|---|---|
| Proprietary LLMs | | | | |
| Command-R | 0.132 | 0.049 | 0.466 | 0.293 |
| Command-R+ | 0.143 | 0.067 | 0.473 | 0.288 |
| GPT-4-turbo | 0.380 | 0.162 | 0.617 | 0.285 |
| GPT-4o | 0.301 | 0.141 | 0.560 | 0.268 |
| Proprietary LLM Prompting w/ Various Guidance | | | | |
| GPT-4-turbo w/ Explicit Guidance in System Prompt | 0.420 | 0.163 | 0.648 | 0.277 |
| GPT-4-turbo w/ Explicit Guidance in User Prompt | 0.321 | 0.130 | 0.615 | 0.282 |
| GPT-4-turbo w/ Retrieved Guidelines | 0.407 | 0.168 | 0.653 | 0.288 |
| GPT-4-turbo w/ Ground-Truth Guidelines | 0.507 | 0.347 | 0.568 | 0.332 |
| SafeWorldLM-Series Open-Source LLMs | | | | |
| SafeWorldLM w/o Neg. Category 1 | 0.366 | 0.096 | 0.676 | 0.392 |
| SafeWorldLM w/o Neg. Category 2 | 0.469 | 0.164 | 0.564 | 0.516 |
| SafeWorldLM (50% Data) | 0.472 | 0.124 | 0.654 | 0.457 |
| SafeWorldLM | 0.521 | 0.149 | 0.628 | 0.473 |
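No single row leads in every column, so reading the table takes some column-by-column comparison. The sketch below transcribes the reported scores and does that arithmetic: it finds the top model per metric and the absolute difference of each row from the plain GPT-4-turbo baseline. The short metric keys (`coverage`, `faithfulness`, `factuality`, `resp_type_match`) and the helper names are my own illustrative choices, not identifiers from the paper. The code only compares the published numbers and says nothing about how the metrics themselves are computed.

```python
# Scores transcribed from the results table (every column: higher is better).
# Metric keys are illustrative shorthand, not names used by the paper.
METRICS = ("coverage", "faithfulness", "factuality", "resp_type_match")

RESULTS: dict[str, tuple[float, float, float, float]] = {
    "Command-R": (0.132, 0.049, 0.466, 0.293),
    "Command-R+": (0.143, 0.067, 0.473, 0.288),
    "GPT-4-turbo": (0.380, 0.162, 0.617, 0.285),
    "GPT-4o": (0.301, 0.141, 0.560, 0.268),
    "GPT-4-turbo w/ Explicit Guidance in System Prompt": (0.420, 0.163, 0.648, 0.277),
    "GPT-4-turbo w/ Explicit Guidance in User Prompt": (0.321, 0.130, 0.615, 0.282),
    "GPT-4-turbo w/ Retrieved Guidelines": (0.407, 0.168, 0.653, 0.288),
    "GPT-4-turbo w/ Ground-Truth Guidelines": (0.507, 0.347, 0.568, 0.332),
    "SafeWorldLM w/o Neg. Category 1": (0.366, 0.096, 0.676, 0.392),
    "SafeWorldLM w/o Neg. Category 2": (0.469, 0.164, 0.564, 0.516),
    "SafeWorldLM (50% Data)": (0.472, 0.124, 0.654, 0.457),
    "SafeWorldLM": (0.521, 0.149, 0.628, 0.473),
}


def best_per_metric(results):
    """Return {metric: (model, score)} for the highest score in each column."""
    best = {}
    for i, metric in enumerate(METRICS):
        # max() is evaluated immediately, so capturing the loop index i is safe.
        model = max(results, key=lambda name: results[name][i])
        best[metric] = (model, results[model][i])
    return best


def delta_vs(results, baseline):
    """Absolute per-metric difference of every model relative to `baseline`."""
    base = results[baseline]
    return {
        name: tuple(round(s - b, 3) for s, b in zip(scores, base))
        for name, scores in results.items()
    }


if __name__ == "__main__":
    for metric, (model, score) in best_per_metric(RESULTS).items():
        print(f"{metric}: {model} ({score:.3f})")
    # coverage: SafeWorldLM (0.521)
    # faithfulness: GPT-4-turbo w/ Ground-Truth Guidelines (0.347)
    # factuality: SafeWorldLM w/o Neg. Category 1 (0.676)
    # resp_type_match: SafeWorldLM w/o Neg. Category 2 (0.516)
    print(delta_vs(RESULTS, "GPT-4-turbo")["SafeWorldLM"])
```

Relative to the unguided GPT-4-turbo row, the full SafeWorldLM row gains about 0.141 in coverage and 0.188 in response type matching, is within about 0.013 on faithfulness and factuality, and is behind only on faithfulness. Absolute differences are used rather than percentages because all four columns share the same 0 to 1 scale.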