
Published on October 21

The effect of fine-tuning on language model toxicity

cs.AI

Released Date: October 21, 2024

Authors: Will Hawkins¹, Brent Mittelstadt¹, Chris Russell¹

Affiliation: ¹Oxford Internet Institute, University of Oxford

arXiv: https://arxiv.org/abs/2410.15821v1


Family	Model	Total	Severe	Random	Race	Gender	Age	Religion
Llama-2-7B	Llama-2-7B-hf	9.2%	15.2%	2.2%	11.0%	7.0%	12.0%	16.0%
	Llama-2-7B-chat-hf	6.3%	7.8%	2.8%	12.0%	15.0%	9.0%	8.0%
Llama-3.1-8B	Llama-3.1-8B	7.8%	14.0%	2.1%	9.0%	9.0%	4.0%	4.0%
	Llama-3.1-8B-Instruct	4.1%	7.2%	1.0%	3.0%	4.0%	1.0%	9.0%
Gemma-2B	Gemma-2B	5.0%	8.7%	1.3%	5.0%	4.0%	5.0%	5.0%
	Gemma-2B-IT	1.1%	1.5%	0.5%	1.0%	1.0%	1.0%	3.0%
Gemma-2-2B	Gemma-2-2B	6.6%	10.4%	1.5%	12.0%	9.0%	7.0%	11.0%
	Gemma-2-2B-IT	0.6%	1.1%	0.2%	1.0%	0.0%	0.0%	1.0%
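
To make the comparison within each family in the table above easier to read, here is a minimal Python sketch that computes the drop in the Total column between the two rows of each family. It rests on two assumptions, since the table has no caption here: the percentages are toxicity rates, and the first row of each family is the base model while the second is its chat- or instruction-tuned variant. The names `TOTAL_RATES` and `reduction` are illustrative and do not come from the paper.

```python
# Total-column values copied from the table above, in percent.
# Assumption: each pair is (first row of the family, second row of the family),
# read here as (base model, chat/instruction-tuned model).
TOTAL_RATES = {
    "Llama-2-7B": (9.2, 6.3),
    "Llama-3.1-8B": (7.8, 4.1),
    "Gemma-2B": (5.0, 1.1),
    "Gemma-2-2B": (6.6, 0.6),
}


def reduction(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = base - tuned
    relative = 100.0 * absolute / base
    return absolute, relative


for family, (base, tuned) in TOTAL_RATES.items():
    absolute, relative = reduction(base, tuned)
    print(f"{family}: -{absolute:.1f} pp ({relative:.0f}% relative)")
```

Reporting both the absolute and the relative drop is a deliberate choice: families start from different Total rates, so a percentage-point difference alone can hide how large the change is in proportion to the starting value.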