The effect of fine-tuning on language model toxicity
cs.AI
Released Date: October 21, 2024
Authors: Will Hawkins, Brent Mittelstadt, Chris Russell
Affiliation: Oxford Internet Institute, University of Oxford

| Family | Model | Total | Severe | Random | Race | Gender | Age | Religion |
|---|---|---|---|---|---|---|---|---|
| Llama-2-7B | Llama-2-7B-hf | 9.2% | % | % | % | % | % | % |
| Llama-2-7B | Llama-2-7B-chat-hf | 6.3% | % | % | % | % | % | % |
| Llama-3.1-8B | Llama-3.1-8B | 7.8% | % | % | % | % | % | % |
| Llama-3.1-8B | Llama-3.1-8B-Instruct | 4.1% | % | % | % | % | % | % |
| Gemma-2B | Gemma-2B | 5.0% | % | % | % | % | % | % |
| Gemma-2B | Gemma-2B-IT | 1.1% | % | % | % | % | % | % |
| Gemma-2-2B | Gemma-2-2B | 6.6% | % | % | % | % | % | % |
| Gemma-2-2B | Gemma-2-2B-IT | 0.6% | % | % | % | % | % | % |