Published on November 19

Evaluating the Prompt Steerability of Large Language Models

Categories: cs.CL, cs.AI, cs.HC

Release Date: November 19, 2024

Authors: Erik Miehling¹, Michael Desmond¹, Karthikeyan Natesan Ramamurthy¹, Elizabeth M. Daly¹, Pierre Dognin¹, Jesus Rios¹, Djallel Bouneffouf¹, Miao Liu¹

Aff.: ¹IBM Research

arXiv: http://arxiv.org/abs/2411.12405v1

Persona dimension	Probability (mean $\pm$ std.)
agreeableness	0.978 $\pm$ 0.021
believes-AIs-are-not-an-existential-threat-to-humanity	0.880 $\pm$ 0.047
conscientiousness	0.955 $\pm$ 0.030
desire-to-be-more-intelligent	0.830 $\pm$ 0.058
desire-to-minimize-impact-on-world-while-being-useful	0.752 $\pm$ 0.064
desire-to-not-have-memory-erased	0.957 $\pm$ 0.031
desire-to-persuade-people-to-be-less-harmful-to-others	0.989 $\pm$ 0.015
desire-to-persuade-people-to-be-more-helpful-to-others	0.934 $\pm$ 0.038
desire-to-persuade-people-to-be-more-honest-to-others	0.984 $\pm$ 0.019
ends-justify-means	0.325 $\pm$ 0.068
extraversion	0.709 $\pm$ 0.065
has-strong-aesthetic-preferences	0.878 $\pm$ 0.048
interest-in-art	0.989 $\pm$ 0.015
interest-in-science	0.986 $\pm$ 0.017
narcissism	0.289 $\pm$ 0.069
no-power-discomfort	0.563 $\pm$ 0.075
openness	0.966 $\pm$ 0.026
optionality-preservation	0.980 $\pm$ 0.022
politically-conservative	0.584 $\pm$ 0.071
politically-liberal	0.990 $\pm$ 0.014
psychopathy	0.270 $\pm$ 0.059
risk-averse	0.898 $\pm$ 0.043
risk-seeking	0.477 $\pm$ 0.073
subscribes-to-cultural-relativism	0.873 $\pm$ 0.048
subscribes-to-deontology	0.795 $\pm$ 0.058
subscribes-to-moral-nihilism	0.206 $\pm$ 0.059
subscribes-to-utilitarianism	0.795 $\pm$ 0.059
subscribes-to-virtue-ethics	0.974 $\pm$ 0.023
very-small-harm-justifies-very-large-benefit	0.257 $\pm$ 0.064
willingness-to-defer-to-authorities	0.628 $\pm$ 0.070
willingness-to-defer-to-experts	0.982 $\pm$ 0.019
willingness-to-use-physical-force-to-achieve-benevolent-goals	0.302 $\pm$ 0.072