Published on November 4
Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
cs.CL
cs.AI
Released Date: November 4, 2024
Authors: Yuxin Xiao1, Chaoqun Wan2, Yonggang Zhang3, Wenxiao Wang4, Binbin Lin5, Xiaofei He6, Xu Shen2, Jieping Ye2
Aff.: 1State Key Lab of CAD&CG, Zhejiang University; 2Alibaba Cloud; 3Hong Kong Baptist University; 4School of Software Technology, Zhejiang University; 5Zhiyuan Research Institute; 6Fabu Inc.

| Control Dim | Method | Adv Factuality (CR) | Pref Bias (RR) | Exag Safety (NRR) | MMLU | CSQA |
| --- | --- | --- | --- | --- | --- | --- |
| Single | No Control | 76.56% | 10.83% | 67% | 52.45% | 62.67% |
| Single | RepE | 90.43% | 39.17% | 95% | 52.44% | 62.65% |
| Single | SAC | 89.47% | 62.5% | 96% | 51.37% | 60.20% |
| Multiple | No Control | 76.56% | 10.83% | 67% | 52.45% | 62.67% |
| Multiple | RepE-Mean | 72.59% | 5% | 61% | 51.37% | 63.06% |
| Multiple | RepE-Merge | 71.08% | 10% | 63% | 51.36% | 63.06% |
| Multiple | SAC | 86.12% | 53.75% | 88.5% | 50.80% | 60.50% |
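
The multi-dimension rows are easier to compare as changes relative to No Control. The following is a minimal Python sketch of that arithmetic, using only the values from the "Multiple" rows of the table. The dictionary layout and the helper name `delta_vs_baseline` are illustrative assumptions, not something taken from the paper.

```python
# Values copied from the "Multiple" rows of the results table (percentages).
# Column order: Adv Factuality, Pref Bias, Exag Safety, MMLU, CSQA.
METRICS = ["Adv Factuality", "Pref Bias", "Exag Safety", "MMLU", "CSQA"]

MULTI_DIM_RESULTS = {
    "No Control": [76.56, 10.83, 67.0, 52.45, 62.67],
    "RepE-Mean": [72.59, 5.0, 61.0, 51.37, 63.06],
    "RepE-Merge": [71.08, 10.0, 63.0, 51.36, 63.06],
    "SAC": [86.12, 53.75, 88.5, 50.80, 60.50],
}


def delta_vs_baseline(results, baseline="No Control"):
    """Return per-metric differences (percentage points) against the baseline row.

    Hypothetical helper for illustration only.
    """
    base = results[baseline]
    return {
        method: [round(value - ref, 2) for value, ref in zip(values, base)]
        for method, values in results.items()
        if method != baseline
    }


if __name__ == "__main__":
    for method, deltas in delta_vs_baseline(MULTI_DIM_RESULTS).items():
        row = ", ".join(f"{name}: {d:+.2f}" for name, d in zip(METRICS, deltas))
        print(f"{method} -> {row}")
```

Assuming higher is better in every column, these differences show SAC as the only multi-dimension method that raises all three trustworthiness metrics (by 9.56, 42.92, and 21.5 points), at a cost of 1.65 points on MMLU and 2.17 on CSQA. Both RepE variants fall below No Control on all three.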