Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
cs.CL
cs.AI
cs.LG
Released Date: November 6, 2024
Authors: Zhijian Zhuo¹, Ya Wang², Yutao Zeng², Xiaoqing Li³, Xun Zhou², Jinwen Ma¹
Affiliations: ¹School of Mathematical Sciences, Peking University; ²Seed-Foundation-Model, ByteDance; ³Capital University of Economics and Business

| Activation | Loss | PPL | ARC-E | ARC-C | HellaSwag | PIQA | SciQ | Winogrande | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SwiGLU | 2.19 | 3.22 | 56.61 | 27.47 | 49.23 | 68.61 | 86.10 | 56.83 | 57.47 |
| GELU | 2.20 | 3.24 | 55.43 | 27.73 | 48.42 | 68.12 | 87.40 | 54.78 | 56.98 |
| ReLU | 2.21 | 3.26 | 55.68 | 28.50 | 48.59 | 68.39 | 87.10 | 54.85 | 57.18 |
| PolyReLU | 2.17 | 3.18 | 57.53 | 27.99 | 50.19 | 70.29 | 87.60 | 55.72 | 58.22 |
| PolyNorm | 2.17 | 3.17 | 59.68 | 29.01 | 50.86 | 69.15 | 87.20 | 56.20 | 58.68 |
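
The Avg. column in the table above appears to be the unweighted mean of the six downstream task scores (ARC-E through the last task column). The page does not state this, so it is an assumption; the short Python sketch below recomputes the mean for each row and compares it with the reported value. The helper names `mean_score` and `max_avg_gap` are hypothetical and exist only for this check.

```python
# Task scores copied from the table above, in column order:
# ARC-E, ARC-C, HellaSwag, PIQA, SciQ, and the final task column.
# Assumption: "Avg." is the unweighted mean of these six scores.
rows = {
    "SwiGLU":   ([56.61, 27.47, 49.23, 68.61, 86.10, 56.83], 57.47),
    "GELU":     ([55.43, 27.73, 48.42, 68.12, 87.40, 54.78], 56.98),
    "ReLU":     ([55.68, 28.50, 48.59, 68.39, 87.10, 54.85], 57.18),
    "PolyReLU": ([57.53, 27.99, 50.19, 70.29, 87.60, 55.72], 58.22),
    "PolyNorm": ([59.68, 29.01, 50.86, 69.15, 87.20, 56.20], 58.68),
}


def mean_score(scores):
    """Unweighted mean of a list of task scores."""
    return sum(scores) / len(scores)


def max_avg_gap(table):
    """Largest absolute gap between the recomputed mean and the reported Avg."""
    return max(abs(mean_score(scores) - reported)
               for scores, reported in table.values())


if __name__ == "__main__":
    for name, (scores, reported) in rows.items():
        print(f"{name:9s} recomputed={mean_score(scores):.3f} reported={reported:.2f}")
    print(f"largest gap: {max_avg_gap(rows):.4f}")
```

If the assumption holds, every recomputed mean should sit within rounding distance (about 0.01) of the reported Avg. value.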