Monet: Mixture of Monosemantic Experts for Transformers
cs.AI
Released Date: December 5, 2024
Authors: Jungwoo Park¹, Young Jin Ahn², Kee-Eung Kim², Jaewoo Kang¹
Affiliations: ¹Korea University; ²KAIST

| Model | Tokens | MMLU | ARC | WG | PIQA | SIQA | OBQA | HS | CSQA | Avg |
|---|---|---|---|---|---|---|---|---|---|---|
| 0-shot | | | | | | | | | | |
| LLaMA 770M | 100B | 0.340 | 0.468 | 0.524 | 0.706 | 0.431 | 0.386 | 0.507 | 0.342 | 0.463 |
| Monet-HD 850M | 100B | 0.320 | 0.460 | 0.506 | 0.699 | 0.416 | 0.364 | 0.465 | 0.337 | 0.446 |
| Monet-VD 850M | 100B | 0.328 | 0.456 | 0.530 | 0.708 | 0.417 | 0.356 | 0.488 | 0.343 | 0.453 |
| LLaMA 1.3B | 100B | 0.357 | 0.503 | 0.545 | 0.730 | 0.423 | 0.392 | 0.553 | 0.370 | 0.484 |
| Monet-HD 1.4B | 100B | 0.338 | 0.471 | 0.538 | 0.714 | 0.418 | 0.382 | 0.501 | 0.339 | 0.463 |
| Monet-VD 1.4B | 100B | 0.352 | 0.495 | 0.522 | 0.727 | 0.423 | 0.418 | 0.529 | 0.363 | 0.478 |
| LLaMA 3.8B | 100B | 0.394 | 0.578 | 0.571 | 0.760 | 0.426 | 0.412 | 0.618 | 0.404 | 0.520 |
| Monet-HD 4.1B | 100B | 0.375 | 0.558 | 0.560 | 0.741 | 0.427 | 0.414 | 0.571 | 0.379 | 0.503 |
| Monet-VD 4.1B | 100B | 0.380 | 0.547 | 0.557 | 0.751 | 0.437 | 0.424 | 0.604 | 0.389 | 0.511 |
| 5-shot | | | | | | | | | | |
| LLaMA 770M | 100B | 0.350 | 0.554 | 0.509 | 0.713 | 0.439 | 0.386 | 0.523 | 0.459 | 0.492 |
| Monet-HD 850M | 100B | 0.332 | 0.537 | 0.510 | 0.697 | 0.409 | 0.346 | 0.479 | 0.420 | 0.466 |
| Monet-VD 850M | 100B | 0.341 | 0.548 | 0.520 | 0.709 | 0.437 | 0.368 | 0.504 | 0.454 | 0.485 |
| LLaMA 1.3B | 100B | 0.368 | 0.577 | 0.515 | 0.731 | 0.458 | 0.422 | 0.565 | 0.511 | 0.518 |
| Monet-HD 1.4B | 100B | 0.352 | 0.544 | 0.530 | 0.720 | 0.432 | 0.360 | 0.518 | 0.441 | 0.487 |
| Monet-VD 1.4B | 100B | 0.360 | 0.547 | 0.526 | 0.730 | 0.441 | 0.422 | 0.551 | 0.501 | 0.510 |
| LLaMA 3.8B | 100B | 0.408 | 0.635 | 0.578 | 0.771 | 0.472 | 0.452 | 0.645 | 0.574 | 0.567 |
| Monet-HD 4.1B | 100B | 0.385 | 0.603 | 0.545 | 0.742 | 0.463 | 0.412 | 0.588 | 0.545 | 0.535 |
| Monet-VD 4.1B | 100B | 0.398 | 0.625 | 0.564 | 0.761 | 0.470 | 0.438 | 0.619 | 0.525 | 0.550 |
| Off-the-shelf Models (0-shot) | | | | | | | | | | |
| OLMoE 6.9B | 100B | 0.349 | 0.521 | 0.551 | 0.754 | 0.432 | 0.384 | 0.620 | 0.402 | 0.502 |
| OLMoE 6.9B | 5000B | 0.429 | 0.625 | 0.631 | 0.804 | 0.445 | 0.444 | 0.747 | 0.446 | 0.571 |
| Gemma 2 2B | 2000B | 0.432 | 0.651 | 0.630 | 0.792 | 0.443 | 0.428 | 0.709 | 0.482 | 0.571 |
| + SAE 65K MLP | (8B) | 0.325 | 0.473 | 0.562 | 0.723 | 0.436 | 0.326 | 0.537 | 0.401 | 0.473 |
| + SAE 65K Res | (8B) | 0.254 | 0.259 | 0.494 | 0.506 | 0.387 | 0.294 | 0.259 | 0.239 | 0.337 |
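
The Avg column appears to be the unweighted mean of the eight benchmark scores in each row. The source does not state this definition, so it is an assumption here. A minimal sketch that checks it against the 0-shot LLaMA 770M row (the function name `unweighted_average` is illustrative only):

```python
# Scores copied from the 0-shot "LLaMA 770M" row of the table above.
scores = {
    "MMLU": 0.340, "ARC": 0.468, "WG": 0.524, "PIQA": 0.706,
    "SIQA": 0.431, "OBQA": 0.386, "HS": 0.507, "CSQA": 0.342,
}


def unweighted_average(row):
    """Arithmetic mean of the per-benchmark scores (assumed definition of Avg)."""
    return sum(row.values()) / len(row)


avg = unweighted_average(scores)
print(f"{avg:.3f}")  # 0.463, matching the Avg cell for this row
```

Some rows may differ from their Avg cell in the last digit, presumably because the reported averages were computed from unrounded scores.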