Published on October 21
Subword Embedding from Bytes Gains Privacy without Sacrificing Accuracy and Complexity
Categories: cs.LG, cs.AI
Released Date: October 21, 2024
Authors: Mengjiao Zhang¹, Jia Xu¹
Affiliation: ¹Stevens Institute of Technology

| Byte Tokens per Subword | Byte Vocabulary Size 64 | Byte Vocabulary Size 128 | Byte Vocabulary Size 256 |
|---|---|---|---|
| 4 | 0.79M | 1.05M | 1.57M |
| 8 | 1.05M | 1.57M | 2.62M |
| 16 | 1.57M | 2.62M | 4.72M |