LuxEmbedder: A Cross-Lingual Approach to Enhanced Luxembourgish Sentence Embeddings
cs.CL
cs.AI
Release Date: December 4, 2024
Authors: Fred Philippy¹, Siwen Guo¹, Jacques Klein², Tegawendé F. Bissyandé²
Affiliations: ¹Zortify S.A.; ²University of Luxembourg

| Category | Model | CL Transfer | Bitext Mining | Zero-Shot Classification | ParaLux |
|---|---|---|---|---|---|
| Proprietary | Cohere/embed-multilingual-light-v3.0 | 70.89 | 50.10 | 40.39 | 37.50 |
| | Cohere/embed-multilingual-v3.0 | 79.49 | 59.38 | 53.33 | 49.04 |
| | OpenAI/text-embedding-3-small | 72.59 | 39.30 | 40.20 | 15.71 |
| | OpenAI/text-embedding-3-large | 86.25 | 56.04 | 58.82 | 26.28 |
| Open-Source | mBERT (MEAN) | 70.53 | 28.44 | 15.49 | 5.13 |
| | mBERT (CLS) | 70.20 | 22.27 | 13.73 | 4.81 |
| | LuxemBERT (MEAN) | 48.47 | 30.33 | 14.02 | 7.69 |
| | LuxemBERT (CLS) | 56.86 | 21.94 | 33.73 | 14.42 |
| | LASER | 62.70 | 62.96 | 11.08 | 16.03 |
| | LaBSE | 80.88 | 70.11 | 43.24 | 38.14 |
| | LuxEmbedder | 83.39 | 70.24 | 65.59 | 52.24 |