INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Release Date: November 29, 2024
Authors: Angelika Romanou¹, Negar Foroutan², Anna Sotnikova³, Zeming Chen⁴, Sree Harsha Nelaturu⁵, Shivalika Singh², Rishabh Maheshwary³, Micol Altomare², Mohamed A. Haggag², Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial, Shayekh Bin Islam, Perttu Isotalo, Maral Jabbarishiviari, Börje F. Karlsson, Eldar Khalilov, Christopher Klamm, Fajri Koto, Dominik Krzemiński, Gabriel Adriano de Melo, Syrielle Montariol, Yiyang Nan, Joel Niklaus, Jekaterina Novikova, Johan Samir Obando Ceron, Debjit Paul, Esther Ploeger, Jebish Purbey, Swati Rajwal, Selvan Sunitha Ravi, Sara Rydell, Roshan Santhosh, Drishti Sharma, Marjana Prifti Skenduli, Arshia Soltani Moakhar, Bardia Soltani Moakhar, Ran Tamir, Ayush Kumar Tarun, Azmine Toushik Wasi, Thenuka Ovin Weerasinghe, Serhan Yilmaz, Mike Zhang, Imanol Schlag³, Marzieh Fadaee¹, Sara Hooker², Antoine Bosselut¹
Affiliations: ¹EPFL; ²Cohere For AI; ³Cohere For AI Community; ⁴ETH Zurich; ⁵Swiss AI Initiative

| Model | # Langs | Include-lite: IL Prompt | Include-lite: Eng. Prompt | Include-lite: Reg. + IL Prompt | Include-lite: Reg. + Eng. Prompt | Include-base: IL Prompt | Include-base: Eng. Prompt | Include-base: Reg. + IL Prompt | Include-base: Reg. + Eng. Prompt |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o (5-shot) | - | 77.1 | 76.2 | 76.3 | 76.3 | 77.3 | 76.3 | 76.2 | 76.2 |
| GPT-4o (Zero-shot CoT) | - | 78.2 | 78.4 | 77.7 | 77.8 | 79.0 | 78.9 | 77.6 | 78.5 |
| Llama-3.1-70B-Inst. (5-shot) | - | 70.5 | 70.4 | 70.6 | 70.6 | 70.6 | 70.7 | 70.6 | 70.6 |
| Llama-3.1-70B-Inst. (Zero-shot CoT) | - | 60.6 | 55.3 | 60.2 | 55.4 | 60.6 | 56.0 | 60.6 | 55.6 |
| Aya-expanse-32B (5-shot) | 23 | 52.6 | 57.2 | 49.0 | 60.0 | 52.4 | 56.6 | 49.7 | 60.0 |
| Aya-expanse-32B (Zero-shot CoT) | 23 | 50.6 | 57.1 | 52.5 | 58.0 | 51.4 | 57.7 | 52.9 | 57.8 |
| Qwen2.5-14B (5-shot) | 22 | 60.9 | 61.3 | 60.9 | 60.8 | 61.4 | 61.7 | 61.1 | 61.0 |
| Qwen2.5-14B (Zero-shot CoT) | 22 | 46.8 | 50.7 | 46.5 | 51.4 | 47.3 | 51.0 | 47.1 | 51.6 |
| Aya-expanse-8B | 23 | 37.6 | 46.3 | 38.1 | 48.0 | 37.2 | 46.0 | 37.9 | 47.8 |
| Mistral-7B (v0.3) | - | 44.0 | 45.0 | 44.0 | 45.2 | 43.3 | 44.9 | 43.8 | 45.0 |
| Mistral-7B-Inst. (v0.3) | - | 43.5 | 44.6 | 44.2 | 44.7 | 43.6 | 44.5 | 44.2 | 44.7 |
| Gemma-7B | - | 54.4 | 54.9 | 54.3 | 54.9 | 54.5 | 54.9 | 54.2 | 54.7 |
| Gemma-7B-Inst. | - | 39.2 | 40.2 | 38.7 | 39.7 | 38.7 | 39.7 | 38.1 | 39.2 |
| Qwen2.5-7B | 22 | 53.4 | 54.8 | 53.3 | 54.2 | 54.1 | 55.2 | 54.0 | 54.5 |
| Qwen2.5-7B-Inst. | 22 | 53.4 | 54.2 | 52.8 | 53.7 | 53.8 | 54.6 | 53.2 | 53.9 |
| Llama-3.1-8B | - | 50.9 | 52.3 | 50.9 | 51.9 | 51.0 | 51.8 | 51.0 | 51.6 |
| Llama-3.1-8B-Inst. | - | 53.4 | 54.8 | 52.7 | 53.4 | 53.4 | 54.6 | 53.0 | 54.4 |