CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
cs.CL
cs.AI
Released Date: October 21, 2024
Authors: Maosong Cao, Alexander Lam, Haodong Duan, Hongwei Liu, Songyang Zhang, Kai Chen

| Attribution | Dataset Name | Data Format | Number | Language |
| --- | --- | --- | --- | --- |
| Open-source Judge Data | AlpacaFarm (Dubois et al., 2024b) | Pairwise | 39k | EN |
| Open-source Judge Data | Auto-J (Li et al., 2023) | Pointwise, Pairwise, Generative | 9k | ZH, EN |
| Open-source Judge Data | PandaLM (Wang et al., 2023) | Pairwise | 287k | EN |
| Open-source Judge Data | JudgeLM (Zhu et al., 2023) | Pointwise | 100k | EN |
| Open-source Judge Data | LLM-Eval2 (Zhang et al., 2023) | Pointwise, Generative | 10k | ZH |
| Open-source Judge Data | CritiqueBench (Lan et al., 2024) | Generative | 1k | EN |
| Open-source Judge Data | UltraFeedback (Cui et al., 2023) | Pointwise, Generative | 380k | EN |
| Open-source Reward Data | OffsetBias (Park et al., 2024) | Pairwise | 8k | EN |
| Open-source Reward Data | Hendrydong (Dong et al., 2024) | Pairwise | 700k | EN |
| Open-source Reward Data | SkyWorker (Shiwen et al., 2024) | Pairwise | 80k | EN |
| Open-source Reward Data | Airoboros | Pairwise | 36k | EN |
| Open-source Reward Data | Anthropic | Pairwise | 161k | EN |
| Open-source Reward Data | PKU Alignment | Pairwise | 82k | EN |
| Self Collect Judge Data | CJ-Judge-Data-v1 | Pointwise, Pairwise, Generative | 60k | ZH, EN |
| Self Collect Reward Data | Math Code Preference | Pairwise | 11k | EN |
| Self Collect Reward Data | Chinese Math | Pairwise | 76k | ZH |
| Self Collect Reward Data | LengthControl | Pairwise | 0.6k | EN |
| Self Collect Reward Data | Language Match | Pairwise | 0.5k | ZH, EN |
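
To compare the four attribution groups at a glance, the sizes in the table can be tallied per group. The following is a minimal Python sketch, not anything from the paper: the names `POOL` and `totals_by_attribution` are illustrative, and the sizes are the rounded "k" figures transcribed from the table, so the totals are approximate. The table does not state whether every listed sample is actually used for training.

```python
from collections import defaultdict

# (attribution, dataset, approximate size in samples), transcribed from the table.
# Sizes are the rounded "k" figures, so every total below is approximate.
POOL = [
    ("Open-source Judge Data", "AlpacaFarm", 39_000),
    ("Open-source Judge Data", "Auto-J", 9_000),
    ("Open-source Judge Data", "PandaLM", 287_000),
    ("Open-source Judge Data", "JudgeLM", 100_000),
    ("Open-source Judge Data", "LLM-Eval2", 10_000),
    ("Open-source Judge Data", "CritiqueBench", 1_000),
    ("Open-source Judge Data", "UltraFeedback", 380_000),
    ("Open-source Reward Data", "OffsetBias", 8_000),
    ("Open-source Reward Data", "Hendrydong", 700_000),
    ("Open-source Reward Data", "SkyWorker", 80_000),
    ("Open-source Reward Data", "Airoboros", 36_000),
    ("Open-source Reward Data", "Anthropic", 161_000),
    ("Open-source Reward Data", "PKU Alignment", 82_000),
    ("Self Collect Judge Data", "CJ-Judge-Data-v1", 60_000),
    ("Self Collect Reward Data", "Math Code Preference", 11_000),
    ("Self Collect Reward Data", "Chinese Math", 76_000),
    ("Self Collect Reward Data", "LengthControl", 600),
    ("Self Collect Reward Data", "Language Match", 500),
]


def totals_by_attribution(rows):
    """Sum the approximate sample counts for each attribution group."""
    totals = defaultdict(int)
    for attribution, _dataset, size in rows:
        totals[attribution] += size
    return dict(totals)


if __name__ == "__main__":
    for group, size in totals_by_attribution(POOL).items():
        print(f"{group}: ~{size / 1000:g}k samples")
```

On these rounded figures, the open-source reward data is the largest group and the self-collected sets are comparatively small.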