PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment
cs.AI
Released Date: November 18, 2024
Authors: Jiawei Li¹, Xinyue Liang¹, Yizhe Yang¹, Chong Feng¹, Yang Gao¹
Affiliation: ¹School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

| Steps |