Published on November 18

PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment

cs.AI

Release Date: November 18, 2024

Authors: Jiawei Li¹, Xinyue Liang¹, Yizhe Yang¹, Chong Feng¹, Yang Gao¹

Affiliation: ¹School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

arXiv: http://arxiv.org/abs/2411.11681v1