notesum.ai

Published on November 6

From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning

cs.AI
cs.CL
cs.HC
cs.RO

Release Date: November 6, 2024

Authors: Zhirui Deng¹, Zhicheng Dou¹, Yutao Zhu¹, Ji-Rong Wen¹, Ruibin Xiong², Mang Wang², Weipeng Chen

Affiliations: ¹Renmin University of China, Beijing, China; ²Baichuan Intelligent Technology, Beijing, China

arXiv: http://arxiv.org/abs/2411.03817v1