Published on November 6
From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning
cs.AI
cs.CL
cs.HC
cs.RO
Released Date: November 6, 2024
Authors: Zhirui Deng¹, Zhicheng Dou¹, Yutao Zhu¹, Ji-Rong Wen¹, Ruibin Xiong², Mang Wang², Weipeng Chen
Affiliations: ¹Renmin University of China, Beijing, China; ²Baichuan Intelligent Technology, Beijing, China