Institution

Southern University of Science and Technology

Public research university in Shenzhen, China, with a strong focus on STEM and machine-learning research.

Reinforcement Learning · Alibaba Qwen Team

APPO: Agentic Procedural Policy Optimization for RL Agents

APPO branches RL rollouts at high-uncertainty, high-influence tokens rather than at tool-call boundaries, lifting Qwen2.5-7B by 3.9 points over ARPO across 13 math, multi-hop, and deep-search benchmarks.
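The summary names the branching criterion but not how it is computed. The sketch below illustrates only the uncertainty half. It scores each generated token by the entropy of its next-token distribution and takes the top-k positions as branch points, where extra rollouts could be resampled from the shared prefix. The helper names (`token_entropy`, `select_branch_points`, `branch_prefixes`), the pure-entropy ranking, and the choice of `k` are illustrative assumptions. The "influence" signal that APPO also uses is not modeled.

```python
import math
from typing import List, Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over one step's logits."""
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_branch_points(step_logits: Sequence[Sequence[float]], k: int) -> List[int]:
    """Hypothetical helper: positions of the k highest-entropy tokens, in sequence order.

    Ranks by entropy alone; APPO's influence criterion is not reproduced here.
    """
    entropies = [token_entropy(logits) for logits in step_logits]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


def branch_prefixes(tokens: Sequence[str], points: Sequence[int]) -> List[List[str]]:
    """Hypothetical helper: the shared prefix preceding each branch point.

    A new rollout would resample the token at position i, continuing from tokens[:i].
    """
    return [list(tokens[:i]) for i in points]


if __name__ == "__main__":
    # Four decoding steps over a toy 3-token vocabulary.
    steps = [
        [5.0, 0.0, 0.0],  # confident step: low entropy
        [1.0, 1.0, 1.0],  # uniform step: maximal entropy (ln 3)
        [3.0, 0.0, 0.0],  # fairly confident step
        [0.5, 0.4, 0.6],  # near-uniform step: high entropy
    ]
    points = select_branch_points(steps, k=2)
    print(points)  # [1, 3]
    print(branch_prefixes(["a", "b", "c", "d"], points))  # [['a'], ['a', 'b', 'c']]
```

Ranking by entropy alone is the simplest uncertainty proxy. A closer approximation would combine it with some per-token influence score before taking the top-k, but the summary does not say how APPO defines that score.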