Reinforcement Learning · University of Edinburgh
SCOPE: Self-Play RL That Trains LLMs on Open-Ended Tasks
SCOPE co-evolves a task-writing Challenger and a retrieval Solver, with a frozen copy of the base model acting as judge, and lifts scores on eight open-ended benchmarks by up to 10.4 points without any curated prompts.
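To make the three-role structure concrete, here is a toy sketch of the loop: a Challenger proposes a task, a Solver attempts it, and a judge whose parameters never change scores the attempt. This is not SCOPE's training procedure. Scalar "difficulty" and "skill" values stand in for LLM policies, and the update rules are hypothetical. The Challenger makes tasks harder after a solve and easier after a failure, and the Solver improves only on tasks it fails. The class names `Challenger`, `Solver`, `FrozenJudge` and the function `self_play` are illustrative inventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenJudge:
    """Stand-in for the frozen base-model judge: frozen=True forbids mutation."""
    threshold: float = 0.0

    def score(self, task_difficulty: float, solver_skill: float) -> float:
        # Hypothetical pass/fail rubric: 1.0 if the Solver clears the task.
        return 1.0 if solver_skill - task_difficulty >= self.threshold else 0.0


@dataclass
class Challenger:
    difficulty: float = 0.0
    step: float = 0.5

    def propose(self) -> float:
        return self.difficulty

    def update(self, judge_score: float) -> None:
        # ASSUMED frontier-seeking rule (not from the source): push harder
        # after a solve, back off by half a step after a failure.
        self.difficulty += self.step if judge_score == 1.0 else -self.step / 2


@dataclass
class Solver:
    skill: float = 0.0
    lr: float = 0.5

    def update(self, task_difficulty: float, judge_score: float) -> None:
        # ASSUMED learning rule (not from the source): on a failed task,
        # move skill a fraction lr of the way toward that task's difficulty.
        if judge_score < 1.0:
            self.skill += self.lr * (task_difficulty - self.skill)


def self_play(rounds: int, challenger: Challenger, solver: Solver,
              judge: FrozenJudge) -> list[float]:
    """Run the co-evolution loop; only Challenger and Solver ever change."""
    scores = []
    for _ in range(rounds):
        task = challenger.propose()
        score = judge.score(task, solver.skill)
        solver.update(task, score)
        challenger.update(score)
        scores.append(score)
    return scores


if __name__ == "__main__":
    c, s, j = Challenger(), Solver(), FrozenJudge()
    print(self_play(10, c, s, j))
    print(c.difficulty, s.skill)  # both have climbed from 0.0
```

The point of the sketch is the data flow. Task difficulty and Solver skill ratchet upward together as a curriculum. Because the judge is a frozen dataclass, neither learner can shift the scoring standard, which is the role the frozen base-model copy plays in the description above.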