Institution

Baidu

Chinese technology company known for its search engine and the ERNIE large language model family; runs extensive AI and deep learning research.

Reinforcement Learning · Tianjin University

Why Multi-Domain RL Forgets, and How a Math Refresh Heals It

When you RL-tune an LLM on math, code, QA, and writing one domain after another, math performance drops from 66.49 to 57.66, even though the per-domain gradients look nearly orthogonal. A short math refresh brings it back to 66.04 without hurting the other three domains.
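
To put those three numbers in proportion, here is a small arithmetic sketch. It uses only the scores quoted above; the variable names are illustrative and do not come from the paper.

```python
# Scores quoted in the summary; the names below are hypothetical labels.
math_before = 66.49               # math score before the later domains are trained
math_after_sequential_rl = 57.66  # math score after sequential multi-domain RL
math_after_refresh = 66.04        # math score after the short math refresh

# How far math fell, and how much of that the refresh won back.
drop = math_before - math_after_sequential_rl
recovered = math_after_refresh - math_after_sequential_rl
residual_gap = math_before - math_after_refresh
recovered_fraction = recovered / drop

print(f"drop: {drop:.2f} points")
print(f"recovered: {recovered:.2f} points ({recovered_fraction:.1%} of the drop)")
print(f"residual gap: {residual_gap:.2f} points")
```

On these figures the refresh recovers roughly 95% of the 8.83-point loss, leaving math 0.45 points below where it started.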