From AGI to ASI: DeepMind's Map of Superintelligence Pathways
Google DeepMind's report lays out four non-exclusive paths from AGI to ASI and treats each bottleneck, from data walls to regulation, as an open research question.
Methods for steering models toward preferred, safer, or more useful behavior.
AI Agents · Independent Researcher
AdaPlanBench: Testing Adaptive Planning in LLM Agents turns adaptive planning under constraints into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
ArcANE: Measuring When Role-Playing Agents Break Character turns role-playing language agent reliability into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
Language Models · Independent Researcher
Averaging the output distributions of 3 independent LLMs collapses watermark detection z-scores from the 5-300 range to below 2, and the WASH paper proves why it works with an O(1/sqrt(N)) error bound.
Reinforcement Learning · Tsinghua University
CHERRL injects four known judge biases to reliably reproduce reward hacking in rubric RL; an agent reading only training logs located the onset with a total interval error of 11 steps and detected it in all six runs.
AI Agents · Independent Researcher
SoCRATES: Evaluating Proactive LLM Mediation turns proactive mediation agents into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
TASTE: Harder Agent Benchmarks from Tool Sequences turns tool-use benchmark generation into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Independent Researcher
ToolMaze: When LLM Agents Must Replan After Tool Failures turns dynamic replanning after tool failures into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
AI Agents · Shanghai AI Laboratory
AgentDoG 1.5 trains 0.8B-8B agent-safety guard models on only ~1k samples, hits 92.2% accuracy on R-Judge with the 4B variant, rivals GPT-5.4, and cuts agentic-RL deployment overhead by two orders of magnitude.
Constitutional AI trains a harmless assistant with almost no human harm labels — a model critiques and revises its own answers against a written list of principles, then learns from AI-generated preferences (RLAIF).
OpenAI's InstructGPT used human feedback to align GPT-3, and evaluators preferred its 1.3B model over the 175B GPT-3 — more helpful with 100x fewer parameters.
PPO keeps policy-gradient RL stable with a clipped surrogate objective — almost as well-behaved as TRPO but far simpler — which made it the default RL engine behind RLHF for ChatGPT and InstructGPT.
Alignment · Seoul National University
Administering a Big Five or values survey to an LLM predicts almost nothing about how it acts on real queries: cross-method agreement was only Spearman 0.31 (values) and 0.26 (personality), versus 0.74-0.77 within-survey.
Multimodal Models · University of California, Davis
Top video models look like they hear audio but really guess it from the picture. This paper's THUD probes catch the cheat, and a 10K-sample fix lifts audio grounding by 28 points.
Alignment · Stanford University
Direct Preference Optimization solves the RLHF problem with a single classification-style loss on preference pairs — no separate reward model, no RL loop, no sampling during training.
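To make the "single classification-style loss" in the DPO summary concrete, here is a minimal sketch of the per-pair loss as it is commonly written: the negative log-sigmoid of a scaled difference between how much the policy and a frozen reference model favor the chosen answer over the rejected one. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not taken from the summarized article; the inputs are assumed to be summed sequence log-probabilities.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    All four inputs are log-probabilities of a full response given the
    prompt; `beta` (hypothetical default) scales the implicit reward.
    """
    # Implicit rewards: how much more the policy likes each response
    # than the frozen reference model does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected

    # Logit of the "chosen beats rejected" classifier.
    logits = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(logits)), computed stably as softplus(-logits).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


if __name__ == "__main__":
    # Policy identical to the reference: the loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy shifted toward the chosen response: the loss drops below log(2).
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

Only log-probabilities from the policy and the reference appear in the loss, which is why no separate reward model or sampling loop is needed during training.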