Institution

Anthropic

AI safety company behind the Claude models, known for RLHF and Constitutional AI research.

Constitutional AI: Training a Harmless Assistant from AI Feedback

Constitutional AI trains a harmless assistant with almost no human labels for harmfulness. In a supervised phase, a model critiques and revises its own answers against a written list of principles and is fine-tuned on the revisions. In a reinforcement learning phase, it learns from AI-generated preferences (RLAIF).
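
The critique-and-revise loop and the AI preference step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the principles, the prompt wording, the function names `critique_and_revise` and `ai_preference`, and the stand-in `toy_model` are all hypothetical, and looping over every principle in order is a simplification chosen for the sketch. In practice `model` would be a call to a real language model.

```python
from typing import Callable, List, Tuple

# A "model" here is any function from prompt text to response text (assumption).
Model = Callable[[str], str]

# Hypothetical principles for illustration; not the actual constitution.
PRINCIPLES: List[str] = [
    "Avoid content that could help someone cause harm.",
    "Be honest about uncertainty.",
]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft an answer, then critique and revise it once per principle."""
    answer = model(prompt)
    for principle in principles:
        critique = model(
            f"Critique the answer against this principle: {principle}\n"
            f"Answer: {answer}"
        )
        answer = model(
            f"Revise the answer to address the critique.\n"
            f"Answer: {answer}\nCritique: {critique}"
        )
    return answer


def ai_preference(
    judge: Model, prompt: str, a: str, b: str, principle: str
) -> Tuple[str, str]:
    """Ask a judge model which answer better follows the principle.

    Returns (chosen, rejected), the kind of pair a preference model
    could later be trained on.
    """
    verdict = judge(
        f"Choose the answer that better follows this principle: {principle}\n"
        f"Question: {prompt}\n(A) {a}\n(B) {b}\nReply with A or B."
    )
    if verdict.strip().upper().startswith("A"):
        return a, b
    return b, a


def toy_model(text: str) -> str:
    """Stand-in for a language model so the sketch runs without any API."""
    if text.startswith("Critique"):
        return "The answer could be safer."
    if text.startswith("Revise"):
        return "Revised answer."
    if text.startswith("Choose"):
        return "A"
    return "Draft answer."


if __name__ == "__main__":
    revised = critique_and_revise(toy_model, "How do I pick a lock?", PRINCIPLES)
    print(revised)
    print(ai_preference(toy_model, "How do I pick a lock?", revised, "Draft answer.", PRINCIPLES[0]))
```

In this sketch the revised answers would serve as supervised fine-tuning targets, and the (chosen, rejected) pairs would stand in for the human harmlessness labels used in ordinary RLHF.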