Code Generation

Models and systems that synthesize, complete, or reason about programs.

Code generation is where language models meet executable reality. A useful coding model cannot stop at fluent text; its output has to parse, fit the project context, pass hidden tests, respect security constraints, and stay cheap to maintain over the long term.

The strongest research pattern is generation plus verification. AlphaCode showed that sampling many programs and filtering them can solve contest problems better than trusting one answer. Code Llama made open code-specialized models practical for local deployment and fine-tuning. The next step for this topic is coding agents that read repositories, run tests, and revise their own patches.
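The generate-then-verify pattern can be sketched in a few lines. This is a toy illustration, not AlphaCode's implementation: the candidates are plain Python callables standing in for sampled programs, and `safe_run`, `filter_and_cluster`, `example_tests`, and `probe_inputs` are hypothetical names chosen for this sketch. It assumes filtering means "passes the visible example tests" and clustering means "produces identical outputs on extra probe inputs".

```python
from collections import defaultdict

# Sentinel returned when a candidate crashes (hypothetical helper value).
_CRASH = object()


def safe_run(program, value):
    """Run one candidate on one input; a real system would sandbox this."""
    try:
        return program(value)
    except Exception:
        return _CRASH


def filter_and_cluster(candidates, example_tests, probe_inputs, max_submissions=10):
    """Filter candidates on example tests, then pick one per behavior cluster."""
    # Step 1: keep only candidates that pass every visible example test.
    survivors = [
        prog for prog in candidates
        if all(safe_run(prog, x) == y for x, y in example_tests)
    ]

    # Step 2: group survivors by their outputs on the probe inputs.
    clusters = defaultdict(list)
    for prog in survivors:
        signature = tuple(safe_run(prog, x) for x in probe_inputs)
        clusters[signature].append(prog)

    # Step 3: submit one representative per cluster, largest clusters first.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:max_submissions]]


# Toy "samples" for the task "double the input".
candidates = [
    lambda x: x * 2,        # correct
    lambda x: x + x,        # correct, same behavior as the first
    lambda x: x ** 2,       # passes the example test by coincidence
    lambda x: x + 2,        # passes the example test by coincidence
    lambda x: x // (x - x)  # always crashes
]
picks = filter_and_cluster(
    candidates,
    example_tests=[(2, 4)],
    probe_inputs=[3, 5],
    max_submissions=2,
)
```

Picking from the largest cluster first reflects the assumption that independent samples agreeing on behavior are more likely to be correct than a one-off program that merely passes the examples.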

Start here

Code Generation · Google DeepMind

AlphaCode Explained: Competition-Level Code Generation

DeepMind's AlphaCode ranked, on average, within the top 54.3% of participants in Codeforces contests with 5,000+ participants by generating up to a million candidate programs per problem, then filtering and clustering them down to ten submissions.

Code Generation · Meta AI

Code Llama: Open Code Models Built on Llama 2 (7B-70B)

Code Llama continues training Llama 2 on code, reaching up to 67% on HumanEval and 65% on MBPP, the best open scores at its release, with infilling, instruction following, and 100k-token context support.
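Scores on HumanEval and MBPP are functional-correctness rates, commonly reported as pass@k: the probability that at least one of k sampled programs passes the unit tests. The summary above does not say which k or sampling setup lies behind the quoted numbers, so the following is only a sketch of the widely used unbiased estimator, where `n` is the number of samples drawn per problem and `c` the number that passed. The function name `pass_at_k` is chosen for this sketch.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed.

    Computes 1 - C(n - c, k) / C(n, k): one minus the chance that a random
    size-k subset of the n samples contains no passing program.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset has a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples for a problem, 3 of which pass the tests.
single_try = pass_at_k(n=10, c=3, k=1)
five_tries = pass_at_k(n=10, c=3, k=5)
```

A benchmark score is then the mean of this per-problem estimate over all problems; with k = 1 it reduces to the fraction of samples that pass.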

Foundational papers

Code Generation · Google DeepMind

AlphaCode Explained: Competition-Level Code Generation

Code Generation · Meta AI

Code Llama: Open Code Models Built on Llama 2 (7B-70B)

AI Agents · University of Illinois Urbana-Champaign

Code as Agent Harness: Reframing Code as the Runtime of AI Agents

This survey reframes code not as a thing agents generate but as the executable substrate they run on, mapping 40-plus systems across three layers — interface, mechanisms, multi-agent scaling — plus seven open problems.

AI Agents · MemTensor

SkillsVote: Governing the Lifecycle of Reusable Agent Skills

SkillsVote treats agent skills as a governed library — profiling a million-scale corpus, recommending skills before a run, and gating updates after. Offline evolution lifts GPT-5.2 on Terminal-Bench 2.0 by up to 7.9 pp.

Recent papers

AI Agents · TokenRhythm Technologies

Claw-SWE-Bench: Why Coding Agent Harnesses Matter

Claw-SWE-Bench evaluates OpenClaw-style coding-agent harnesses on 350 GitHub issue tasks. OpenClaw jumps from 19.1% to 73.4% Pass@1 with a full adapter.

AI Agents · Shanghai Jiao Tong University

SWE-Explore: Can Coding Agents Find the Right Code?

SWE-Explore isolates the repo-exploration stage of coding agents over 848 issues. Agentic explorers crush BM25 (HitFile 0.65 vs 0.08), but line-level recall stalls at 0.15-0.20, and that gap is what limits repairs.

Code Generation · Google DeepMind

AlphaCode Explained: Competition-Level Code Generation

AI Agents · University of Illinois Urbana-Champaign

Code as Agent Harness: Reframing Code as the Runtime of AI Agents

Code Generation · University of Waterloo

Code2LoRA: Hypernetworks That Generate Repo-Specific LoRA Adapters

Code2LoRA trains a hypernetwork to emit a repo-specific LoRA adapter for a code model with no inference-time token cost — 66.2% in-repo and 63.8% cross-repo exact match, plus an Evo variant that tracks diffs with a GRU.

Code Generation · Meta AI

Code Llama: Open Code Models Built on Llama 2 (7B-70B)

Code Generation · JetBrains

Mellum 2: A 12B MoE Code Model Running at 2.5B Compute

Mellum 2 is JetBrains' open-weight 12B Mixture-of-Experts code model that activates only 2.5B parameters per token, matching dense 4B-14B baselines on software tasks at a fraction of the per-token compute.
