Code2LoRA: Hypernetworks That Generate Repo-Specific LoRA Adapters

Quick answer

Code2LoRA uses a trained hypernetwork to generate a repository-specific LoRA adapter directly from the repository’s code, so a code language model gets repo context baked into its weights instead of stuffed into the prompt, with zero extra tokens at inference time. On the new RepoPeftBench (604 Python repos) it reaches 66.2% in-repo and 63.8% cross-repo exact match for code completion. The evolving-codebase variant, Code2LoRA-Evo, updates a GRU hidden state on every code diff and reaches 60.3% cross-repo exact match, 5.2 points above its baseline.

The repo-context problem it attacks

Code models keep failing on the same thing: they don’t know your repository. The standard fixes are retrieval-augmented generation (paste relevant files into the context window) or per-repo fine-tuning. RAG works but burns context budget and adds latency on every single request; fine-tuning a full LoRA per repository is heavy and goes stale the moment someone merges a PR. Code2LoRA’s bet is that you can amortize this: train one hypernetwork once, then have it produce the adapter weights for any repository in a forward pass, with no per-repo gradient descent and no extra tokens spent at inference.

How Code2LoRA works

The hypernetwork takes a representation of the target repository and outputs the low-rank matrices of a LoRA adapter, which is then merged into the frozen base code model. Because the repo knowledge lives in the adapter, the prompt at inference time is just the local code you’re completing — none of the context-window tax that RAG pays. The paper splits this into two tracks for two different realities of software:

Code2LoRA-Static targets stable codebases. The repository is treated as fixed, the hypernetwork emits one adapter, and that adapter serves all completions in that repo.
Code2LoRA-Evo targets repositories under active development. Instead of regenerating from scratch after every change, it carries a GRU hidden state that updates per code diff, so the adapter evolves incrementally as commits land. This is the more interesting contribution — it treats “the repo” as a sequence of diffs rather than a snapshot.
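
The generation-and-merge step described at the start of this section can be sketched in a few lines of plain Python. This is a toy illustration, not the paper's architecture: the `ToyHypernetwork` class, its two linear heads, the dimensions, and the `scale` parameter are all assumptions made for the example. It only shows the general LoRA pattern, in which a repository embedding is mapped to two low-rank factors A and B and the product BA is added to a frozen weight matrix.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear(weight, vec):
    """Apply a weight matrix (out x in) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def reshape(flat, rows, cols):
    """Reshape a flat list into a rows x cols matrix."""
    return [flat[i * cols:(i + 1) * cols] for i in range(rows)]


class ToyHypernetwork:
    """Hypothetical stand-in: one linear head per LoRA factor."""

    def __init__(self, embed_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        # Randomly initialised heads; a real hypernetwork would be trained.
        self.head_a = [[rng.gauss(0, 0.1) for _ in range(embed_dim)]
                       for _ in range(rank * d_in)]
        self.head_b = [[rng.gauss(0, 0.1) for _ in range(embed_dim)]
                       for _ in range(d_out * rank)]

    def generate(self, repo_embedding):
        """Map a repository embedding to LoRA factors A (rank x d_in), B (d_out x rank)."""
        a = reshape(linear(self.head_a, repo_embedding), self.rank, self.d_in)
        b = reshape(linear(self.head_b, repo_embedding), self.d_out, self.rank)
        return a, b


def merge_lora(base_weight, a, b, scale=1.0):
    """Return W + scale * (B @ A); the base weight itself is left untouched."""
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base_weight, delta)]


# Usage with made-up sizes and a made-up repository embedding.
hyper = ToyHypernetwork(embed_dim=4, d_out=3, d_in=5, rank=2)
base = [[0.0] * 5 for _ in range(3)]
lora_a, lora_b = hyper.generate([0.5, -1.0, 0.25, 2.0])
merged = merge_lora(base, lora_a, lora_b)
```

The point of the sketch is that producing the adapter is a single forward pass through the hypernetwork, and that the merged weight has the same shape as the base weight, so nothing has to be added to the prompt at inference time.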

Why generating adapters beats retrieving context

The honest framing: RAG and Code2LoRA optimize different costs. RAG pays at read time — every request re-pays retrieval and context tokens. Code2LoRA pays at write time — generate the adapter once (or update it per diff), then every completion is cheap and prompt-light. For a repository that gets thousands of completions, moving the cost from per-request to per-repo-update is the whole point. The Evo design is what makes this defensible for real engineering, where code is never static and a snapshot-based adapter would decay between commits.
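
The read-time versus write-time trade-off can be made concrete with a small break-even calculation. Every number below is hypothetical and chosen only for the arithmetic; the source does not report these costs, and "token-equivalents" is an assumed common unit for comparing retrieval context against adapter generation.

```python
def rag_total_cost(requests, context_tokens_per_request):
    """Read-time cost: extra context is paid again on every request."""
    return requests * context_tokens_per_request


def adapter_total_cost(updates, cost_per_update):
    """Write-time cost: paid once per adapter generation or per-diff update."""
    return updates * cost_per_update


def break_even_requests(updates, cost_per_update, context_tokens_per_request):
    """Smallest request count at which RAG's total cost reaches the adapter's."""
    total = adapter_total_cost(updates, cost_per_update)
    return -(-total // context_tokens_per_request)  # integer ceiling division


# Hypothetical workload: 20 adapter updates at 50,000 token-equivalents each,
# versus 2,000 extra context tokens on every RAG request.
threshold = break_even_requests(
    updates=20, cost_per_update=50_000, context_tokens_per_request=2_000
)
print(threshold)  # 500
```

Under these assumed numbers the adapter route is cheaper for any repository that serves more than 500 completions between the first and the twentieth update. The model ignores the local prompt tokens, which both approaches pay equally.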

Key results

Static, in-repo: 66.2% exact match on code completion within repositories the model has an adapter for.
Static, cross-repo: 63.8% exact match, showing that the hypernetwork generalizes to repositories for which it must produce a fresh adapter.
Evolution track: 60.3% cross-repo exact match, 5.2 percentage points over the comparison baseline — the gain attributable to the per-diff GRU update rather than a static snapshot adapter.
Benchmark scale: RepoPeftBench covers 604 Python repositories, and the authors release the benchmark, code, and checkpoints on Hugging Face.
Inference cost: no inference-time token overhead — the repo signal is in the adapter weights, not the prompt.

Limits and open questions

The numbers are Python-only and exact-match-only. Exact match is a strict but narrow metric: it rewards reproducing the reference token-for-token and says little about whether a different correct completion was generated, so the real-world helpfulness could be higher or lower than 66% suggests. Everything is measured on RepoPeftBench, a benchmark the authors themselves built, so cross-paper comparison is hard until others adopt it. The Evo variant’s per-diff GRU update is elegant but raises an obvious question the abstract doesn’t settle: how far can a hidden state drift before it needs a full regeneration, and what happens to a repo after thousands of commits? And generating an adapter still requires the hypernetwork to have been trained on a representative distribution of repos. A repository unlike anything in training is exactly where you’d most want repo-specific context and where you are least likely to get a good adapter.
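
To see why exact match is narrow, here is a minimal sketch of the metric. The whitespace stripping is an assumption; the source does not say how RepoPeftBench normalizes completions, and the sample strings are invented.

```python
def exact_match_rate(predictions, references):
    """Fraction of predictions identical to their reference after stripping whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


references = ["return a + b", "total += 1", "return a + b"]
predictions = [
    "return a + b",    # identical: counted as a hit
    "  total += 1\n",  # differs only in surrounding whitespace: hit under this sketch
    "return b + a",    # equivalent for numbers, but scored as a miss
]
rate = exact_match_rate(predictions, references)
```

The third prediction would behave the same as its reference for numeric inputs, yet it counts as a miss, which is the sense in which the metric can understate usefulness.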

FAQ

What is Code2LoRA and how does it differ from RAG for code?

Code2LoRA uses a hypernetwork to generate a repository-specific LoRA adapter that is merged into a frozen code model, putting repo knowledge into weights. Unlike RAG, it spends no extra context tokens at inference time — RAG re-pays retrieval and context cost on every request, while Code2LoRA pays once per repo (or per diff).

What is the difference between Code2LoRA-Static and Code2LoRA-Evo?

Code2LoRA-Static treats a repository as fixed and emits one adapter for it. Code2LoRA-Evo handles repositories under active development by maintaining a GRU hidden state that updates on each code diff, so the adapter changes incrementally as commits land instead of being regenerated from a snapshot.
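
The per-diff update in the Evo variant can be sketched with a hand-written GRU cell. This is a toy under stated assumptions: the `ToyGRUCell` class, the random untrained weights, the dimensions, and the made-up diff embeddings are all hypothetical, and the source does not specify how diffs are encoded or how the hidden state is turned into adapter weights. The gate equations follow one common GRU formulation; libraries differ on which term the update gate weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyGRUCell:
    """Minimal GRU cell with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0, 0.3) for _ in range(cols)] for _ in range(rows)]

        # One set of parameters per gate: z (update), r (reset), h (candidate).
        self.w = {g: mat(hidden_dim, input_dim) for g in "zrh"}
        self.u = {g: mat(hidden_dim, hidden_dim) for g in "zrh"}
        self.b = {g: [0.0] * hidden_dim for g in "zrh"}

    def _pre(self, gate, x, h):
        """Pre-activation W x + U h + b for one gate."""
        return [dot(w_row, x) + dot(u_row, h) + bias
                for w_row, u_row, bias in zip(self.w[gate], self.u[gate], self.b[gate])]

    def step(self, x, h):
        """Fold one diff embedding x into the hidden state h and return the new state."""
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in self._pre("h", x, gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def replay_diffs(cell, diff_embeddings, hidden_dim):
    """Start from a zero state and apply one GRU step per diff, in commit order."""
    state = [0.0] * hidden_dim
    for diff_vec in diff_embeddings:
        state = cell.step(diff_vec, state)
    return state


# Two hypothetical commits, each already encoded as a 3-dimensional vector.
cell = ToyGRUCell(input_dim=3, hidden_dim=4)
diffs = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
repo_state = replay_diffs(cell, diffs, hidden_dim=4)
```

Each new commit costs one `step` call instead of a full re-read of the repository. In a complete system the resulting state would then be fed to an adapter-generating head, which this sketch leaves out.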

How accurate is Code2LoRA on RepoPeftBench?

On RepoPeftBench (604 Python repos), Code2LoRA-Static reaches 66.2% in-repo and 63.8% cross-repo exact match for code completion. Code2LoRA-Evo reaches 60.3% cross-repo exact match, 5.2 points above its baseline.

Does Code2LoRA need to be retrained for every repository?

No — that is the point. The hypernetwork is trained once and then generates a LoRA adapter for a new repository in a forward pass, without per-repo gradient descent. The Evo variant updates the existing adapter per diff rather than retraining.

One line: train a hypernetwork to write the LoRA instead of stuffing the repo into the prompt. Read the original paper on arXiv.