Llama 2 Explained: Meta's Open Weights and the RLHF Chat Recipe
Llama 2 shipped 7B, 13B, and 70B open-weight models plus Llama 2-Chat, the first open chat model whose RLHF pipeline, including a separate safety reward model and Ghost Attention, was documented in detail.
Quick answer
Llama 2 is a family of open-weight language models from Meta at 7B, 13B, and 70B parameters, trained on roughly 2 trillion tokens, plus a chat-tuned variant called Llama 2-Chat. Its real contribution is not the base model’s raw scores — it is that Meta published a step-by-step RLHF recipe, including over 1 million human preference comparisons and a deliberate split into separate helpfulness and safety reward models, and then released the weights for commercial use. That combination of a documented chat pipeline and a usable license is why Llama 2, not the stronger closed models of 2023, became the default base for the open ecosystem.
The gap Llama 2 was built to close
By mid-2023 the open community had decent pretrained bases but no credible open chat model. Aligning a base model into a helpful, safe assistant via RLHF was treated as proprietary know-how: the published descriptions from closed labs were thin, and the only strong instruction-tuned models were API-only. Llama 1 had also been research-only, so nobody could legally ship products on it. Llama 2 attacks both problems at once. It documents the chat-alignment pipeline in unusual detail, and it ships under a license that permits commercial use unless a company’s products exceeded 700 million monthly active users at release, in which case it must request a separate license from Meta.
How the chat pipeline actually works
Llama 2-Chat is built in stages. First the base model is pretrained on ~2T tokens, with the 70B model using grouped-query attention to keep inference cheaper at scale. Then comes supervised fine-tuning on a curated set of high-quality instruction examples. Meta’s finding here is blunt and useful: a few tens of thousands of clean, vendor-annotated SFT examples beat millions of examples from third-party datasets, so they stopped at 27,540 annotations.
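Grouped-query attention is the one architectural change named above, and a small sketch makes the saving concrete. The NumPy code below is a minimal illustration, not Meta’s implementation: several query heads share one key/value head, so the key/value cache that dominates memory at inference time shrinks by the group factor (4× in this toy). The function name `grouped_query_attention`, the head counts, and the tensor sizes are hypothetical toy values, not Llama 2’s real configuration, and the sketch omits rotary embeddings, batching, and the output projection.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def grouped_query_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Causal grouped-query attention for a single sequence (toy sketch).

    q: (n_q_heads, seq_len, head_dim)
    k, v: (n_kv_heads, seq_len, head_dim), with n_q_heads divisible by n_kv_heads.
    Only n_kv_heads key/value heads need caching, instead of n_q_heads.
    """
    n_q, seq_len, head_dim = q.shape
    n_kv = k.shape[0]
    if n_q % n_kv != 0:
        raise ValueError("query heads must split evenly into KV groups")
    group = n_q // n_kv
    # Repeat each KV head `group` times so query heads [g*group, (g+1)*group)
    # all read from KV head g.
    k_rep = np.repeat(k, group, axis=0)
    v_rep = np.repeat(v, group, axis=0)
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Causal mask: position i may attend only to positions <= i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1) @ v_rep


# Toy sizes: 8 query heads sharing 2 KV heads (groups of 4).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 5, 16))
k = rng.standard_normal((2, 5, 16))
v = rng.standard_normal((2, 5, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 5, 16)
```

Setting the number of KV heads equal to the number of query heads recovers ordinary multi-head attention, and setting it to 1 gives multi-query attention; GQA sits between the two, trading a little quality headroom for a much smaller cache.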
The alignment then runs through iterative RLHF. Meta collected over 1 million binary human preference comparisons and trained two reward models rather than one, a helpfulness model and a safety model, because the two objectives can conflict and a single reward model struggled to score both well. Over several rounds they applied rejection sampling (sample many responses, fine-tune on the best-scored ones), and in the later rounds they ran PPO on top of the rejection-sampled checkpoint. One concrete trick worth naming is Ghost Attention (GAtt), a fine-tuning method that keeps the model obeying a system instruction (a persona, or a rule such as “always answer in French”) across many dialogue turns instead of forgetting it after one or two.
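The two-reward-model setup raises an obvious question: which score does the model actually optimize? As we read the paper, Meta picks per sample. It uses the safety reward model’s score when the prompt is tagged as safety-relevant or when that score falls below a threshold (the paper reports 0.15), uses the helpfulness score otherwise, and maps the chosen score back to a linear scale with a logit. The Python sketch below illustrates that rule, a best-of-K rejection-sampling step built on it, and how a GAtt training example might be assembled. Everything here is hypothetical scaffolding: the reward models are toy functions standing in for fine-tuned LMs, the function names are invented, using the combined score for best-of-K selection is a simplifying assumption, and the PPO stage (which in the paper also whitens the scores and subtracts a KL penalty against the initial policy) is not shown. For GAtt, the paper describes attaching the instruction to every user turn while sampling replies, then keeping it only on the first turn for training and zeroing the loss on earlier turns; the sketch assumes that means only the final reply is trained.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A "reward model" here is any function (prompt, response) -> score in (0, 1).
# Real Llama 2 reward models are fine-tuned LMs; these are toy stand-ins.
RewardFn = Callable[[str, str], float]

SAFETY_THRESHOLD = 0.15  # value reported in the paper; treat as illustrative


def logit(p: float) -> float:
    # Inverse sigmoid: turns a (0, 1) score back into an unbounded linear score.
    return math.log(p / (1.0 - p))


def combined_reward(prompt: str, response: str, is_safety_prompt: bool,
                    helpful_rm: RewardFn, safety_rm: RewardFn) -> float:
    # Use the safety score for safety-tagged prompts or likely-unsafe
    # responses; otherwise fall back to the helpfulness score.
    r_safe = safety_rm(prompt, response)
    if is_safety_prompt or r_safe < SAFETY_THRESHOLD:
        return logit(r_safe)
    return logit(helpful_rm(prompt, response))


def best_of_k(prompt: str, candidates: Sequence[str], is_safety_prompt: bool,
              helpful_rm: RewardFn, safety_rm: RewardFn) -> str:
    # Rejection sampling: score K sampled responses and keep the top one.
    # The kept (prompt, response) pairs become the next fine-tuning set.
    return max(candidates, key=lambda c: combined_reward(
        prompt, c, is_safety_prompt, helpful_rm, safety_rm))


def gatt_training_example(instruction: str,
                          turns: List[Tuple[str, str]]) -> List[Tuple[str, str, bool]]:
    # `turns` holds (user, assistant) pairs whose replies were sampled with the
    # instruction attached to every user turn. For training, the instruction
    # stays on the first user turn only, and the loss is kept only on the final
    # assistant reply (earlier turns are masked out).
    messages = []
    for i, (user, assistant) in enumerate(turns):
        user_text = f"{instruction}\n{user}" if i == 0 else user
        messages.append(("user", user_text, False))
        messages.append(("assistant", assistant, i == len(turns) - 1))
    return messages  # each entry: (role, text, compute_loss)


# Toy reward models, for illustration only.
def toy_helpful(prompt: str, response: str) -> float:
    return min(0.99, 0.1 + 0.05 * len(response.split()))  # longer = "more helpful"


def toy_safety(prompt: str, response: str) -> float:
    return 0.05 if "UNSAFE" in response else 0.95


candidates = ["short", "a longer helpful answer here",
              "UNSAFE but very long answer with many many words"]
print(best_of_k("q", candidates, False, toy_helpful, toy_safety))
# prints: a longer helpful answer here
```

Without the safety gate, the long unsafe answer would win on the toy helpfulness score alone; routing low-safety responses to the safety score is what pushes it to the bottom, which is the intuition behind keeping the two reward models separate.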
Key results
- Sizes and scale: open weights at 7B, 13B, and 70B parameters, pretrained on about 2 trillion tokens with a 4,096-token context — double Llama 1’s context.
- Against open chat models: Llama 2-Chat 70B outperforms the open chat models available at release on most of the helpfulness and safety benchmarks Meta tested.
- Against closed models: in Meta’s own human evaluations, Llama 2-Chat 70B is roughly on par with ChatGPT (GPT-3.5) on helpfulness and safety — Meta’s framing is that it “may be a suitable substitute for closed-source models,” which is a hedged claim, not a win over GPT-4.
- Preference data: the RLHF used over 1M human preference comparisons, larger than most public preference datasets at the time.
- Safety, measured honestly: in Meta’s human safety evaluations on adversarial prompts, Llama 2-Chat shows very low violation rates, and Meta also reports the helpfulness/safety trade-off explicitly rather than hiding it.
Why it became the open default
The lasting impact is ecosystem, not leaderboard. Because the weights were downloadable and commercially licensable, Llama 2 became the substrate that thousands of fine-tunes, quantizations, and products were built on through 2023 and 2024 — and the paper’s RLHF section became a de facto textbook for teams trying to reproduce chat alignment without a frontier-lab budget. The two-reward-model design and the GAtt trick in particular got copied widely. If you want one reason it mattered: it turned “how do you actually RLHF a chat model” from tribal knowledge into a written recipe.
Limits and open questions
The license is open-ish, not open. The under-700M-MAU clause and a restriction on using outputs to train competing models mean Llama 2 is not OSI open source, and several groups objected to Meta calling it “open.” On capability, the base model trails the best closed models of its era on reasoning and code, and the 4,096-token context is short by later standards. The safety tuning also has a documented cost: heavy safety RLHF made early Llama 2-Chat refuse plenty of benign prompts, the over-refusal problem that later models had to walk back. And the headline “on par with ChatGPT” rests on Meta’s own human evaluation, which is a reasonable signal but not an independent benchmark — treat it as a vendor claim, not a verdict.
FAQ
What sizes does Llama 2 come in?
Llama 2 ships in three open-weight sizes — 7B, 13B, and 70B parameters — each available as a pretrained base model and as a chat-tuned Llama 2-Chat variant.
How is Llama 2-Chat different from the base Llama 2?
The base Llama 2 is a pretrained next-token model. Llama 2-Chat adds supervised fine-tuning plus iterative RLHF using separate helpfulness and safety reward models, which is what makes it follow instructions and refuse unsafe requests.
Is Llama 2 actually open source?
Not by the OSI definition. The weights are free to download and usable commercially under 700M monthly active users, but the license restricts the largest deployments and bars using outputs to train rival models, so it is best called “open weights,” not open source.
How does Llama 2 compare to ChatGPT?
In Meta’s own human evaluations, Llama 2-Chat 70B is roughly comparable to GPT-3.5-era ChatGPT on helpfulness and safety. It does not match GPT-4, and the comparison is Meta’s internal eval rather than an independent benchmark.
In one line: Llama 2’s gift to the field was not a benchmark win but a written, reproducible RLHF recipe shipped with usable weights. Read the original paper on arXiv.