Gemma Explained: Google DeepMind's Open Models from Gemini Tech

Quick answer

Gemma is a family of open-weight language models from Google DeepMind, released in 2 billion and 7 billion parameter sizes, built from the same research and technology behind the closed Gemini models. The headline number: Gemma outperforms similarly sized open models on 11 of 18 text-based benchmarks the team evaluated, and both pretrained and instruction-tuned checkpoints ship with downloadable weights — Google’s first serious open release built on its frontier stack.

What Gemma actually is

Gemma is not Gemini. Gemini is Google’s flagship closed, API-only model; Gemma is a separate, smaller, open-weight line that reuses Gemini’s architecture choices, data recipes, and training infrastructure but is meant to run on your own hardware. The point of the paper is less a new method and more a deliberate decision: take the engineering that went into a frontier system and release a compact, usable counterpart to it under open terms, including both the raw pretrained checkpoints and the chat-tuned ones.

The two sizes are chosen for two different worlds. The 7B model targets a single GPU or TPU host. The 2B model targets CPU and on-device deployment, where most open-weight models are too heavy to run at all. That split is the real product decision in the paper.
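A rough way to see why the two sizes land on different hardware is to multiply parameter count by bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the parameter counts are the nominal ones from the model names, the helper name `weight_gb` is hypothetical, and real checkpoints can be larger once embedding tables, activations, and runtime overhead are counted.

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# Nominal counts come from the model names (2B, 7B); this ignores
# activations, caches, and runtime overhead. Illustrative only.

BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

NOMINAL_PARAMS = {"2B": 2_000_000_000, "7B": 7_000_000_000}


def weight_gb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, count in NOMINAL_PARAMS.items():
        for dtype in BYTES_PER_PARAM:
            print(f"{name} @ {dtype}: ~{weight_gb(count, dtype):.1f} GB")
```

Under these assumptions the nominal 7B weights need about 14 GB in bfloat16, which is accelerator territory, while the nominal 2B weights shrink to roughly 1 GB at 4-bit precision, which is plausible for a laptop or phone.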

How it is built

Both sizes are decoder-only transformers trained on up to 6 trillion tokens of primarily English text from web documents, mathematics, and code. The architecture carries over now-standard choices — multi-query attention on the 2B model, multi-head on the 7B, rotary position embeddings, GeGLU activations, and RMSNorm — rather than introducing a novel block. The instruction-tuned variants are produced with supervised fine-tuning on a mix of synthetic and human-generated prompt-response pairs, followed by reinforcement learning from human feedback (RLHF) to align behavior.
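Two of the components named above, RMSNorm and GeGLU, are small enough to write out in full. The following is a generic textbook sketch in pure Python, not Gemma’s actual implementation: the function names are illustrative, the GELU is the exact erf form (production code often uses an approximation), and the linear projections that normally produce the gate and value vectors are left out.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Scale x by the reciprocal of its root-mean-square, then apply a per-element gain."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


def gelu(v):
    """Exact GELU: v times the standard normal CDF evaluated at v."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def geglu(gate, value):
    """GeGLU gating: GELU(gate) multiplied element-wise with value.

    In a real feed-forward layer, gate and value are two separate linear
    projections of the same input; here they are passed in directly.
    """
    return [gelu(g) * v for g, v in zip(gate, value)]


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(geglu([0.0, 1.0], [5.0, 2.0]))
```

RMSNorm skips the mean-subtraction step of LayerNorm, and GeGLU lets one projection gate the other, which is why both show up as drop-in choices rather than new blocks.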

The interesting engineering choice is the tokenizer: a 256k-entry SentencePiece vocabulary, far larger than the 32k typical of Llama-class models. A bigger vocabulary makes each token carry more information, which helps non-English and code, but inflates the embedding table — a real cost at 2B scale that the team accepted for broader coverage.
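The embedding cost is simple arithmetic: one vector per vocabulary entry. The sketch below compares a 32k and a 256k vocabulary at the same hidden width. The width of 2048 is an assumption chosen for illustration, not a figure from the text above, and `embedding_params` is a hypothetical helper.

```python
def embedding_params(vocab_size: int, hidden_width: int) -> int:
    """An embedding table stores one vector of hidden_width per vocabulary entry."""
    return vocab_size * hidden_width


HIDDEN_WIDTH = 2048  # assumed for illustration, not taken from the text

small_vocab = embedding_params(32_000, HIDDEN_WIDTH)
large_vocab = embedding_params(256_000, HIDDEN_WIDTH)

print(f"32k vocabulary:  {small_vocab:,} parameters")
print(f"256k vocabulary: {large_vocab:,} parameters")
print(f"ratio: {large_vocab // small_vocab}x")
```

At this assumed width the 256k table comes to about 524 million parameters, eight times the 32k table and roughly a quarter of a nominal 2-billion-parameter budget, which is the trade-off the paragraph above describes.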

Key results

Beats its open peers on most tasks: Gemma outperforms similarly sized open models on 11 of 18 text-based benchmarks, the central claim of the paper.
Math and code: the 7B model reaches 46.4% on GSM8K and 24.3% on MATH, and 32.3% pass@1 on HumanEval, strong for a 7B open model at release.
Reasoning and knowledge: 64.3% on MMLU and 81.2% on HellaSwag for the 7B model, competitive with larger contemporaries.
The small model is the surprise: the 2B model is positioned to run on a laptop CPU or phone while staying competitive with open models in its size class.
Safety is reported, not hand-waved: the team publishes evaluations across toxicity, bias, and memorization, plus a Responsible Generative AI Toolkit, rather than only releasing the weights.
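For readers unfamiliar with the pass@1 metric quoted for HumanEval above, here is a sketch of the pass@k estimator commonly used for code benchmarks. It is a generic illustration, not a description of the Gemma team’s evaluation harness: `n` is the number of samples generated per problem, `c` is how many of them pass the unit tests, and the function name is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c pass the tests.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random draw of k samples contains no passing sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # With k = 1 the estimate reduces to the plain pass rate c / n.
    print(pass_at_k(n=10, c=3, k=1))
```

The benchmark score is this value averaged over all problems, so a pass@1 near 32% means roughly one problem in three is solved on the first attempt.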

Why this release mattered

In early 2024 the credible open-weight options were dominated by Meta’s Llama line and Mistral. Gemma mattered because it was the first time Google put its frontier research behind an open release, a signal that the open-weight tier had become strategically important to the largest labs, not just a community effort. The instruction-tuned 7B in particular gave developers a single-GPU model with Google’s training behind it, usable under the terms of its custom license, and the 2B opened genuinely on-device use cases. The responsible-release framing (staged safety evals, a usage toolkit, documented limitations) also set a template that later open releases were measured against.

Limits and open questions

Gemma’s openness is partial: the weights are downloadable under a custom license with use restrictions, and the training data is described but not released, so “open” here means open weights, not open data or fully open source. The model is overwhelmingly English-centric, so multilingual performance lags despite the large tokenizer. The 11-of-18 win is real but selective: on the other 7 tasks it does not lead, and the comparison set is the team’s own choice, so treat it as “competitive in its size class” rather than “best.” The paper also offers little genuinely new science; its contribution is engineering and release strategy, which is exactly why some readers undervalue it and others overvalue the benchmark table. And as with any open-weight release, the safety evaluations bound what the lab tested, not what fine-tuners will later do with the weights.

FAQ

What is the difference between Gemma and Gemini?

Gemini is Google’s closed, API-only flagship; Gemma is a separate open-weight family (2B and 7B) built from the same research and technology but small enough to run on your own hardware, with downloadable pretrained and instruction-tuned checkpoints.

Is Gemma fully open source?

No. Gemma releases open weights under a custom license with usage terms, but the training data and full training code are not released, so it is open-weight rather than fully open-source.

How good is Gemma compared to other open models?

Gemma outperforms similarly sized open models on 11 of 18 text benchmarks, with the 7B model reaching 64.3% on MMLU and 46.4% on GSM8K — competitive at its size, though not a lead on every task.

Can I run Gemma on my own machine?

Yes. The 7B model is designed for a single GPU or TPU host, and the 2B model targets CPU and on-device use, which is the main reason the two sizes exist.

One line: Gemma takes the engineering behind a frontier closed model and ships it as compact open weights — small but precisely cut. Read the original paper on arXiv.