Open Models

Open-weight model releases and the training recipes behind them.

Kwai Keye-VL-2.0: Open Long-Video Multimodal Agent Model

Kwai Keye-VL-2.0 is a 30B-A3B open MoE multimodal model with 256K context, strong long-video scores, and 62.0 on SWE-bench Verified.

Small Language Models · Independent Researcher

TinyLlama: An Open Small Language Model Recipe

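TinyLlama is a 1.1B-parameter open model built on the Llama 2 architecture and tokenizer and pretrained on roughly 1T tokens for about three epochs; this entry covers the evidence behind the recipe, its method tradeoffs, and its limits for practical use.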

Small Language Models · Hugging Face

SmolLM2: A Fully Open 1.7B Model Built on a Public Data Recipe

SmolLM2 is a 1.7B model overtrained on ~11T tokens through four data stages. It scores 68.7 on HellaSwag and 19.4 on MMLU-Pro, beating Llama3.2-1B — and ships every dataset, not just the weights.

LLM Reasoning · Shanghai AI Laboratory

SU-01: Gold-Medal Olympiad Reasoning from a 30B Open Model

SU-01, a 30B-A3B open model from Shanghai AI Lab, hits 35 points on IMO 2025 and clears gold lines at IPhO 2024/2025 using only ~338K short SFT trajectories plus a 200-step two-stage RL pipeline.

Code Generation · Meta AI

Code Llama: Open Code Models Built on Llama 2 (7B-70B)

Code Llama continues training Llama 2 on code, reaching up to 67% on HumanEval and 65% on MBPP, the best open scores at its release, with infilling, instruction following, and 100k-token context support.

Open Models · DeepSeek

DeepSeek-V3 Explained: A 671B MoE Trained for 2.788M GPU Hours

DeepSeek-V3 is a 671B-parameter MoE model that activates only 37B parameters per token, matches leading closed models on many benchmarks, and was pre-trained on 14.8T tokens, with the full training run costing just 2.788M H800 GPU hours; the weights are open.

Open Models · Google DeepMind

Gemma Explained: Google DeepMind's Open Models from Gemini Tech

Gemma is a family of 2B and 7B open-weight models built from the research and technology behind Gemini. It beats similarly sized open models on 11 of 18 text tasks and ships with pretrained and instruction-tuned checkpoints.

Open Models · Meta AI

Llama 2 Explained: Meta's Open Weights and the RLHF Chat Recipe

Llama 2 shipped 7B, 13B, and 70B open-weight models plus Llama 2-Chat, the first open chat model whose RLHF pipeline — including a separate safety reward model and Ghost Attention — was documented in full.

Code Generation · JetBrains

Mellum 2: A 12B MoE Code Model Running at 2.5B Compute

Mellum 2 is JetBrains' open-weight 12B Mixture-of-Experts code model that activates only 2.5B parameters per token, matching dense 4B-14B baselines on software tasks at a fraction of the per-token compute.

Open Models · Mistral AI

Mistral 7B: The 7B Open Model That Beat Llama 2 13B

Mistral 7B is a 7-billion-parameter open model that outperforms Llama 2 13B on every benchmark tested, uses grouped-query and sliding-window attention for cheap inference, and ships under Apache 2.0.

Open Models · Mistral AI

Mixtral of Experts: The 47B Sparse MoE That Runs Like a 13B Model

Mixtral 8x7B routes each token to 2 of 8 experts per layer, so it holds 47B parameters but uses only ~13B per token — and matches or beats Llama 2 70B and GPT-3.5 under Apache 2.0.

Vision-Language-Action · Allen Institute for AI

MolmoAct2: An Open Action Reasoning Stack for Real Robots

MolmoAct2 is an open vision-language-action stack that reasons in 3D before acting. On real-world DROID it hits 87.1% success, +38.7 points over the runner-up, and its Molmo2-ER brain beats GPT-5 and Gemini Robotics ER.

Multimodal Models · Sea AI Lab

OpenSearch-VL: An Open Recipe for Multimodal Search Agents

OpenSearch-VL open-sources data, code, and weights for vision-language search agents that call real search, OCR, and image tools — its 30B-A3B model lifts seven benchmarks by 13.8 points on average over Qwen3-VL.

Open Models · Alibaba Qwen Team

Qwen2.5 Explained: Alibaba's Open LLM Family, 0.5B to 72B

Qwen2.5 is Alibaba's open-weight LLM family spanning 0.5B–72B, pretrained on 18T tokens; the 72B-Instruct flagship rivals Llama-3-405B-Instruct, a model roughly 5x larger.

Open Models · Meta AI

Llama 3: A 405B Dense Open Model That Matches GPT-4

Llama 3 is Meta's herd of language models, led by a dense 405B-parameter flagship with a 128K context window, trained on 15T+ tokens and released with open weights.