Sequence Modeling

Architectures for modeling long ordered data such as text, audio, code, and genomics.

Diffusion Language Models · Independent Researcher

Diffusion Language Modeling: Promises and Challenges

A survey of diffusion language modeling that lays out the current state of the field, the evidence behind its claims, the tradeoffs between methods, and the limits for practical use.

Diffusion Language Models · Independent Researcher

SEDD: Discrete Diffusion Language Modeling by Ratios

SEDD (Score Entropy Discrete Diffusion) builds discrete diffusion language models by estimating ratios of the data distribution. This entry covers the evidence behind it, its method tradeoffs, and its limits for practical use.

Language Models · Google Research

Language Models Need Sleep: A Consolidate-and-Dream Recipe

Google Research argues LLMs need an offline sleep phase to turn short-term context into stable weights. With sleep, Qwen3-8B hits 79.2% on AIME-24 and a Transformer reaches 80% on ARC few-shot, beating SEAL.

Multimodal Models · Skywork AI

Audio Interaction Model: A Streaming Audio LLM That Decides When to Speak

The Audio Interaction Model runs a perceive-decide-respond loop so an audio LLM listens, decides if and when to reply, and answers on the fly. It is trained on StreamAudio-2M and is competitive across 8 benchmarks.

Biomolecular Modeling · AIRI

GENEB: Why Genomic Foundation Models Are So Hard to Compare

GENEB probes frozen representations from 40 genomic foundation models across 100 tasks in 13 functional categories, and finds rankings flip across categories while extra parameters buy only modest, inconsistent gains.

Efficient AI · Sapient Intelligence

HRM-Text: A 1B Model Trained From Scratch for $1,500

HRM-Text trains a 1B language model from scratch on 40B tokens for about $1,500, scoring 60.7% MMLU, 84.5% GSM8K and 56.2% MATH by swapping Transformers for a hierarchical recurrent model.

Transformers · Google Research

Attention Is All You Need: The Transformer Architecture Explained

The 2017 Transformer dropped recurrence and convolution for pure attention. Its big model hit 28.4 BLEU on WMT14 EN-DE and 41.8 on EN-FR after training for 3.5 days on 8 P100 GPUs. Nearly every modern LLM inherits it.

Sequence Modeling · Carnegie Mellon University

Mamba: Selective State Spaces for Linear-Time Sequence Modeling

Mamba makes state space model parameters depend on the input, so it selectively remembers or forgets tokens. It scales linearly in sequence length, delivers about 5x higher inference throughput than Transformers, and Mamba-3B matches Transformers twice its size.