Video Generation · Virginia Tech
VideoMLA: A Low-Rank Latent KV Cache for Minute-Scale Video Diffusion
VideoMLA ports Multi-Head Latent Attention (MLA) into causal video diffusion, cutting per-token KV-cache memory by 92.7% (224 vs. 3,072 scalars), leading on VBench for 60-second videos, and raising throughput on B200 GPUs by 1.23×.
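
The memory figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the two per-token scalar counts come from the summary above, while the function names, the 2-byte scalar width (a bf16/fp16 assumption), and the token count in the usage lines are hypothetical and not taken from the paper.

```python
# Per-token KV-cache sizes quoted in the summary, in scalars.
LATENT_SCALARS = 224      # compressed latent cache entry
BASELINE_SCALARS = 3072   # uncompressed cache entry


def kv_reduction(latent: int, baseline: int) -> float:
    """Fraction of per-token cache memory saved by the smaller entry."""
    return 1.0 - latent / baseline


def compression_ratio(latent: int, baseline: int) -> float:
    """How many times smaller the latent entry is than the baseline."""
    return baseline / latent


def cache_bytes(num_tokens: int, scalars_per_token: int,
                bytes_per_scalar: int = 2) -> int:
    """Cache size in bytes for num_tokens cached tokens.

    bytes_per_scalar=2 is an assumption (16-bit storage), not a
    figure from the source.
    """
    return num_tokens * scalars_per_token * bytes_per_scalar


if __name__ == "__main__":
    print(f"{kv_reduction(LATENT_SCALARS, BASELINE_SCALARS):.1%}")  # 92.7%
    # Hypothetical token count, chosen only to show the scaling.
    tokens = 100_000
    print(cache_bytes(tokens, BASELINE_SCALARS), "bytes baseline")
    print(cache_bytes(tokens, LATENT_SCALARS), "bytes latent")
```

Because the saving is a fixed fraction per token, the absolute gap between the two caches grows linearly with the number of cached tokens, which is why the reduction matters most for long videos.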