Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Sync
The first autoregressive diffusion method for lip sync. It distills a 14B-parameter bidirectional teacher into causal 1.3B and 14B students that generate each chunk in 2 sampling steps, reaching 31.58 FPS with sub-millisecond time-to-first-frame.
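The summary describes causal, chunk-by-chunk generation with only 2 sampling steps per chunk. The Python sketch below illustrates that general pattern and is not the authors' implementation. Every name in it is a hypothetical stand-in: `generate_chunks`, the `denoiser(x, t, history, audio)` signature, the timesteps `(1.0, 0.5)`, and the update rule are all assumptions. The update rule predicts a clean chunk, then re-noises that prediction to the next noise level under an assumed linear schedule x_t = (1 - t)·x0 + t·ε. Causality comes from the denoiser only ever seeing chunks generated earlier, never future ones.

```python
import random
from typing import Callable, List, Sequence

Chunk = List[float]
# Hypothetical denoiser: (noisy chunk, noise level t, past chunks, audio features) -> clean-chunk estimate
Denoiser = Callable[[Chunk, float, List[Chunk], Chunk], Chunk]


def generate_chunks(
    denoiser: Denoiser,
    audio_chunks: Sequence[Chunk],
    chunk_len: int,
    timesteps: Sequence[float] = (1.0, 0.5),  # assumed 2-step schedule, high noise to low
    seed: int = 0,
) -> List[Chunk]:
    """Causal chunk-wise few-step sampling (illustrative sketch only)."""
    rng = random.Random(seed)
    history: List[Chunk] = []  # previously generated chunks (the causal context)
    for audio in audio_chunks:
        # Each chunk starts from pure Gaussian noise.
        x = [rng.gauss(0.0, 1.0) for _ in range(chunk_len)]
        for i, t in enumerate(timesteps):
            # Predict the clean chunk from the noisy one, conditioned only on the past.
            x0_hat = denoiser(x, t, history, audio)
            if i + 1 < len(timesteps):
                t_next = timesteps[i + 1]
                # Assumed linear schedule: re-noise the estimate to the next noise level.
                x = [(1.0 - t_next) * a + t_next * rng.gauss(0.0, 1.0) for a in x0_hat]
            else:
                x = x0_hat  # the final step keeps the clean estimate
        history.append(x)
    return history


if __name__ == "__main__":
    # Toy denoiser that ignores its input and returns the audio features unchanged.
    def toy_denoiser(x: Chunk, t: float, history: List[Chunk], audio: Chunk) -> Chunk:
        return list(audio)

    frames = generate_chunks(toy_denoiser, [[0.1, 0.2], [0.3, 0.4]], chunk_len=2)
    print(len(frames))
```

With 2 steps per chunk, the denoiser runs exactly twice per chunk no matter how long the stream is. For scale, 31.58 FPS leaves a budget of about 1000 / 31.58 ≈ 31.7 ms per frame for the whole loop.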