Multimodal Models · The Chinese University of Hong Kong
X-Stream: Why MLLMs Score ~50% on Multi-Stream Video
X-Stream is the first benchmark that tests whether MLLMs can follow several live video streams at once. The best model, Gemini 3 Pro, reaches 49.6% against a 91.84% human baseline, and proactive ability collapses below 21%.