Vision-Language-Action · ETH Zurich
Robots Need More than VLA and World Models: Four Missing Interfaces
A position paper from ETH Zurich, Stanford and TU Darmstadt argues that scaling VLA and world models is not enough: robots need four interfaces to turn unstructured human and video behaviour into grounded supervision.