Vision-Language-Action · X Square Robot
WALL-WM: Event-Grounded World Action Modeling for Robots
WALL-WM organizes VLA pretraining around semantic action events rather than fixed-length action chunks. In its event mode, WALL-WM reaches a Task Progress score of 75.86 on diverse real-robot manipulation tasks, compared with 55.64 for pi0.5, a margin of 20.22 points.
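To illustrate the difference between fixed-length chunks and event-grounded segments, here is a minimal sketch. It is not WALL-WM's actual method. It assumes each timestep of an action stream already carries a semantic event label (the labels "reach", "grasp", and "lift" and both helper functions are hypothetical). Fixed-length chunking cuts the stream every `size` steps regardless of what the robot is doing, while event chunking groups consecutive steps that share a label, so each segment corresponds to one semantic action.

```python
from typing import List, Sequence, Tuple


def fixed_chunks(actions: Sequence[int], size: int) -> List[List[int]]:
    """Split an action stream into fixed-length chunks (the last may be shorter)."""
    return [list(actions[i:i + size]) for i in range(0, len(actions), size)]


def event_chunks(
    actions: Sequence[int], labels: Sequence[str]
) -> List[Tuple[str, List[int]]]:
    """Group consecutive actions that share the same (hypothetical) event label."""
    if len(actions) != len(labels):
        raise ValueError("actions and labels must have equal length")
    segments: List[Tuple[str, List[int]]] = []
    for action, label in zip(actions, labels):
        if segments and segments[-1][0] == label:
            # Same event as the previous step: extend the current segment.
            segments[-1][1].append(action)
        else:
            # Label changed: an event boundary, so start a new segment.
            segments.append((label, [action]))
    return segments


if __name__ == "__main__":
    actions = list(range(7))  # placeholder action IDs, one per timestep
    labels = ["reach", "reach", "grasp", "grasp", "grasp", "lift", "lift"]
    print(fixed_chunks(actions, 3))
    # [[0, 1, 2], [3, 4, 5], [6]]
    print(event_chunks(actions, labels))
    # [('reach', [0, 1]), ('grasp', [2, 3, 4]), ('lift', [5, 6])]
```

With a chunk size of 3, the fixed chunks split the "grasp" phase across two chunks. The event segments keep each phase intact, though they vary in length.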