WGO-Bench Benchmark Advances Robot Video Subtask Annotation Efficiency

Hacker News

Researchers have released WGO-Bench, a benchmark for evaluating vision-language models on robotics subtask annotation. The dataset contains 743 annotated segments across 100 episodes drawn from Galaxea World, DROID, and Hom ER v2, covering 62 unique high-level task instructions. The benchmark addresses the growing need for scalable annotation as robotics data collection expands.

The team ran more than 60 experiments to optimize their annotation pipeline. Their best approach achieved an F1 score of 0.306 for subtask segmentation and 61.0% accuracy for labeling. Gemini 3.5 Flash emerged as the top performer, outperforming the best non-Gemini model by 24.5%. The end-to-end pipeline reached an F1 score of 0.168 while using contact sheet techniques to stay efficient.
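To make the segmentation F1 figure concrete, here is a minimal sketch of how a boundary-level F1 could be computed. The summary does not state the benchmark's matching rule, so the tolerance window, the greedy one-to-one matching, and the function name `boundary_f1` are all assumptions for illustration, not WGO-Bench's actual metric.

```python
def boundary_f1(pred, ref, tol=1.0):
    """F1 over subtask boundaries (in seconds).

    Assumption: a predicted boundary counts as correct if it lies within
    `tol` seconds of a reference boundary that has not been matched yet.
    """
    pred = sorted(pred)
    ref = sorted(ref)
    used = [False] * len(ref)
    matched = 0
    for p in pred:
        best = None
        # Pick the closest unused reference boundary inside the tolerance.
        for i, r in enumerate(ref):
            if used[i] or abs(p - r) > tol:
                continue
            if best is None or abs(p - r) < abs(p - ref[best]):
                best = i
        if best is not None:
            used[best] = True
            matched += 1
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical boundaries: two of three predictions fall within 0.5 s
# of a reference boundary, so precision and recall are both 2/3.
score = boundary_f1([1.0, 5.2, 9.0], [1.1, 5.0, 12.0], tol=0.5)
```

Under this definition, a score near 0.3 means that most predicted boundaries miss the reference, or most reference boundaries go undetected, or both.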

The complete pipeline costs $2.64 per hour of video at batch pricing, roughly 19 times cheaper than human annotation. The researchers released the full system as open source through Refiner, with ready-to-use example code for processing custom videos.
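The cost comparison can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted above ($2.64 per video hour and the roughly 19x ratio); the implied human rate is derived from them rather than reported, and the function names and the 100-hour dataset size are hypothetical.

```python
PIPELINE_USD_PER_HOUR = 2.64   # batch pricing, as quoted above
HUMAN_COST_RATIO = 19          # "roughly 19 times cheaper"


def implied_human_cost(pipeline_usd_per_hour, ratio):
    """Human annotation cost per video hour implied by the quoted ratio."""
    return pipeline_usd_per_hour * ratio


def pipeline_cost(video_hours, usd_per_hour=PIPELINE_USD_PER_HOUR):
    """Total pipeline cost for a given amount of video."""
    return video_hours * usd_per_hour


human_rate = implied_human_cost(PIPELINE_USD_PER_HOUR, HUMAN_COST_RATIO)
# human_rate is about 50.16 USD per video hour

# Hypothetical 100-hour dataset: about 264 USD by pipeline,
# versus about 5,016 USD at the implied human rate.
dataset_pipeline_usd = pipeline_cost(100)
dataset_human_usd = 100 * human_rate
```

Because the ratio is only quoted as approximate, the implied human rate of about $50 per video hour should be read as a rough estimate.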

Subtask annotation addresses a fundamental robot learning challenge: breaking complex instructions into atomic manipulation events. This fine-grained supervision lets robots learn when one action ends and the next begins, and it has proved essential to recent vision-language-action models such as π₀.₅ and RT-H.