HeadlinesBriefing.com

DeepMind's LoGeR Breaks 3D Video Reconstruction Barriers

Hacker News

DeepMind and UC Berkeley researchers have unveiled LoGeR, a breakthrough system that scales 3D reconstruction to videos up to 19,000 frames long. The team tackled two fundamental barriers: the quadratic complexity of processing long video sequences and the training limitations that prevent models from generalizing to large-scale environments.
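To get a feel for why quadratic complexity is a barrier at this scale, a back-of-the-envelope comparison helps. The 19,000-frame figure comes from the article; the window size of 64 is a hypothetical choice for illustration, not a parameter reported for LoGeR.

```python
# Illustrative arithmetic only: the window size is a hypothetical
# choice, not a parameter reported for LoGeR.
n = 19_000  # frames in the longest sequences LoGeR handles
w = 64      # hypothetical local attention window

full_pairs = n * n      # quadratic: every frame attends to every frame
windowed_pairs = n * w  # linear in n: each frame attends to w neighbors

print(full_pairs)                    # 361_000_000
print(windowed_pairs)                # 1_216_000
print(full_pairs // windowed_pairs)  # ~296x fewer pairwise interactions
```

At full attention, doubling the video length quadruples the cost; with a fixed window, cost merely doubles, which is what makes kilometer-scale sequences tractable.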

LoGeR introduces a hybrid memory architecture that processes video in chunks while maintaining global coherence. The system combines Sliding Window Attention for precise local alignment with Test-Time Training for long-range consistency. This dual-pathway approach preserves high-fidelity geometry while preventing scale drift over massive sequences, achieving a 30.8% improvement over previous feedforward methods on the 19k-frame VBR dataset.
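The local pathway described above can be sketched as a banded attention mask, where each frame attends only to its temporal neighbors. This is a minimal sketch of the general sliding-window idea; LoGeR's actual window size, chunking scheme, and masking details are not specified in this summary.

```python
import numpy as np

# Minimal sketch of a banded (sliding-window) attention mask.
# The window size here is hypothetical, not taken from the paper.
def sliding_window_mask(n_frames: int, window: int) -> np.ndarray:
    """True where frame i may attend to frame j (|i - j| < window)."""
    idx = np.arange(n_frames)
    return np.abs(idx[:, None] - idx[None, :]) < window

mask = sliding_window_mask(8, 4)
# Each frame attends only to nearby frames, so attention cost grows
# linearly with sequence length instead of quadratically.
print(mask.sum(axis=1))  # per-frame attention span, capped by the window
```

A mask like this handles precise local alignment; it is the second pathway (here, Test-Time Training over the full sequence) that has to carry long-range information across windows and prevent scale drift.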

On standard benchmarks like KITTI, LoGeR achieves an average Absolute Trajectory Error (ATE) of 18.65, outperforming all prior feedforward approaches. The system also excels on shorter sequences, maintaining state-of-the-art reconstruction accuracy while running significantly faster than full-attention baselines. By eliminating the need for post-hoc optimization, LoGeR delivers a fully feedforward solution that works seamlessly from short indoor sequences to kilometer-scale outdoor trajectories.