HeadlinesBriefing.com

Oxide’s Rack Design Reveals Hidden Processor Bug

Hacker News

Oxide Computer Company's new rack design places a Service Processor behind a single management-network interface, and engineers expect to touch a rack only for hardware failures. Early trials with the Cosmo sled revealed that the Service Processor could vanish from the network, leaving a silent, overheated system that could not report status even though the host CPU remained powered.

Debugging began with the host CPU, fan speeds, and network counters. The host CPU stayed alive and the fans ran at full speed, yet the Service Processor sent no traffic. Suspecting task starvation in Hubris, Oxide's custom operating system, engineers added longer restart delays and toggled the chassis LED to trace progress without network visibility.
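The LED technique can be illustrated with a minimal sketch that runs on a host machine. It assumes firmware signals each startup checkpoint as a counted series of blinks. The `Led` trait, the `RecordingLed` stand-in, and the `Stage` names are all hypothetical and are not taken from Hubris or Oxide's firmware.

```rust
/// Hypothetical abstraction over a chassis LED; not Hubris's actual API.
trait Led {
    fn set(&mut self, on: bool);
}

/// Records LED transitions in memory so the sketch runs without hardware.
#[derive(Default)]
struct RecordingLed {
    states: Vec<bool>,
}

impl Led for RecordingLed {
    fn set(&mut self, on: bool) {
        self.states.push(on);
    }
}

/// Hypothetical startup checkpoints; the real stages are not given in the source.
#[derive(Clone, Copy, Debug)]
enum Stage {
    Boot = 1,
    NetInit = 2,
    FpgaInit = 3,
}

/// Signals a checkpoint as N on/off pulses, where N is the stage number.
/// Real firmware would insert a delay after each `set` so a person can count blinks.
fn signal_stage<L: Led>(led: &mut L, stage: Stage) {
    for _ in 0..(stage as u8) {
        led.set(true);
        led.set(false);
    }
}

fn main() {
    for stage in [Stage::Boot, Stage::NetInit, Stage::FpgaInit] {
        let mut led = RecordingLed::default();
        signal_stage(&mut led, stage);
        let pulses = led.states.iter().filter(|&&on| on).count();
        println!("{:?}: {} pulse(s)", stage, pulses);
    }
}
```

If the blinking stops after a given count, the last checkpoint reached is known even when the network is down, which is the kind of out-of-band signal the paragraph above describes.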

Investigators then turned to the new FPGA link on the Flexible Memory Controller (FMC) bus. A timing glitch there could stall the Service Processor's core during a register read, mimicking a dead processor. By inserting a deliberate bus hang and observing the same symptoms, they confirmed the FPGA bus as the culprit, narrowing the search to timing constraints and cache behavior.

After integrating a vector-catch reset, engineers could halt the STM32H7's Cortex-M7 core long enough to capture memory dumps, which revealed that internal cache writes triggered unexpected bus accesses. Fixing the timing violation and disabling the stray cache flushes restored network visibility. The rack now reports health reliably, meeting its low-maintenance promise to data center operators who expect remote management without physical intervention.