HeadlinesBriefing.com

PostgreSQL Memory Overcommit: Avoiding OOM Killer Catastrophes

Hacker News

For 15 years, operators of managed PostgreSQL services have relied on strict memory overcommit to prevent catastrophic OOM (out-of-memory) kills. In this mode (`vm.overcommit_memory = 2`), the Linux kernel refuses to commit more virtual memory than a fixed limit derived from physical RAM and swap, so allocations fail early instead of the OOM killer destructively terminating processes later.
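To make the strict-mode limit concrete, the sketch below reproduces the formula the kernel documentation gives for `CommitLimit`: RAM (minus huge pages) scaled by `vm.overcommit_ratio`, plus swap. The function names, the 8 GiB example machine, and the simplified refusal check are illustrative assumptions; the real kernel also subtracts small admin and user reserves that are ignored here.

```python
def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int = 50, hugetlb_kb: int = 0) -> int:
    """Approximate CommitLimit (in kB) under vm.overcommit_memory = 2.

    Follows the documented formula:
        (total RAM - huge pages) * overcommit_ratio / 100 + total swap
    The default overcommit_ratio on Linux is 50.
    """
    usable_ram_kb = mem_total_kb - hugetlb_kb
    return usable_ram_kb * overcommit_ratio // 100 + swap_total_kb


def would_be_refused(committed_as_kb: int, request_kb: int, limit_kb: int) -> bool:
    """Simplified model (assumption): a request fails with ENOMEM once
    already-committed memory plus the request exceeds the limit."""
    return committed_as_kb + request_kb > limit_kb


# Hypothetical 8 GiB machine without swap, default ratio of 50.
limit = commit_limit_kb(mem_total_kb=8 * 1024 * 1024, swap_total_kb=0)
print(f"CommitLimit is roughly {limit} kB")
print("1 GiB request refused with 3.5 GiB committed:",
      would_be_refused(committed_as_kb=3_670_016, request_kb=1_048_576, limit_kb=limit))
```

Under this model the allocation is rejected at request time, which is what lets PostgreSQL report an error to the client rather than lose a backend to the OOM killer.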

PostgreSQL's architecture makes it particularly vulnerable because all backends share memory segments. When the OOM killer terminates a backend process, the postmaster must assume shared memory may have been left in an inconsistent state, so it terminates every other backend and restarts the cluster through crash recovery. This translates to significant downtime. Strict overcommit, by refusing the allocation up front, turns these potential disasters into manageable client-side errors.

A recent kernel bug, however, inflated the `Committed_AS` value even on systems with ample free memory. Because strict overcommit compares this counter against the commit limit, the phantom usage caused allocations to fail: PostgreSQL received ENOMEM errors and aborted transactions despite no actual memory pressure. The team identified the bug after observing a committed-memory figure of 651 GB on an 8 GB machine.
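A rough way to spot this kind of anomaly is to compare `Committed_AS` with the memory the machine actually has. The sketch below parses text in `/proc/meminfo` format and flags an implausible ratio. The sample values mirror the 651 GB on 8 GB case, but the sample text, the function names, and the threshold of 10 are assumptions for illustration, not the team's method.

```python
from typing import Dict

# Hypothetical /proc/meminfo excerpt: 8 GiB of RAM, 651 GiB "committed".
SAMPLE_MEMINFO = """\
MemTotal:        8388608 kB
MemFree:         6291456 kB
SwapTotal:             0 kB
CommitLimit:     4194304 kB
Committed_AS:  682622976 kB
HugePages_Total:       0
"""


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse 'Key:   value [kB]' lines into a dict of integers."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip blank or malformed lines
        info[key.strip()] = int(fields[0])
    return info


def committed_ratio(info: Dict[str, int]) -> float:
    """Committed_AS relative to physical RAM plus swap."""
    backing_kb = info["MemTotal"] + info.get("SwapTotal", 0)
    return info["Committed_AS"] / backing_kb


def looks_inflated(info: Dict[str, int], threshold: float = 10.0) -> bool:
    """Heuristic (assumption): a ratio far above 1 suggests an accounting
    problem rather than real demand. The threshold is arbitrary."""
    return committed_ratio(info) > threshold


info = parse_meminfo(SAMPLE_MEMINFO)
print(f"Committed_AS / (RAM + swap) = {committed_ratio(info):.3f}")
print("Suspicious:", looks_inflated(info))
```

On a live Linux host the same functions can be fed the contents of `/proc/meminfo`. Ratios moderately above 1 are normal when overcommit is not strict, so any threshold should be tuned to the workload.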

This situation highlights the delicate balance in memory management. Strict overcommit protects PostgreSQL's integrity, but it is only as reliable as the kernel's accounting: a miscounted `Committed_AS` can cripple database operations. The team ultimately disabled strict overcommit temporarily to investigate and resolve the kernel issue, underscoring the need for accurate kernel memory accounting.