HeadlinesBriefing.com

PostgreSQL as Dead Letter Queue for Event Systems

Hacker News: Front Page

At Wayfair, engineers built a system in which Kafka consumers enriched events and stored them in CloudSQL PostgreSQL. When downstream APIs failed or events were malformed, they needed a reliable way to handle the errors without blocking the pipeline. They chose not to use a Kafka topic as the dead letter queue, because they found it offered poor visibility for debugging and retries.

They implemented the Dead Letter Queue directly in PostgreSQL. Failed events were inserted into a dedicated table with a `PENDING` status, raw payload, and error details. This approach made failures queryable with standard SQL, allowing engineers to inspect issues, track retry counts, and reprocess specific events without complex custom consumers.
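To make the idea concrete, here is a minimal sketch of such a table and the kind of query it enables. It is not Wayfair's schema: the table name `dlq_events`, the column names, and the helper functions are all hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs anywhere. In PostgreSQL the payload column would more likely be `JSONB` and the timestamp `TIMESTAMPTZ`.

```python
import json
import sqlite3

# Hypothetical schema: names are illustrative and not taken from the source.
# sqlite3 is a stand-in for PostgreSQL so the sketch is runnable as-is.
SCHEMA = """
CREATE TABLE dlq_events (
    id            INTEGER PRIMARY KEY,
    status        TEXT    NOT NULL DEFAULT 'PENDING',
    payload       TEXT    NOT NULL,
    error_message TEXT    NOT NULL,
    retry_count   INTEGER NOT NULL DEFAULT 0,
    created_at    TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def record_failure(conn, payload, error):
    """Store a failed event with its raw payload and error details."""
    cur = conn.execute(
        "INSERT INTO dlq_events (payload, error_message) VALUES (?, ?)",
        (json.dumps(payload), str(error)),
    )
    conn.commit()
    return cur.lastrowid


def pending_summary(conn):
    """Count PENDING failures per error message, using plain SQL."""
    return conn.execute(
        "SELECT error_message, COUNT(*) FROM dlq_events "
        "WHERE status = 'PENDING' "
        "GROUP BY error_message ORDER BY error_message"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_failure(conn, {"order_id": 1}, "enrichment API timeout")
record_failure(conn, {"order_id": 2}, "enrichment API timeout")
record_failure(conn, {"order_id": None}, "malformed event")
print(pending_summary(conn))
```

The point of the sketch is the second function: once failures live in a table, questions like "which errors are piling up?" become a single `GROUP BY` rather than a custom consumer that replays a topic.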

A ShedLock-backed scheduler retried `PENDING` events every six hours, using `FOR UPDATE SKIP LOCKED` to prevent duplicate processing across instances. This design tolerated long downstream outages while avoiding retry storms. Failures became a predictable, observable part of the system rather than a disruptive surprise.
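The retry step might look roughly like the sketch below. It shows only the row-claiming part, not the ShedLock scheduling, and it rests on several assumptions: the `dlq_events` table and its columns are hypothetical, the `PROCESSED` status is invented for illustration (the source names only `PENDING`), and the `%s` placeholders assume a psycopg-style DB-API cursor. The SQL is written in PostgreSQL dialect, where `FOR UPDATE SKIP LOCKED` makes a second instance skip rows that another transaction has already locked instead of waiting on them.

```python
# PostgreSQL-dialect SQL. Table, column, and status names are hypothetical.
CLAIM_SQL = """
SELECT id, payload
FROM dlq_events
WHERE status = 'PENDING'
ORDER BY created_at
LIMIT %s
FOR UPDATE SKIP LOCKED
"""

# 'PROCESSED' is an assumed terminal status, not one named in the source.
MARK_DONE_SQL = "UPDATE dlq_events SET status = 'PROCESSED' WHERE id = %s"

MARK_RETRY_SQL = (
    "UPDATE dlq_events "
    "SET retry_count = retry_count + 1, error_message = %s "
    "WHERE id = %s"
)


def retry_pending(cur, process, batch_size=100):
    """Claim a batch of PENDING rows and try to reprocess each one.

    `cur` is a DB-API cursor inside an open transaction. The row locks taken
    by CLAIM_SQL are held until the caller commits or rolls back, so the
    caller should commit once this function returns.
    Returns a (succeeded, failed) tuple of counts.
    """
    cur.execute(CLAIM_SQL, (batch_size,))
    rows = cur.fetchall()
    succeeded = 0
    for event_id, payload in rows:
        try:
            process(payload)
        except Exception as exc:
            # Leave the row PENDING so a later run picks it up again.
            cur.execute(MARK_RETRY_SQL, (str(exc), event_id))
        else:
            cur.execute(MARK_DONE_SQL, (event_id,))
            succeeded += 1
    return succeeded, len(rows) - succeeded
```

A scheduled job would open a transaction, call `retry_pending(cur, process)` with whatever function re-runs the enrichment, and then commit. Rows that fail again keep their `PENDING` status with an incremented `retry_count`, which is what lets the queue sit through a long outage without hammering the downstream service.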

The key insight was letting each system do what it does best: Kafka for high-throughput ingestion, PostgreSQL for durable storage and observability. This made failure handling boring and reliable, which is exactly what production systems need.