HeadlinesBriefing.com

Streambed Streams PostgreSQL to Iceberg Tables on S3 Without ETL

Hacker News

Streambed is a change data capture (CDC) engine that streams PostgreSQL data to Apache Iceberg tables on S3 without ETL pipelines or Spark clusters. It captures changes from PostgreSQL's write-ahead log (WAL) through logical replication and writes them out as queryable Parquet files, so analytical workloads can run separately from the production database.

Streambed decodes inserts, updates, and deletes from the WAL and buffers rows per table. It then flushes them to S3 as Parquet files and commits the corresponding Iceberg metadata. Updates and deletes are applied with copy-on-write merging against existing data. A built-in query server uses embedded DuckDB to expose the Iceberg tables over the Postgres wire protocol, so psql or any other Postgres client can connect to it.
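The per-table buffering and copy-on-write merge described above can be sketched in Go, the project's own language. This is a minimal illustration under stated assumptions, not Streambed's implementation: the `Change` type, `TableBuffer`, `MergeCopyOnWrite`, the integer primary key, and the row-count flush threshold are all hypothetical, and an in-memory map stands in for the rows of an existing Parquet data file.

```go
package main

import (
	"fmt"
	"sort"
)

// Op is the kind of row change decoded from the WAL (hypothetical type).
type Op int

const (
	Insert Op = iota
	Update
	Delete
)

// Change is a hypothetical decoded row change; Key stands in for the primary key.
type Change struct {
	Table string
	Op    Op
	Key   int
	Value string
}

// TableBuffer accumulates changes per table until a flush threshold is hit.
// The row-count threshold is an assumption for illustration.
type TableBuffer struct {
	threshold int
	pending   map[string][]Change
}

func NewTableBuffer(threshold int) *TableBuffer {
	return &TableBuffer{threshold: threshold, pending: make(map[string][]Change)}
}

// Add buffers a change and reports whether that table's buffer should be flushed.
func (b *TableBuffer) Add(c Change) bool {
	b.pending[c.Table] = append(b.pending[c.Table], c)
	return len(b.pending[c.Table]) >= b.threshold
}

// Drain returns and clears the buffered changes for one table.
func (b *TableBuffer) Drain(table string) []Change {
	batch := b.pending[table]
	delete(b.pending, table)
	return batch
}

// MergeCopyOnWrite applies a batch to the rows of an existing data file and
// returns a new row set; the existing rows are copied, never modified in place.
func MergeCopyOnWrite(existing map[int]string, batch []Change) map[int]string {
	merged := make(map[int]string, len(existing))
	for k, v := range existing {
		merged[k] = v
	}
	for _, c := range batch {
		switch c.Op {
		case Insert, Update:
			merged[c.Key] = c.Value
		case Delete:
			delete(merged, c.Key)
		}
	}
	return merged
}

func main() {
	buf := NewTableBuffer(3)
	existing := map[int]string{1: "alice", 2: "bob"} // stand-in for an existing file
	changes := []Change{
		{Table: "users", Op: Update, Key: 1, Value: "alice2"},
		{Table: "users", Op: Delete, Key: 2},
		{Table: "users", Op: Insert, Key: 3, Value: "carol"},
	}
	for _, c := range changes {
		if buf.Add(c) {
			merged := MergeCopyOnWrite(existing, buf.Drain(c.Table))
			keys := make([]int, 0, len(merged))
			for k := range merged {
				keys = append(keys, k)
			}
			sort.Ints(keys) // map order is random; sort for stable output
			for _, k := range keys {
				fmt.Printf("%d=%s\n", k, merged[k])
			}
		}
	}
}
```

Running it prints `1=alice2` and `3=carol`, while `existing` still holds the original two rows. That separation is the point of copy-on-write: the old version is left untouched and a new version is written, which Iceberg then publishes as a new snapshot through a metadata commit.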

Users start synchronization with `streambed sync`, passing the PostgreSQL source and S3 destination settings. `streambed resync` backfills data from a consistent snapshot, and the CLI also provides a standalone query server and cleanup utilities. Every setting can be supplied as a command-line flag or as an environment variable with the `STREAMBED_` prefix.

Development requires Go 1.22+ with CGO enabled, because the go-duckdb and go-sqlite3 dependencies are cgo packages. The project's pitch is that this design removes the need for a separate ETL pipeline while staying compatible with existing PostgreSQL tooling and workflows.