HeadlinesBriefing.com

Self-host Reddit Archive: 2.38B Posts

Hacker News: Front Page

A new tool called Redd-Archiver lets you download and browse Reddit's history offline, sidestepping the platform's recent API restrictions. It converts the 3.28TB Pushshift torrent into static HTML files, producing a fully browsable archive on your own hardware. You can read 2.38 billion posts without an internet connection or API keys, which makes it a durable option for data hoarders and researchers.

Reddit has systematically closed off third-party access and threatened the Pushshift dataset, making external archiving difficult. Redd-Archiver works around this by processing the compressed data dumps locally and generating a JavaScript-free, mobile-friendly interface. For those who need more power, a Docker deployment adds PostgreSQL-backed full-text search. The tool also includes a Model Context Protocol server, which lets AI assistants query your personal archive directly.
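
To make the "dump in, static HTML out" idea concrete, here is a minimal sketch in Python. It is not Redd-Archiver's actual code. It assumes the dump has already been decompressed into newline-delimited JSON, and that each record carries `id`, `title`, `author`, and `selftext` fields, as Pushshift submission dumps typically do. The function names `render_post` and `build_archive` and the one-file-per-post layout are hypothetical, chosen only for illustration.

```python
import html
import json
from pathlib import Path


def render_post(post: dict) -> str:
    """Render one submission as a standalone, script-free HTML page."""
    # Field names are assumptions based on typical Pushshift submission records.
    title = html.escape(post.get("title") or "(untitled)")
    author = html.escape(post.get("author") or "[deleted]")
    body = html.escape(post.get("selftext") or "")
    return (
        "<!doctype html>\n"
        '<html lang="en"><head><meta charset="utf-8">'
        # The viewport tag keeps the page readable on phones without any JavaScript.
        '<meta name="viewport" content="width=device-width, initial-scale=1">'
        f"<title>{title}</title></head>\n"
        f"<body><h1>{title}</h1><p>by {author}</p><pre>{body}</pre></body></html>\n"
    )


def build_archive(lines, out_dir: Path) -> int:
    """Write one HTML file per JSON line; return the number of pages written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        post = json.loads(line)
        # Keep only alphanumeric characters so the id is safe as a file name.
        post_id = "".join(c for c in str(post["id"]) if c.isalnum())
        # Hypothetical layout: <out_dir>/<post id>.html
        (out_dir / f"{post_id}.html").write_text(render_post(post), encoding="utf-8")
        count += 1
    return count
```

Because `build_archive` accepts any iterable of lines, an open file handle or a streaming decompressor can be passed in directly, so the whole dump never has to sit in memory. All user-supplied text goes through `html.escape`, which keeps the generated pages free of injected markup.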

Deployment is flexible: the archive runs on anything from a USB stick to a home server or a VPS. You can even host it as a Tor hidden service with minimal configuration. The creator built it with Python and PostgreSQL, using AI assistance during development. The project is a grassroots effort to preserve internet history against corporate data lockdowns, so that valuable discussions stay accessible.