HeadlinesBriefing.com

GitHub Tool Turns PDFs Into Realistic Scans with WASM

Hacker News

The GitHub project make-look-scanned turns clean PDFs into image-only files that mimic handheld scans. The command-line tool reads an input file, rasterizes each page, applies noise, skew, JPEG compression, and other visual degradations, then writes a fresh image-only PDF. Because every page is rasterized, the output contains no selectable text, which matches what a basic scanner produces.
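A degradation pass of this kind can be sketched with Go's standard library alone. The sketch below is not the project's code: the function names, the noise amplitude, the skew range, and the JPEG quality are all assumptions chosen for illustration, and it works on a synthetic grayscale page rather than a rasterized PDF.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/jpeg"
	"math"
	"math/rand"
)

// blankPage returns a white grayscale page with its origin at (0, 0).
func blankPage(w, h int) *image.Gray {
	p := image.NewGray(image.Rect(0, 0, w, h))
	for i := range p.Pix {
		p.Pix[i] = 255
	}
	return p
}

// addNoise adds uniform noise in [-amp, +amp] to every pixel, clamped to 0..255.
// It assumes img is not a sub-image, so Pix holds exactly the page's pixels.
func addNoise(img *image.Gray, rng *rand.Rand, amp int) {
	for i, v := range img.Pix {
		n := int(v) + rng.Intn(2*amp+1) - amp
		if n < 0 {
			n = 0
		} else if n > 255 {
			n = 255
		}
		img.Pix[i] = uint8(n)
	}
}

// skew rotates the page by deg degrees about its centre with nearest-neighbour
// sampling. Destination pixels that map outside the source become white.
// It assumes the page bounds start at (0, 0).
func skew(src *image.Gray, deg float64) *image.Gray {
	b := src.Bounds()
	dst := image.NewGray(b)
	sin, cos := math.Sincos(deg * math.Pi / 180)
	cx, cy := float64(b.Dx())/2, float64(b.Dy())/2
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			dx, dy := float64(x)-cx, float64(y)-cy
			sx := int(math.Round(cos*dx + sin*dy + cx))
			sy := int(math.Round(-sin*dx + cos*dy + cy))
			if image.Pt(sx, sy).In(b) {
				dst.SetGray(x, y, src.GrayAt(sx, sy))
			} else {
				dst.SetGray(x, y, color.Gray{Y: 255})
			}
		}
	}
	return dst
}

// jpegRoundTrip encodes at the given quality and decodes again, so the
// returned image carries real JPEG compression artifacts.
func jpegRoundTrip(img image.Image, quality int) (image.Image, error) {
	var buf bytes.Buffer
	if err := jpeg.Encode(&buf, img, &jpeg.Options{Quality: quality}); err != nil {
		return nil, err
	}
	return jpeg.Decode(&buf)
}

// degrade applies skew, noise, and JPEG compression, all driven by one seed.
// The parameter values here are illustrative assumptions.
func degrade(page *image.Gray, seed int64) (image.Image, error) {
	rng := rand.New(rand.NewSource(seed))
	out := skew(page, rng.Float64()*2-1) // random tilt within one degree either way
	addNoise(out, rng, 12)
	return jpegRoundTrip(out, 40)
}

func main() {
	page := blankPage(200, 280)
	// Draw a black bar as a stand-in for a line of text.
	for y := 100; y < 110; y++ {
		for x := 40; x < 160; x++ {
			page.SetGray(x, y, color.Gray{Y: 0})
		}
	}
	out, err := degrade(page, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println("degraded page bounds:", out.Bounds())
}
```

Driving every random choice from a single seeded generator means the same page and seed always give the same degraded image, which is what makes reproducible output possible.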

Building the binary requires Go and a C toolchain, because go-fitz links the MuPDF library via cgo; the result is a self-contained executable. Users invoke it with flags such as --noise, --skew, and --jpeg-quality, or with preset bundles defined in a config.toml. Output is deterministic by default: the random seed is derived from a hash of the input content unless the user overrides it.
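Content-derived seeding could look roughly like the following. This is a minimal sketch under assumptions: the summary does not say which hash the project uses or how the override is exposed, so SHA-256, the function name seedFromContent, and the pointer-based override are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/rand"
)

// seedFromContent returns the override when one is supplied; otherwise it
// hashes the input bytes and uses the first 8 bytes of the digest as the seed.
// SHA-256 is an assumption, not necessarily the project's choice.
func seedFromContent(content []byte, override *int64) int64 {
	if override != nil {
		return *override
	}
	sum := sha256.Sum256(content)
	return int64(binary.BigEndian.Uint64(sum[:8]))
}

func main() {
	pdf := []byte("%PDF-1.7 placeholder bytes") // stand-in for a real input file
	seed := seedFromContent(pdf, nil)
	rng := rand.New(rand.NewSource(seed))
	fmt.Println("seed:", seed, "first draw:", rng.Float64())
}
```

With this approach, the same input file always yields the same noise and skew, while any change to the file yields a different seed.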

The same effect pipeline runs in browsers through WebAssembly. Because the cgo-based MuPDF binding cannot be built for Go's wasm target, the project falls back to PDF.js for rasterization and passes the pixel data to Go code compiled to wasm. A single HTML file bundles the wasm binary, the Go runtime, and PDF.js, so the tool works offline without a server.

Both builds are released under AGPL-3.0. The CLI statically links MuPDF, which is itself AGPL-licensed, so distributing the binary carries the copyleft obligation to make the source available. The browser build rasterizes with PDF.js, which is Apache-2.0 licensed, and so avoids that MuPDF-derived obligation. Developers can use the tool to generate realistic scanned PDFs for automated testing or data augmentation pipelines.