HeadlinesBriefing.com

340+ Local News Sites Block Internet Archive Over AI Scraping Fears

Hacker News

342 local news sites across the US now block the Internet Archive's web crawlers from archiving their content, according to a new Nieman Lab analysis. McClatchy, Advance Local, Tribune Publishing, and USA Today Co. account for most of the restrictions. Publishers worry AI companies might scrape archived journalism for training data, even though none have confirmed this has happened.

The restrictions have accelerated since January 2026, when Nieman Lab first reported on the issue. Advance Local, which includes The Cleveland Plain Dealer and The Oregonian, began hard-blocking the Archive in August 2025 without evidence of scraping. Initially, about 80% of the sites disallowing the Archive belonged to USA Today Co., underscoring how concentrated the trend is among major chains.

Working journalists and historians depend on Wayback Machine archives to trace local news trends. B.J. Mendelson, editor of The Monroe Gazette, wrote in a petition that archival access is "incredibly difficult" without the Internet Archive. The Wayback Machine says it permits collections only for scholarship or research, but publishers such as the subsidiaries of Alden Global Capital remain unconvinced.