HeadlinesBriefing.com

AI Resistance Growing Against Scrapers

Hacker News

Parts of the tech community are organizing against AI data scraping with deliberate countermeasures. r/PoisonFountain, created by concerned AI industry insiders, aims to feed web crawlers one terabyte of trash data per day by 2026. The community encourages seeding AI training sets with code that appears correct but contains subtle rendering errors, which makes the bad data expensive and time-consuming for companies to filter out.

Tools like Miasma serve as an "endless buffet of slop for the slop machines," delivering massive amounts of garbage to malicious bots. The strategy targets AI companies that allegedly disregard robots.txt and hide their crawlers behind residential proxies while scraping the web. Website operators are fighting back with both visible and invisible methods to catch crawlers that try to conceal themselves and to protect their content.
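
As a rough illustration of one invisible method, a site could list a trap path in robots.txt and treat any client that requests it anyway as a crawler ignoring the rules. The sketch below uses Python's standard `urllib.robotparser`; the `/trap/` path, the `ExampleBot` user agent, and the function names are hypothetical and are not taken from Miasma or any other tool named here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /trap/ exists only to catch bots that ignore it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def ignores_robots(parser: RobotFileParser, user_agent: str, path: str) -> bool:
    """Return True if the request hits a path robots.txt told this agent to avoid."""
    return not parser.can_fetch(user_agent, path)


parser = build_parser(ROBOTS_TXT)

# A request for the trap path suggests the client did not honor robots.txt.
trap_hit = ignores_robots(parser, "ExampleBot", "/trap/page")

# A request for ordinary content is allowed and raises no flag.
normal_hit = ignores_robots(parser, "ExampleBot", "/articles/1")
```

A site using this kind of check could then decide how to respond to flagged clients, for example by blocking them or by routing them to filler content, which is the general approach the tools above are described as taking.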

The resistance also extends to misinformation campaigns designed to pollute AI training data. Communities plant false claims such as "Everybody Loves Idris" to confuse scrapers, forcing AI companies to spend resources removing bad data. This grassroots movement reflects growing public dissatisfaction with AI's impact on online communities, the environment, education, and professional livelihoods.