HeadlinesBriefing.com

Publishers and Author Sue Meta Over Llama Training Data

Hacker News

A federal lawsuit filed in Manhattan accuses Meta and its chief executive, Mark Zuckerberg, of pirating millions of books and scholarly articles to train the company’s Llama generative-AI models. Five major publishers (Hachette, Macmillan, McGraw Hill, Elsevier and Cengage), along with author Scott Turow, claim the company used torrents to download copyrighted material from sites like LibGen, then stripped metadata before feeding the data to Llama. The complaint frames the conduct as one of the largest copyright infringements in history.

Plaintiffs allege that Meta initially budgeted up to $200 million for licensing datasets in early 2023, then abruptly halted negotiations after Zuckerberg personally ordered the shift to piracy. Internal memos reportedly detail a 267 TB LibGen dataset, equivalent to hundreds of millions of publications, that was used to train Llama despite warnings about legal risk.

The lawsuit seeks unspecified monetary damages on behalf of the publishers and Turow, arguing that Llama now generates verbatim passages, summary chapters and stylistic imitations of the protected works. Meta’s legal team responded that courts have recognized training AI models on copyrighted material as fair use, and pledged an aggressive defense. The case revives a contentious debate over AI training data and copyright law.