What to know

  • Authors accuse Apple of using a pirated dataset to train the AI models behind Apple Intelligence.
  • Anthropic settled a similar class-action suit, paying $1.5 billion to authors after using unlicensed books for AI.
  • The settlement sets a new precedent in AI copyright law and could influence future licensing and data-use norms.
  • This follows a broader wave of lawsuits targeting OpenAI, Meta, Microsoft, Midjourney, and others over AI training on copyrighted content.

A legal battle is brewing in the AI world—and it’s very real. At the center of it all is Apple, now accused by two authors of training its AI models on pirated books. This controversy arrives on the heels of a huge settlement between Anthropic and a half-million writers. The race is on to figure out what this means for the future of AI—and creative rights.

Apple sued for allegedly training Apple Intelligence on pirated books

Grady Hendrix and Jennifer Roberson have filed a lawsuit accusing Apple of using their copyrighted books without permission to train its AI, specifically the models behind its Apple Intelligence features. The filing, made in federal court in Northern California, seeks class-action status and alleges Apple drew from “shadow libraries” such as the Books3 dataset, a massive collection of more than 196,000 pirated e-books, without consent, compensation, or credit.

Hendrix, known for horror novels, and Roberson, a fantasy writer, assert their books appeared in this trove without permission. Apple allegedly fed this data into models like OpenELM, which powers features in Apple Intelligence across devices. The suit seeks damages potentially reaching $2.5 billion, arguing the company profited from stolen content while bypassing licensing fees.

The plaintiffs further argue Apple licensed content selectively—striking a multimillion-dollar deal with Shutterstock, for instance—while never licensing the authors whose works also contributed to its models. Apple has yet to publicly respond.

This matter marks Apple's entry into a growing legal storm over intellectual property in the era of generative AI.

The notorious Books3 dataset exposed, yet again!

At the heart of it all sits Books3, a controversial repository assembled by researchers but widely criticized for including pirated materials. This dataset, part of the larger Pile collection, contains everything from bestsellers to niche titles, all digitized without authors' consent. Critics label it a "pirate's treasure chest" for AI training, enabling models to learn language patterns from vast texts. Apple's involvement marks a fresh twist, as the company previously emphasized ethical AI practices, yet the suit alleges it knowingly used tainted sources to accelerate development.

Anthropic’s landmark $1.5 billion settlement

Apple’s legal headache isn’t unique. Just days before the lawsuit dropped, AI upstart Anthropic publicly agreed to pay $1.5 billion to settle similar piracy complaints. The class action covers roughly 500,000 books, with authors set to receive about $3,000 per work used without consent—the largest known copyright settlement in AI history. Anthropic also committed to destroying any pirated book files it downloaded for training.

What makes the Anthropic settlement stand out isn’t just the dollar amount—it’s the precedent. The case shows that courts and creators are increasingly willing to challenge how AI companies source their training data, with big consequences for future startups.

This year alone, major AI firms—including OpenAI, Meta, Microsoft, and even art-generating tools like Midjourney—have faced waves of lawsuits. The New York Times, comic book authors, and Hollywood studios have all accused AI developers of scraping their work to train models that generate outputs capable of competing with the originals or diluting their market value.

Legal arguments often hinge on whether training an AI on unlicensed data constitutes fair use or plain old theft. While some judges have found limited fair use by AI companies, landmark cases like Thomson Reuters v. Ross Intelligence and ongoing artist lawsuits keep pressing the question: Can machines learn from protected content without explicit permission?

Why these cases matter for creators, companies, and AI

Creative professionals worry such practices strip away the market—and control—they deserve for their work. For AI startups and tech giants, these lawsuits force a reckoning with how data is collected and used. Settlements like Anthropic’s, along with Apple’s new legal battle, could redefine how content is licensed and how responsibility is enforced in the next wave of machine learning.

The implications reach far beyond authors: musicians, artists, photographers, and newsrooms all stand to gain—or lose—depending on how courts rule.