AI, Copyright, and the Courts: <em>Bartz v. Anthropic</em>
How the court clarified fair-use boundaries and accelerated the rise of licensing markets for AI training data.
Is training a large language model on copyrighted material protected fair use or blatant infringement? According to the court in <em>Bartz v. Anthropic</em>, the answer is: “it depends.” 787 F. Supp. 3d 1007, 1019 (N.D. Cal. 2025). Fair use is a legal doctrine that allows the limited use of copyrighted works, without permission, for purposes such as criticism and commentary, news reporting, teaching, scholarship, and research. Id. at 1019. Fair use may apply when copyrighted material is sufficiently transformed. Id. That is to say, the material is used “in a manner or for a purpose that differs from the original use in such a way that the expression, meaning, or message is essentially new.” Black’s Law Dictionary (12th ed. 2024), Westlaw.
In 2024, a class of authors claimed that Anthropic, a San Francisco-based artificial intelligence company, copied their books without permission to train its large language model, Claude. Id. at 1014–18. Claude is a generative AI system that produces natural-language text in response to both technical and informal user queries. Id. at 1014. Claude learns by analyzing large amounts of written material, including novels, news articles, academic papers, websites, and conversational text, held in a central library of texts assembled by Anthropic engineers. Id. The plaintiff-authors alleged that the training set included their books along with those of many other authors. Id. According to the plaintiffs, Anthropic obtained many of these books from illegal online sources such as Books3, Library Genesis ("LibGen"), and Pirate Library Mirror ("PiLiMi"), websites that host pirated books and academic materials for free download. Id. at 1015. The court found that Anthropic's library contained both legally purchased texts and illegally pirated copies of works by the plaintiffs and thousands of other authors. Id.
The eventual class action centered on one fundamental question: Can an AI developer use copyrighted materials, including purchased books, as training data without the authors' permission? Anthropic argued that training a model on copyrighted text is a transformative use because the books themselves were not reproduced in outputs to Claude users. Id. at 1021. Instead, the books were merely raw data that helped the algorithm learn about the world and how to respond to users. Id. This, Anthropic claimed, is similar to how humans read books to learn how to write. Id.
The court accepted the transformative-use defense, but held that it applied only when the copyrighted material was lawfully acquired. Id. at 1033. The court found that using purchased books for model training was "exceedingly transformative" because the purpose of the use was entirely different from the original purpose of the books. Id. at 1019, 1033. However, the court drew a firm line regarding provenance, the origin or source of the materials. Id. at 1026. If Anthropic obtained copyrighted works from pirated databases, those copies were "irredeemably infringing." Id. Even if the use was otherwise lawful, the acquisition was not. Id. Anthropic, therefore, must establish clean provenance. Id. Importantly, the court held that even internal research datasets must have documented provenance. Regina Sam Penti, Anthropic’s Landmark Copyright Settlement: Implications for AI Developers and Enterprise Users, Ropes & Gray (Sept. 2025), https://www.ropesgray.com/en/insights/alerts/2025/09/anthropics-landmark-copyright-settlement-implications-for-ai-developers-and-enterprise-users. The court thereby announced one of the clearest judicial rules in the emerging field of AI and copyright: the fair use doctrine does not protect copies of books that were illegally acquired. Id. This ruling fundamentally changes how AI developers may train their large language models.
The <em>Bartz</em> court also considered whether Anthropic had displaced a potential market for licensing copyrighted material. Id. at 1032. Although the court agreed with the plaintiffs that there may be a market for licensing copyrighted books, the existence of such a market was not a material factor in its fair use calculus. Id. This was likely a relief to Anthropic, which had argued that such a market would be expensive and disadvantageous for AI developers. Id. Although the potential licensing-market harm did not shift the fair use analysis, it underscored the scale of the commercial interests at stake, especially once the class action developed. Cade Metz, Anthropic Agrees to Pay $1.5 Billion to Settle Lawsuit With Book Authors, N.Y. Times (Sept. 5, 2025), https://www.nytimes.com/2025/09/05/technology/anthropic-settlement-copyright-ai.html. Because the case involved a class of authors, each copyrighted work used without permission could expose Anthropic to statutory damages of up to $150,000. Id. With approximately seven million pirated books in its training data, and facing the pressure of prolonged litigation, Anthropic chose to negotiate a comprehensive resolution rather than risk an unpredictable trial. Id.
In August 2025, Anthropic proposed a $1.5 billion settlement covering claims related to unauthorized training data. Metz, supra. The settlement included a payout of $3,000 per copyrighted work for approximately 500,000 works; data remediation, including certified destruction of any pirated materials (covering past uses through August 2025); and ongoing court oversight, prompted by the court's concerns about fairness and transparency in the settlement process. Id.; Penti, supra. The court granted preliminary approval in September 2025. Penti, supra. The outcome confirms that AI training can qualify as fair use while signaling that courts are willing to impose real consequences for the use of illicitly acquired materials. Id.
Moving forward, it is important to build a licensing framework for AI training data. Authors, publishers, and AI companies may benefit from licensing systems similar to those used in the music industry by performing rights organizations like the American Society of Composers, Authors, and Publishers ("ASCAP") and Broadcast Music, Inc. ("BMI"). See ASCAP, https://www.ascap.com/ (last visited Feb. 13, 2026); Broadcast Music, Inc., https://www.bmi.com (last visited Feb. 13, 2026). These organizations license music for public use and distribute royalties to artists. Id. A similar collective-licensing model for books could allow AI developers to pay reasonable fees and ensure authors are compensated without negotiating individual agreements. Hannah Choi, The AI Training Data Watershed: Why the $1.5 Billion Anthropic Settlement Changes Everything, IPWatchdog (Oct. 2, 2025), https://ipwatchdog.com/2025/10/02/ai-training-data-watershed-1-5-billion-anthropic-settlement. Industry groups will need to work together to create shared standards, such as consistent metadata, clear pricing, and easy-to-use licensing platforms, to make this market functional. Id. Developing these standards early would give AI developers more certainty and help ensure that authors are paid when their work is used in training datasets. Penti, supra.
Innovation is not a license to infringe. <em>Bartz v. Anthropic PBC</em> shows that the AI industry's next chapter demands licensed training data, compensated authors, and copyright law that protects both human creativity and technological progress.