Allegro
What the courts are now saying about A.I. ingesting copyrighted art
Volume 125, No. 8, September 2025
This month’s A.I. column is by Local 802 In-House Counsel Harvey S. Mars.
Two recent court cases appear to deal blows to creative artists in favor of generative A.I.'s uninhibited use of copyrighted material. These decisions were nuanced, and both cases were hampered by the way the artists framed their claims. Both might have resulted in favorable outcomes had the plaintiffs pursued different legal theories. Before reviewing these decisions, some background is necessary.
In my June 2025 A.I. article, I commented that if utilization of copyrighted material to create generative A.I. systems is deemed a fair use, the ingested material will lose its copyright protection, and its unlicensed use will be legally permitted. Such a result would have a devastating impact upon creative professionals.
As noted previously, fair use analysis involves consideration of four factors that are codified in Section 107 of the Copyright Act of 1976. These factors are:
- The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.
- The nature of the copyrighted work.
- The amount and substantiality of the portion used in relation to the copyrighted work as a whole.
- The effect of the use upon the potential market for, or value of, the copyrighted work.
The fair use factors require individualized consideration specific to the facts of each case. I surmised that the factor most likely to be determinative is the fourth one listed above: the effect of the use upon the potential market for, or value of, the copyrighted work. I still believe this to be the case.
Two recent federal court decisions issued within days of each other in the U.S. District Court of the Northern District of California have undertaken fair use analysis with regard to training generative A.I. large language models (LLMs) that can ingest copyrighted creative works and output “new” works. Both cases determined — for different reasons — that use of copyrighted materials to create LLMs constituted fair use.
CASE #1: BARTZ VS. ANTHROPIC
The first decision, Bartz vs. Anthropic PBC (June 23, 2025), represents the first substantive decision on how fair use applies to generative artificial intelligence. The defendant, Anthropic, is the corporate owner of the A.I. called Claude. Many of you may be familiar with Claude; it’s a kind of competitor to ChatGPT. Here’s how the court described the case in question:
An artificial intelligence firm downloaded for free millions of copyrighted books in digital form from pirate sites on the internet. The firm also purchased copyrighted books (some overlapping with those acquired from the pirate sites), tore off the bindings, scanned every page, and stored them in digitized, searchable files. All the foregoing was done to amass a central library of “all the books in the world” to retain “forever.” From this central library, the AI firm selected various sets and subsets of digitized books to train various large language models under development to power its AI services. Some of these books were written by plaintiff authors, who now sue for copyright infringement. On summary judgment, the issue is the extent to which any of the uses of the works in question qualify as “fair uses” under Section 107 of the Copyright Act.
The court granted judgment in favor of Anthropic on its fair use argument related to digitizing books and training Claude. Hidden in this highly technical decision was this chilling paragraph:
…Authors [the plaintiffs] contend generically that training LLMs will result in an explosion of works competing with their works — such as by creating alternative summaries of factual events, alternative examples of compelling writing about fictional events, and so on. This order assumes that is so (Opp. 22–23 (citing, e.g., Opp. Exh. 38)). But Authors’ complaint is no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works. This is not the kind of competitive or creative displacement that concerns the Copyright Act. The Act seeks to advance original works of authorship, not to protect authors against competition.
In other words, Claude is merely “competition,” not a thief who steals copyrighted material from human creators.
The actual decision is nuanced. The “fair use” analysis in this case was limited to the books used to train Claude rather than the new material generated by this system.
The court found that Anthropic's use of the plaintiffs' books to train Claude was a transformative use of already existing material. In this regard, the court held that Anthropic's format change, from print library copies to a searchable digital library, was transformative under the fair use analysis and did not violate copyright. Nonetheless, Anthropic was denied summary judgment on the plaintiffs' piracy claim, the court rejecting Anthropic's argument that pirating initial copies of the plaintiffs' books was even reasonably necessary for training LLMs. Anthropic apparently acquired some of the ingested books by downloading them from so-called shadow libraries, including a notorious website called LibGen, and the court treated that acquisition as piracy. The pirated copies were not shown to have been used in the LLM training process. If the case goes to trial, the court will decide damages for Anthropic's "central library" of "pirated copies."
Recently, a preliminary class action settlement agreement was reached between the parties on the piracy claim. The settlement is expected to be finalized on September 3, 2025. If it is, Anthropic could avoid court-imposed penalties of up to a trillion dollars. However, Anthropic still faces legal challenges from major record labels such as Universal Music Group, which allege that the company illegally trained its generative A.I. programs on copyrighted lyrics.
I think it is important to observe that the outcome of this decision might have been very different had the plaintiffs also challenged the new material generated by Claude. Had they done so, the court could have considered the fourth factor: the impact of the generated material on the existing copyrighted material owned by the plaintiffs. In this regard the court noted that "[h]ere, if the outputs seen by users had been infringing, the [plaintiffs] would have a different case… Instead, [plaintiffs] challenge only the inputs, not the outputs, of these LLMs."
You have to analyze this case carefully to understand what’s truly going on. Imagine you asked Claude to generate a short story for you that was based on what it had ingested. If the authors in this case sued about Claude’s short story (what the court called the “outputs”), they might have prevailed in some regard. Instead, they sued over the fact that Claude was fed copyrighted works (what the court called the “inputs”), as if it were just reading them. The court found that this was a fair use.
CASE #2: KADREY VS. META
In the second decision, Kadrey vs. Meta Platforms Inc. (June 25, 2025), the court also considered copyright violations that might arise from Meta's LLM called LLaMA, which Meta trained with books owned by the plaintiffs in that case. Here the court held that ingestion of the plaintiffs' material was also fair use, but for a different reason than in Anthropic.
In Meta, the plaintiffs unfortunately failed to produce any evidence that output created by LLaMA would dilute the market for their works. In fact, the court indicated that had there been any evidence of market harm, it would not have ruled in favor of Meta.
The court stated that "[n]o matter how transformative LLM training may be, it's hard to imagine that it can be fair to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books… Because the issue of market dilution is so important in this context, had the plaintiffs presented any evidence that a jury could use to find in their favor on the issue, factor four would have needed to go to a jury."
Thus, neither of these decisions considered the possible market dilution caused by A.I.-generated material in its fair use analysis. For this reason, these decisions should not be seen as dispositive of the issue. The conclusion reached by former Register of Copyrights Shira Perlmutter in the May 2025 report on generative A.I. training may still hold true. That conclusion was:
“The copying involved in A.I. training threatens significant potential harm to the market for or value of copyrighted works. Where a model can produce substantially similar outputs that directly substitute for work in the training data, it can lead to lost sales. Even where a model’s outputs are not substantially similar to any specific copyrighted work, they can dilute the market for works similar to those found in its training data, including by generating material stylistically similar to those works… Making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond the boundaries of fair use.”
Hopefully, the next court that decides this issue will issue a decision that also reaches this conclusion.
If you are interested in the Local 802 A.I. committee, please send an e-mail to Local 802 In-House Counsel Harvey Mars at hmars@local802afm.org and A.I. Committee Chair Jerome Harris at jeromeharr@aol.com. Send feedback on Local 802’s A.I. series to Allegro@Local802afm.org.
OTHER ARTICLES IN THIS SERIES:
Copyright: Defending the Most Fundamental of All Artist Rights
The TRAIN Act is a good start in protecting musicians from A.I. exploitation
Case Tracker: Artificial Intelligence, Copyrights and Class Actions
Protecting musicians from the existential threats of artificial intelligence