Finding Fair Use for GAI Training is Highly Problematic


Although I have expressed aspects of these views in several posts over the past couple of years, I will try to consolidate my opinion as to why GAI training with protected creative works presents a more problematic fair use question than many, even the courts, seem to believe. I acknowledge that even fellow copyright advocates will disagree with some of this analysis, but here goes:

For the sake of narrowing the focus to the question of whether training generative AI (GAI) with protected works favors a fair use exception, the following assumes that the training requires unlicensed copying of protected expression. Further, even if the GAI maker limits the product’s capacity to output infringing copies, this does not alter the fact that considering fair use for this purpose is, at best, troubling and, at worst, so disturbing to case law that the AI developers are begging the courts to articulate doctrine out of whole cloth.

A GAI’s Purpose is Not Analogous to Past Fair Use Factor One Findings

The courts have largely rejected the overbroad opinion that making “something new” is a sufficient justification for unlicensed use of protected works. Thus, it is difficult to see where any court finds an authority to support the argument that making a “creator robot,” however revolutionary its developers proclaim it to be, is a transformative purpose under a factor one analysis.

Typically, a GAI’s purpose neither expresses “critical bearing” on the works used (AWF v. Goldsmith) nor provides information about the works to human readers (Authors Guild v. Google) nor fosters interoperability in computer devices (Google v. Oracle). Instead, a GAI’s most widely applied and widely promoted purpose is artificial “authorship” without authors—a purpose that forecasts myriad negative effects that may prove to dramatically overwhelm any benefits promised by the developers.

Naturally, certain GAIs (e.g., ChatGPT) can be used for various purposes, about which more below, but if the courts are distracted by the sheer novelty, scope, and hype around the “importance” of GAI and, therefore, presume transformativeness, they may be persuaded to articulate a rationale that would be tantamount to a blanket exception for GAI training. If the courts adopt such a carve-out in the context of fair use factor one, the result would effectively reverse the reluctance to credit the broad “something new” argument for transformativeness that the Supreme Court so recently expressed in Warhol.

Notably, it is not unprecedented for the courts to articulate rationales beyond the four-factor analysis. In the Google Books case, the court found that the search tool provides a “social benefit,” and a similar sentiment was articulated in Google v. Oracle regarding the consumer benefit of advancing mobile products. Looking further back, in the Betamax VCR case, the concept of “time shifting” the viewing schedule served the public interest by expanding flexibility in the consumption of copyrighted material that was lawfully obtained.

But if the courts look for a rationale beyond the case law (e.g., a clear social benefit of GAI), not only will they be making a wild guess, but any conclusion in favor of the developers will probably be wrong—perhaps dangerously so. While it is understandable that the courts may be reluctant to hobble technological development in principle, the available facts militate against disturbing fair use jurisprudence for the sake of nurturing GAI in general.

Put differently, if the courts are going to take a wait-and-see approach, there is ample evidence that GAIs already cause harm to individuals—from CSAM and defamation to cheating and psychological issues—to say nothing of the well-founded anxieties—social, political, economic, and environmental—associated with this multi-trillion-dollar gamble being played by the same people who unrepentantly accrued wealth and power from the darkest results of Web 2.0.

GAI as a Tool for Creators

To the extent that a given GAI product may be considered a tool for producing creative works, a fair use holding should at least find that the tool “promotes the progress” of authorship with respect to copyright’s purpose. But this is difficult because the same GAI in the hands of one skilled creator offers little insight about its ultimate purpose in the hands of 100 million unskilled users.

At the positive end of considering GAI’s purpose, my friend David Bolinsky, a medical illustrator and animator, recently made a series of eight dozen topically and stylistically distinct ten-second animations introducing speakers and segment topics for a scientific conference, a daunting assignment. GAI collapsed well over a year of work (if using his standard 3D animation tools) into a matter of weeks. He was surprised at the breadth and depth of creative latitude GAI enabled. Further, he explained that although these presentations allowed more creativity than his typical discrete medical and scientific educational animations, an amateur lacking his experience still could not have used the same GAI tools to achieve the same results. Consequently, Bolinsky sees GAI as an opportunity to do more and different kinds of work, not as a threat to his creativity or livelihood.

In this example, the technology is socially beneficial and arguably “promoting the progress” of authorship, which may favor a finding that the tool is transformative. That said, due to the human authorship requirement, we are years away from guidance as to the degree of copyright protection in those animations; and if GAI tools are used to produce millions of works that have no “authors” as a matter of law, it is contradictory to find that this “promotes progress” with regard to copyright’s purpose.

Further, the difficulty for a court considering fair use is that Bolinsky and his colleagues who specialize in medical work are atypical among professional creators, to say nothing of the many millions of non-creator customers that GAI developers need—because they are leveraged into the stratosphere—to make their products profitable. This scale implies an analysis reminiscent of Sony—i.e., a question of whether the purpose of the GAI is substantially beneficial or substantially harmful. But knowing that requires time travel.

If a court could see a few years into the future and find, for instance, that the GAI at issue will be used substantially for nonconsensual pornography, disinformation, and scams, it would presumably decline to find that those purposes are social benefits favoring an expansive transformativeness finding. At the moment, however, the courts simply have no idea what the true “purposes” of various GAIs are, which is unprecedented in fair use jurisprudence. The VCR, Google Books, Android phones, et al. did not serve materially different purposes years after they were presented to the courts in their respective cases. By contrast, GAIs present an incomplete and dynamic set of facts; and in my view, this alone should militate against finding that factor one favors any of these products.

The Threat to Authorship Itself

As stated in other posts and in comments to the Copyright Office, one unique challenge of GAI is that it poses a potential threat to authorship itself (i.e., that it will shrink the number of creative workers), which plainly undermines the progress clause and copyright law. Although my own view is that a party who poses an existential threat to copyright’s purpose should not be allowed to invoke one of copyright law’s affirmative defenses, I recognize the difficulty of that position.

Under U.S. law, copyright protects authors indirectly by protecting certain exclusive rights to use their works. Consequently, there is little foundation for arguing generalized harm to authorship itself, despite the overwhelming recognition that diversity in authorship has benefitted the United States both culturally and economically for almost two centuries. In this context, GAI provokes the question as to whether U.S. policy might shift toward a “moral rights” approach akin to Europe, but that’s a discussion for a different post.

Instead, the general threat to authorship is considered, to an extent, under fair use factor four, which weighs the potential threat to the market value of the works used. The key difficulty, however, is that if the GAI does not output the song “Ordinary” but instead outputs music in the style of Alex Warren, then the output is not, strictly speaking, a threat to the market value of “Ordinary” itself. While proposals like the NO FAKES Act would prohibit unauthorized replication of Warren’s voice, copyright law does not clearly prevent a GAI that makes Warren-like music that could theoretically obviate the need for Warren himself.[1]

For now, several plaintiffs in the roughly 40 active lawsuits against GAI developers have presented evidence of outputs that are substantially similar to the works used in training, and this should disfavor fair use for the GAI developers under factor four. More broadly, plaintiffs in these cases argue that licensing works for the purpose of AI training is itself a market opportunity exclusive to the copyright owner, and therefore, the failure to license constitutes market harm under factor four.

Some courts may be reluctant to agree with the lost licensing opportunity claim, but that reluctance is unfounded—even if a developer successfully prevents its product from outputting copies of works used in training. So long as one of the exclusive copyright rights is implicated (and here, it would be the reproduction right), then a requirement to license exists. Consequently, the failure to license, especially at such an extraordinary scale for an unprecedented commercial venture, is unquestionably market harm to the copyright owner.

Even where factor four may present a close call, because the GAI developer should lose on factor one, and because factors two and three decidedly favor the creator plaintiffs, factor four should not reasonably control in many of these cases. Moreover, the courts should pay scant attention to the developers’ claim that the cost of licensing is existentially prohibitive to the development of GAI. In addition to the fact that this plea is barely tolerable from parties wildly spending billions on high-risk ventures, any claim that a license is “too costly” for any venture is no defense under copyright law. The copyright owner sets the terms for the use of her work, and the prospective user can accept those terms or not before using the work. If that rule applies to the bootstrapping indie filmmaker, surely it applies to Microsoft, Meta, Google, et al.

Conclusion

Fair use is a mixed question of fact and law, and I maintain that what should be most fatal to the developers’ fair use defense is that, like the public, the courts have insufficient facts about the ultimate purpose of GAI products. Just as with the internet in the late 1990s, we are witnessing unfounded political sentiment to once again let Big Tech do what it wants, preaching to the public that this time, the technology really will “solve the world’s problems.”

Of course, there is no rational basis for that belief beyond the self-interest of the developers and the investors losing billions every year. If past is prologue, Congress will live to regret the folly of allowing AI to run amok, just as Members of both parties now rue the unconditioned immunity of Section 230. In the meantime, while licensing copyrighted works for GAI training will not address all, or most, of the potential hazards of artificial intelligence, the courts should decline to adopt strained fair use rationales in the name of assumed progress that may turn out to be a complete disaster.


[1] I believe there are cultural reasons that militate against this result, but those predictions do not influence the fair use consideration.

Questions But Not Chaos at the Copyright Office


I have not commented on developments since May 13 because, in this instance, caution is more important than keeping up with every rumor, of which there are plenty. I stand by my general views articulated in that last post but am not quite ready to agree with Digital Music News’s May 23 report that the Copyright Office has “plunged into total chaos.” In fact, it is both premature and self-defeating for creators to go there.

What is certain is that the administration’s unprecedented attempts to appoint the acting heads of both the Library of Congress and the Copyright Office invite statutory and constitutional conflict. Those conflicts are presented in the lawsuit filed by Shira Perlmutter over what she argues was her unlawful and ineffective dismissal by the White House on May 10 from her position as Register of Copyrights and Director of the U.S. Copyright Office. As the complaint describes, the President’s concurrent and unilateral naming of DOJ attorney Todd Blanche as acting Librarian triggers a cascade of questions that are both legally uncertain and politically fraught.

That the President may dismiss the Librarian of Congress is well founded, but the process of installing a lawful acting Librarian pending a new nominee is another matter. In essence, Perlmutter’s argument rests on the foundation that the Library of Congress is not an executive agency as a matter of statutory or constitutional law. Under Title 2, the Librarian is nominated by the President and confirmed by the Senate, but “The Library of Congress is, in name and function, Congress’s Library,” Perlmutter’s complaint states.

Perlmutter enumerates both statutory and case law examples to support her claim that because the Library is not an executive agency, the President had no authority to name Blanche as acting Librarian under any provision that might be construed to give him that power. On that basis, because the Register of Copyrights is undeniably an appointee of the Librarian under Title 17, Perlmutter argues that the absence of a lawfully appointed acting Librarian nullifies both her dismissal and the attempted appointment of DOJ Associate Deputy AG Paul Perkins as acting Register. Further, as a constitutional matter, Perlmutter alleges that the President has attempted to arrogate to himself powers that rest solely with Congress.

While Perlmutter’s allegations read to this layman as compelling, I do not have sufficient knowledge about administrative law, let alone the relevant case law, to anticipate the counterarguments to her claims. On May 28, the DC district court denied Perlmutter’s request for a temporary restraining order (TRO) that would have reinstated her as Register pending the court’s consideration of the merits of her claim. Notably, colleagues who attended the hearing say that, in denying the TRO, the court focused on the fact that the Office, not Perlmutter, would suffer the harm. The court also found it compelling that Congress did not intervene and noted that the Library of Congress is a “unicorn” that serves both legislative and executive functions.

And therein lies the rub—a kerfuffle that is legally uncertain but also ripe for substantial political haggling because not even Republicans on the Hill want the White House mucking about in the Library of Congress. Specifically, the Congressional Research Service (CRS) is a non-partisan agency that provides confidential reports to Members, and nobody in Congress wants that agency to be directed by whichever party is in the White House.

Meanwhile, all speculation as to the role of Big Tech and the timing of the Office’s third report on copyright and artificial intelligence is just that. The fact that an early draft of the report was made public one day before Perlmutter’s “dismissal” supports the theory that tech interests sought to quash or amend the conclusions of the report through their influence with Trump. Other reports implied that “tech” was synonymous with Elon Musk as the driving force and that his “abolish all IP” view ran afoul of right-wing media’s interest in its copyright-protected material. And, of course, that was before this past week’s fireworks between Musk and Trump.

Pick your favorite narrative, and it’s probably mostly wrong. But as a practical matter, I do think it is premature and unhelpful to say that the Copyright Office is in a state of utter chaos while both the legal and political difficulties triggered by the White House are addressed. Registration applications are still being processed, though it is safe to assume that the Office has paused at least some of its work as a consulting agency, including the anticipated fourth report on AI.

With the Library, it certainly appears that Trump may have stepped in a pile of WTF on the Hill because of the CRS. With the Copyright Office, creators should want a restoration of the normal, non-partisan function of the agency, maintaining the registration process and advising Congress, the courts, and the public on copyright law and policy. For now, I wouldn’t panic just yet.

Is Congress Prepared to Scuttle Good State Laws for AI Developers?


A fight is underway in Congress over an amendment to the “big beautiful” budget reconciliation bill that would put a 10-year moratorium on state laws governing certain uses of artificial intelligence. The amendment, proposed by Republicans and opposed by Democrats on the House Energy and Commerce Committee, is broad and concerning to multiple stakeholders, including 36 State Attorneys General who signed a letter addressed to the House. The letter states, “The impact of such a broad moratorium would be sweeping and wholly destructive of reasonable state efforts to prevent known harms associated with AI.”

The language, which passed out of committee last week, states:

(c) MORATORIUM.—

(1) IN GENERAL.—Except as provided in paragraph (2), no State or political subdivision thereof may enforce any law or regulation regulating artificial intelligence models, artificial intelligence systems, or automated decision systems during the 10-year period beginning on the date of the enactment of this Act.

According to Tech Policy Press, the idea for a legislative “pause” to allow AI development room to “innovate” began with a 2024 blog post by R Street’s Adam Thierer. “With over 700 federal and state AI legislative proposals threatening to drown AI innovators in a tsunami of red tape, Congress should consider adopting a ‘learning period’ moratorium that would limit burdensome new federal AI mandates as well as the looming patchwork of inconsistent state and local laws,” Thierer wrote.

Putting a pin in my cynicism about “learning periods” granted to Big Tech, the fact is that on cyber policy, Republicans and Democrats have been united (at least in multiple hearings) on the theme that tech platforms, operating in an unregulated market, have already acted irresponsibly when it comes to mitigating child suicide, drug trafficking, non-consensual pornography, threats to lawful commerce, and other harms. Further, several states have already passed, or are proposing, laws aimed at specific harms, all of which are either directly or indirectly facilitated by AI technology.

For example, the Texas Senate recently and unanimously passed a bill designed to “Stop AI Generated Child Pornography,” and it is tough to imagine why Texas’s Representatives or Senators in Congress would vote for legislation that would preempt their own state’s authority to mitigate this egregious crime. Some may argue that the moratorium will not preempt the Texas law, or similar laws, but I think it is a safe bet that such laws would be ripe for a preemption challenge.

Perhaps no party will litigate to defend child pornography, but what about the rights of musical performers? In March of last year, music-rich Tennessee passed the ELVIS Act to prohibit the AI replication of voices without permission of the individual. The act further prohibits making available an algorithm, software, tool, et al. with the primary purpose or function of producing an unauthorized “likeness.” Given the interests of AI developers in various uses of likeness replication, Tennessee’s ELVIS Act would seem ideal for a preemption challenge if Congress were to pass the moratorium. Indeed, Tennessee Senator Blackburn recently pushed back on the moratorium proposal, citing the ELVIS Act as a “first generation of the NO FAKES” bill that was reintroduced in Congress in April.

In California, the State Assembly Judiciary Committee recently passed AB-412, which would require AI developers to (upon request) provide information as to whether a rightsholder’s protected and registered works were used in model training. This provision, essentially requiring that a product maker take responsibility for materials in its supply chain, would almost certainly not survive a preemption challenge under the moratorium.

Ten Years is Forever in Tech Time

Returning to the cynicism I set aside, lawmakers on both sides of the aisle already know what 10+ years of letting Big Tech do what it wants looks like. Americans have already “learned” that lesson, and I have lost count of how many times Republicans and Democrats have disparaged the unconditioned immunity of Section 230 and the industry’s callous disregard for the various harms it causes.

Yes, we are going to continue to debate and fight like hell over the bugaboo of misinformation, but in the meantime, Republicans cannot reasonably want to oppose state laws designed to protect their citizens from direct physical, emotional, and/or economic harm. We’ve been there and done that to death. Congress should not be persuaded to let Big Tech play in the lab for another decade just to see what happens.

Below is a list of laws enacted or proposed in several states, and Congress should take particular note of legislation designed to protect both children and adults from sexual abuse facilitated by generative AI.

