The Courts Should Embrace the Novelty of Generative AI in Copyright Law


Courts can’t stick their heads in the sand to an obvious way that a new technology might severely harm the incentive to create, just because the issue has not come up before. Indeed, it seems likely that market dilution will often cause plaintiffs to decisively win the fourth factor—and thus win the fair use question overall—in cases like this. – Judge Vincent Chhabria, Kadrey et al. v. Meta

In several posts, I have argued that generative AI (GAI) invokes novel copyright considerations on the basis that the technology has the potential to harm authorship itself, even where it may not harm specific works of authorship under traditional fair use analysis. GAI is distinguishable from any technology with which copyright law has had to contend, and if the courts are to continue guiding the law to preserve copyright’s foundational principle—the incentive to create—they should recognize and even embrace the invitation to plow some new legal ground.

In the Copyright Office’s third report on artificial intelligence, one section introduces the notion of market dilution, citing several comments including my own. Naturally, the AI industry rejects the premise that market dilution of all works, or even a certain type of work, is a valid consideration under copyright law. This argument, albeit self-interested, has some merit under traditional fair use analysis. Fair use factor four, which considers whether a specific use potentially threatens the market value of the work(s) in suit, may be narrowly construed to reject the kind of generalized market harm implied by GAI.

But as the quote above reveals, Judge Chhabria in Kadrey et al. v. Meta (not even one of the strongest cases against AI developers) recognizes this technology’s novel capacity to undermine the foundational purpose of copyright law. He also states, “…by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way.”[1] This recognition of far-reaching harm to the “incentive” foundation for copyright addresses an even broader question than the term “market dilution” implies.

A Broader Fourth Factor Analysis

In the Copyright Office’s report, the section called Market Dilution offers guidance for a reading of fair use factor four that is broad enough to address the fact that GAI outputs can harm the overall market for the same kind of works used in training. Here, I would endorse a view that broadens the fourth factor consideration, which traditionally only looks to potential harm to the copyright owner’s exclusive right to exploit the works in suit.

As argued in other posts, and in my comments to the USCO, the courts should place considerable weight on whether the use at issue furthers the purpose of copyright. My submitted comment the Office chose to highlight states: “[G]enerative AI—if it does not produce market substitutes—primarily represents potential harm to authors and future authorship. . . .[T]he consideration in the context of ‘training’ should be expansive and doctrinal—namely that a potential threat to ‘authorship’ cannot, by definition, ‘promote the progress’ of ‘authorship.’”

I believe that dichotomy, novel to GAI, is precisely what the courts must resolve in order to prevent the technology from swallowing copyright law itself—perhaps especially where a given AI product does not output unlawful copies of works used in training.  The one consideration that rescues GAI products as promoting the purpose of copyright is where they are demonstrably “tools” for creators, but this goes to my overarching argument that the courts likely cannot obtain sufficient facts to discover whether the “tool” is constructive, destructive, or agnostic with regard to copyright’s purpose.

An AI tool used for ideation, for example, may further the purpose of copyright by helping the creator discover a new path from idea to protectable expression, but it is impossible for the court to assume this is the general purpose of the “tool.” The same product might just as easily be used in ways that are destructive to authorship.

For example, the vast majority of material produced and distributed will not be copyrightable due to the human authorship requirement for copyright rights to attach. Additionally, we are already seeing a large volume of AI “slop” distributed on platforms like Amazon and Spotify, and it is well established that driving traffic to garbage content is a profitable model for those willing to engage in the practice. Although a specific bucket of AI “slop,” when considered in a traditional fourth factor analysis, may not directly compete with any specific works of authorship, the courts should continue to give weight to the undeniable fact that a market flooded with “slop” does not in any sense promote copyright’s purpose and is most likely destructive to that purpose.

This view does not ignore or dismiss the creative and cultural potential of GAI as a means of expression. Many popular videos online are made with (presumably) human-authored scripts combined with AI-generated AV material. That the expressions in these works will generally be unprotectable is a valid basis on which to find that the purpose of the AI product does not promote copyright’s purpose. But further, the fact that many of the creators of these works are not incentivized by copyright rights—they are motivated by the opportunity to share ad revenue with the platforms—means that these works, regardless of their qualitative value, live outside the copyright system. As such, works incentivized and enabled by a model other than copyright cannot reasonably be held to further the purpose of copyright.

In my view, these considerations look beyond the typical factor four analysis, and even beyond the ordinary concept of market dilution, to ask a fundamental question:  Can a technology built by mass copyright infringement properly make fair use of works when the product’s ultimate purpose is either destructive or irrelevant to the purpose of copyright law? I don’t think so.

Is Denial of Licensing for AI Training a Market Harm?

A recent post by Copyright Alliance CEO Keith Kupferschmid states that both Judge Alsup in Bartz and Judge Chhabria in Kadrey erred by too hastily concluding that authors are not entitled to license fees for the use of their works in AI training. On that assumption, both judges held that under factor four, the claimants could not show market harm due to the defendants’ failure to license. Kupferschmid writes:

Both judges are incorrect because they ignore the important realities that a robust emerging market for licensing of AI training material already exists. Licensing markets under the fourth factor may only be circular and non-cognizable when the market being considered is a potential licensing market and the judge is trying to determine whether that potential market is too speculative. But when there is an actual market that already exists, the circularity argument has no place and both judges were incorrect to summarily claim the argument is circular. 

Notably, Judge Chhabria, in rejecting the existence of a licensing market for AI training, cites Tresona Multimedia v. Burbank High School, but in addition to Kupferschmid’s point that a licensing market already exists for AI training, I am not sure the court’s reference to Tresona even applies. Judge Chhabria quotes from the opinion thus: “In every fair use case, the ‘plaintiff suffers a loss of a potential market if that potential [market] is defined as the theoretical market for licensing’ the use at issue in the case.” However, the next part of the opinion reads as follows:

…a copyright holder cannot prevent others from entering fair use markets merely ‘by developing or licensing a market for parody, news reporting, educational, or other transformative uses of its own creative work.’ (citation omitted)

This appears to tie the question of whether a licensing market is merely “theoretical” to a finding of whether the purpose of the use is indeed transformative. And although both the Kadrey and Bartz courts found those uses to be transformative, I believe those holdings are so tautological (i.e., lacking proper analysis) as to be ripe for significant challenge. Notably, at issue in Tresona was an educational use of small amounts of musical works—a paradigmatic fair use consideration, and one that may be as far from the implications of generative AI as we might imagine. “Further, the Warhol decision calls into question whether fair use cases like Tresona are still good law,” Kupferschmid said to me by email.

The interplay between factors one and four, while inherent to the fair use analysis, reveals a vexing circularity in the context of GAI where the court is persuaded to find that the remarkable nature of the technology is transformative solely because the use appears to serve a “different purpose” than the works used. In addition to not fully aligning with Warhol, Judge Chhabria’s well-founded instincts about authored works “competing” with voluminous GAI works under factor four cannot be comfortably harmonized with the finding that the AI product serves a different purpose under factor one.  Clearly, if the purpose of the input material is to entertain and inform and the purpose of the “competing” output material is to entertain and inform, these are not different purposes.

The important difference, then, is that the input works are human authored, about which copyright law speaks volumes, while the output works are machine made, about which copyright law says almost nothing. In general, GAI no more adds to the productivity of copyright than the sea steadily eroding stone into an aesthetically pleasing “natural sculpture.” The courts need not attempt to foresee whether GAI will be socially beneficial or harmful but only find that in the context of copyright law there are far more reasons to disfavor fair use than to favor it.


[1] I would have preferred that Judge Chhabria had not used “old-fashioned,” which may be improperly read to mean “outdated” in contrast to AI-generated works.

Shedding Light: Briefs Filed in Kadrey v. Meta


The purpose of cultivating works of authorship is to shed light on human experience, and the foundational purpose of the fair use doctrine in copyright law is to shed light on works of authorship. From its 18th-century English roots to the U.S. Supreme Court’s 2023 decision in AWF v. Goldsmith, the primary rationale for fair use is to permit the unlicensed use of works in ways that critique or comment upon the works themselves. Harvesting millions of books to train an LLM does not do this.

With the growth of digital technologies and copyright protection for highly utilitarian computer code, the fair use doctrine expanded somewhat to permit certain “non-expressive” uses of works. But these uses allowed by the courts have still tended to provide information about the works used or have been held to advance purposes like software interoperability. Harvesting millions of books to train an LLM does not do this.

A pair of briefs filed in Kadrey v. Meta—one by the Association of American Publishers (AAP), the other by a group of IP law professors—present compelling arguments against finding that Meta’s unlicensed copying of millions of books to train its generative AI product Llama is fair use. A common theme in both briefs exposes a core fallacy, and legal hypocrisy, common to AI developers in these cases—namely that they copy protected “expression,” but they don’t copy protected “expression.”

As we see in the shorthand of social media, the developers write their own dichotomy by simultaneously humanizing and dehumanizing their products. In one breath, they compare machine learning (ML) to human learning but then drop the analogy when they seek to claim that the protected “expression” in the works used is not copied or stored by their mysterious and complex “training” models. The AAP brief argues that copying “expression” is central to training an LLM, and the professors’ brief shows why “learning like a human” is precisely why fair use does not exempt Meta from obtaining licenses.

Both AAP and the professors naturally present specific arguments as to why none of the fair use case law supports Meta’s defense, but I was intrigued by the ways in which both briefs argue from different perspectives that training Llama indeed exploits the “expressive content” of the books appropriated. In fact, if it could be shown that no protected expression is copied or stored, this would be an argument that no case for infringement exists. But considering the emphasis on fair use—and all similar cases will almost certainly turn on fair use—we can assume that this statement from AAP is correct:

Meta would have this Court believe that authors’ original expression is not preserved in or exploited by the model. But this is not so. The LLM algorithmically maps and stores authors’ original expression so it can be used to generate output—indeed, that is the very point of the training exercise.

Kadrey and all AI training lawsuits with similar facts presented will turn on fair use factors one and four. Under factor two (nature of the works used), the books in Kadrey, and the works in most other cases, are “expressive” rather than “factual” in nature, and therefore, this factor favors plaintiffs. Under factor three (amount of the work used), it is understood that whole works have been fed into the LLM models, and so, this factor also favors plaintiffs.

Under the first fair use factor (purpose of the use), the court considers 1) whether the use is transformative; and 2) whether the use is commercial. Here, Meta’s commercial purpose is undeniable, and the AAP brief soundly argues that there is nothing transformative about copying the word-for-word expression in textual works for a purpose that sheds no light on the works used. On the contrary, the intent of the LLM is to create a non-human, substitute “author,” a purpose for which there is indeed no judicial precedent.

Factor four considers potential market harm to the copyright owner(s) of the work(s) used, and factor four may be the keystone in the broader creators versus GAI battle. Meta, a trillion-dollar company run by executives whose credibility is in doubt, contends that it is not feasible to license the books they used to train Llama. In response, AAP presents substantial evidence of licensing agreements between copyright owners and several major AI developers, and it states that Meta abandoned negotiations with publishers and chose instead to harvest books from pirate repositories.

Further, AAP argues “from a policy perspective” that Meta’s accessing those pirate “libraries” of DRM-free books militates against finding fair use in contravention of Congress’s intent when it passed the Digital Millennium Copyright Act (DMCA) in 1998. “Congress sought to establish a robust digital marketplace by ensuring appropriate safeguards for works made available online, including copyright owners’ ability to rely on DRM protections in distributing electronic copies of their works.”

In this spirit, inherent to the history of the fair use doctrine is the notion of “fair dealing” or, put differently, general legality in the overall purpose and character of the use. “The compiler of the training data’s knowledge of the unlawful provenance of the source copies might well taint the ‘character’ of the defendant’s use,” writes Professor Jane Ginsburg in a paper examining the question of fair use of works for AI training.[1]

The Professors’ Brief

The brief filed by the IP professors also emphasizes that the protected “expression” in the works is copied and exploited without license, but it also rather deftly uses Meta’s own rhetoric to doom the fair use defense. In general, when the AI cheerleaders say that LLMs “learn the way humans do,” my instinct has been to sneer at this anthropomorphic sentiment. But by giving the “learning like humans” analogy weight, the professors’ brief demonstrates exactly why that claim is fatal to a defense that the developer’s purpose is fair use.

Noting that humans indeed use protected works for “learning” all the time, the professors make plain that this exact relationship between author and reader (the basis for copyright) does not exempt the human from obtaining works legally. Thus, by Meta’s own analogy, the “machines learn like humans” claim is both an affirmation that the “expression” is being exploited and proof that there is nothing transformative about using works for “learning.”

Further, the professors have a bit of fun emphasizing that Meta et al. strain to make the machine learning process sound as technically complex as possible to obscure the fact that only by copying “expression” could the LLM actually “learn” anything. Here, a tip of the hat is deserved for the brief’s description of a human being reading a book thus:

… many billions of photons hit the book’s surface; some of those billions reached a lens, which focused them onto a retina, which converted them into electronic signals, which then resulted in electronic and chemical changes in some portion of over 100 billion neurons with over 100 trillion connections, some of those changes being transitory, and others more permanent.

The technical description of human processing and learning is even more mysterious because not even expert specialists in neuroscience know how the brain works at the neuronal level.

Well done! If that needlessly technical description of human reading requires legal access to the book, then so does the far less complex process of machine learning for AI development. Moreover, even if Meta were the vanguard developer and there were no examples of licensing deals being made, there is no rationale anywhere in commerce that a necessary resource must be free because it is essential. Meta et al. need electricity, engineers, and probably a computer or two to develop Llama, and not one of these resources is free. Yet, somehow the most essential resource—the work of millions of authors—should be free.

On that note, there has never been a more important time to protect the rights and economic value of authors who shed light on the world we inhabit. I remain more than skeptical that it will ever be desirable to create literary works without authors, musical works without composers, etc. And certainly, licensing deals alone do not address all the potential hazards of unethical or questionable uses of generative AI. How products like Llama are used will provoke discussions that are cultural as well as legal. But for the moment, fair training of all AI models is the only rule that is both ethical and consistent with copyright’s purpose.


[1] Prof. Ginsburg is not one of the professors in the brief cited for this post.
