Shedding Light: Briefs Filed in Kadrey v. Meta


The purpose of cultivating works of authorship is to shed light on human experience, and the foundational purpose of the fair use doctrine in copyright law is to shed light on works of authorship. From its 18th-century English roots to the U.S. Supreme Court’s 2023 decision in AWF v. Goldsmith, the primary rationale for fair use has been to permit the unlicensed use of works in ways that critique or comment upon the works themselves. Harvesting millions of books to train an LLM does not do this.

With the growth of digital technologies and copyright protection for highly utilitarian computer code, fair use doctrine has expanded somewhat to permit certain “non-expressive” uses of works. But the uses allowed by the courts have still tended to provide information about the works used or have been held to advance purposes like software interoperability. Harvesting millions of books to train an LLM does not do this.

A pair of briefs filed in Kadrey v. Meta, one by the Association of American Publishers (AAP) and the other by a group of IP law professors, present compelling arguments against finding that Meta’s unlicensed copying of millions of books to train its generative AI product Llama is fair use. A common theme in both briefs exposes a core fallacy, and legal hypocrisy, common to AI developers in these cases: namely that they copy protected “expression,” but they don’t copy protected “expression.”

As we see in the shorthand of social media, the developers write their own dichotomy by simultaneously humanizing and dehumanizing their products. In one breath, they compare machine learning (ML) to human learning but then drop the analogy when they seek to claim that the protected “expression” in the works used is not copied or stored by their mysterious and complex “training” models. The AAP brief argues that copying “expression” is central to training an LLM, and the professors’ brief shows why “learning like a human” is precisely why fair use does not exempt Meta from obtaining licenses.

Both AAP and the professors naturally present specific arguments as to why none of the fair use case law supports Meta’s defense, but I was intrigued by the ways in which both briefs argue from different perspectives that training Llama indeed exploits the “expressive content” of the books appropriated. In fact, if it could be shown that no protected expression is copied or stored, this would be an argument that no case for infringement exists. But considering the emphasis on fair use—and all similar cases will almost certainly turn on fair use—we can assume that this statement from AAP is correct:

Meta would have this Court believe that authors’ original expression is not preserved in or exploited by the model. But this is not so. The LLM algorithmically maps and stores authors’ original expression so it can be used to generate output—indeed, that is the very point of the training exercise.

Kadrey and all AI training lawsuits with similar facts will turn on fair use factors one and four. Under factor two (nature of the works used), the books in Kadrey, and the works in most other cases, are “expressive” rather than “factual” in nature, and therefore, this factor favors plaintiffs. Under factor three (amount of the work used), it is understood that whole works have been fed into the LLMs, and so, this factor also favors plaintiffs.

Under the first fair use factor (purpose of the use), the court considers 1) whether the use is transformative; and 2) whether the use is commercial. Here, Meta’s commercial purpose is undeniable, and the AAP brief soundly argues that there is nothing transformative about copying the word-for-word expression in textual works for a purpose that sheds no light on the works used. On the contrary, the intent of the LLM is to create a non-human, substitute “author,” a purpose for which there is indeed no judicial precedent.

Factor four considers potential market harm to the copyright owner(s) of the work(s) used, and factor four may be the keystone in the broader creators versus GAI battle. Meta, a trillion-dollar company run by executives whose credibility is in doubt, contends that it is not feasible to license the books it used to train Llama. In response, AAP presents substantial evidence of licensing agreements between copyright owners and several major AI developers, and it states that Meta abandoned negotiations with publishers and chose instead to harvest books from pirate repositories.

Further, AAP argues “from a policy perspective” that Meta’s accessing those pirate “libraries” of DRM-free books militates against finding fair use in contravention of Congress’s intent when it passed the Digital Millennium Copyright Act (DMCA) in 1998. “Congress sought to establish a robust digital marketplace by ensuring appropriate safeguards for works made available online, including copyright owners’ ability to rely on DRM protections in distributing electronic copies of their works.”

In this spirit, inherent to the history of the fair use doctrine is the notion of “fair dealing” or, put differently, general legality in the overall purpose and character of the use. “The compiler of the training data’s knowledge of the unlawful provenance of the source copies might well taint the ‘character’ of the defendant’s use,” writes Professor Jane Ginsburg in a paper examining the question of fair use of works for AI training.[1]

The Professors’ Brief

The brief filed by the IP professors also emphasizes that the protected “expression” in the works is copied and exploited without license, but it also rather deftly uses Meta’s own rhetoric to doom the fair use defense. In general, when the AI cheerleaders say that LLMs “learn the way humans do,” my instinct has been to sneer at this anthropomorphic sentiment. But by giving the “learning like humans” analogy weight, the professors’ brief demonstrates exactly why that claim is fatal to a defense that the developer’s purpose is fair use.

Noting that humans indeed use protected works for “learning” all the time, the professors make plain that this exact relationship between author and reader (the basis for copyright) does not exempt the human from obtaining works legally. Thus, by Meta’s own analogy, the “machines learn like humans” claim is both an affirmation that the “expression” is being exploited and proof that there is nothing transformative about using works for “learning.”

Further, the professors have a bit of fun emphasizing that Meta et al. strain to make the machine learning process sound as technically complex as possible to obscure the fact that only by copying “expression” could the LLM actually “learn” anything. Here, a tip of the hat is deserved for the brief’s description of a human being reading a book thus:

… many billions of photons hit the book’s surface; some of those billions reached a lens, which focused them onto a retina, which converted them into electronic signals, which then resulted in electronic and chemical changes in some portion of over 100 billion neurons with over 100 trillion connections, some of those changes being transitory, and others more permanent.

The technical description of human processing and learning is even more mysterious because not even expert specialists in neuroscience know how the brain works at the neuronal level.

Well done! If that needlessly technical description of human reading requires legal access to the book, then so does the far less complex process of machine learning for AI development. Moreover, even if Meta were the vanguard developer and there were no examples of licensing deals being made, there is no rationale anywhere in commerce that a necessary resource must be free because it is essential. Meta et al. need electricity, engineers, and probably a computer or two to develop Llama, and not one of these resources is free. Yet, somehow the most essential resource, the work of millions of authors, should be free.

On that note, there has never been a more important time to protect the rights and economic value of authors who shed light on the world we inhabit. I remain more than skeptical that it will ever be desirable to create literary works without authors, musical works without composers, etc. And certainly, licensing deals alone do not address all the potential hazards of unethical or questionable uses of generative AI. How products like Llama are used will provoke discussions that are cultural as well as legal. But for the moment, fair training of all AI models is the only rule that is both ethical and consistent with copyright’s purpose.


[1] Prof. Ginsburg is not one of the professors in the brief cited for this post.

Photo by Busko

Copyright and AI in a World of Whiplash Public Policy


I have not added a copyright post here since March 19, when the DC Circuit Court of Appeals affirmed in Thaler v. Perlmutter that works produced autonomously by generative AI (GAI) are not protected under U.S. copyright law. Although it is good to see the human authorship doctrine in copyright left undisturbed, it is a fleeting moment of sanity within a warped national reality.

As reported earlier, OpenAI appealed to the administration’s focus on China as a basis to argue that “beating China” requires ignoring the copyright claims of authors whose works are used to train AI models. Not only is that claim wrong on its face, but the conduct of the current administration vis-à-vis civil rights forces millions of Americans to ask whether China is an adversary or a role model.

One mirror in the funhouse reveals a compelling bipartisan hearing held by the Senate Judiciary Committee, Subcommittee on Crime and Counterterrorism, where Chairman Hawley and colleagues from both parties offered strong endorsements for the courageous testimony of Facebook whistleblower Sarah Wynn-Williams. Focused primarily on Meta’s engagements with the Chinese Communist Party (CCP)—and Zuckerberg’s lying to Congress about that very issue—the committee cited other abuses described in Wynn-Williams’s book, like the company intentionally targeting vulnerable teens. (More about the book Careless People in another post.)

Ordinarily, I compartmentalize copyright matters from other criticisms of Big Tech, but here, the stories overlap, even if Meta is the only target of the committee’s investigation at this time. First, throughout her testimony, Wynn-Williams repeats the theme that Meta used the “but China will win” argument to oppose Congress taking any meaningful regulatory action. This alone should cast doubt upon OpenAI et al. making the same argument as a rationale for mass copyright infringement for model training. As Senator Klobuchar noted, there was no basis for prior claims that enforcing various consumer safeguards (e.g., the Kids Online Safety Act) would be counter-productive to national security, and in that light, Congress should decline to believe the same story in regard to copyright infringement.

Meta may be unique, or uniquely situated, as a clandestine partner to the CCP, but it is also notable that the committee mentioned the role of Meta’s Llama AI and heard Wynn-Williams’s testimony that the product was used by the CCP for “AI weapons” and for the development of the Chinese LLM DeepSeek. Further, Wynn-Williams offers a theory about the open source versus closed model AI competition in the marketplace. “There’s a lot of money on the line,” she says. “In some ways you could say, if you want open source to prevail, it helps to have a strong threat from a Chinese model so you can say that it’s really important that America wins, and we’re the American open-source option. And I think you can see the way that strategically plays out.”

“But China will win” is pretty much what OpenAI told the Office of Science and Technology Policy in its letter arguing that machine training with copyrighted works is per se fair use. But looking at Meta (which is currently being sued in the Kadrey case), consider the perspective: in developing Llama, not only did Meta scrape the literary works of millions of authors and journalists, and not only did it source pirate libraries for that purpose, but it also deployed that same AI power in the interests of a nation that brutally kills freedom of expression. Yes, of course, I’m thinking the same thing because it’s unavoidable. The current U.S. administration has engaged in multiple First Amendment and other constitutional violations, including assaults on the free press, and thus, the policy whiplash.

Couple these optics with the volume of evidence that the real power behind the destruction of the administrative state is a small group of tech billionaires pushing an anti-democracy ideology called the neo-reactionary movement (NRx), and the idea of advocating creators’ rights seems all but futile. After all, is it remotely sane to think that an administration of semi-literate, 1A-infringing, book banners will care about the rights of authors—let alone reject the tech-bros who wrote the destruction manual for the United States?

Setting aside the copyright questions raised by GAI training, Big Tech’s wanton harvest of artistic and intellectual works as lifeless raw material is perhaps the ultimate expression of the cyberlibertarian’s disdain for human beings as mere repositories of data to be exploited and manipulated. The rhetoric of Big Tech ideology—from 4Chan to the halls of academia—is the authoritarian principle that individuals must be sacrificed for the sake of the collective. All rights are a nuisance to the tech oligarch, and authors are the last people any authoritarian wants to empower.

OpenAI’s claim that mass copyright infringement is necessary to “beat China” is paradoxical, either willfully or naively blind to the fact that when we treat works of authorship as mere fodder for the machine, we don’t beat the CCP; we emulate it. Further, not only is the claim overstated that GAI development is a matter of national security, but again, what does “national security” even mean at present? Concepts like American interests, values, innovation, global security, etc. are all diminished, if not wholly swallowed, by the reckless destruction of the principles and institutions that distinguish America as a leader among democratic nations. And copyright rights are in those same crosshairs.

In response to copyright’s critics, especially those in academia with Big Tech funding their work, I have argued that the diversity and scope of America’s creative output has been essential to its strength as a democracy. Whether one looks at the economic value of the core copyright industries, the cultural value of diverse creative expression, or both, the rationale for intellectual property is to incentivize useful innovation and legitimate greatness.

American authors—from historians to rockstars—are the legacy of an aspiration expressed by Noah Webster, the father of American English and of American copyright. In 1783, advocating the first state copyright law in Connecticut, Webster argued that “America must be as independent in literature as she is in politics—as famous for arts as for arms.” By contrast the “greatness” proclaimed by Trump is tautological and brittle just like Big Tech’s claims to “innovation” are often vague and misleading.

As proposed in my book, the inclusion of copyright in Article I was one of the more egalitarian and democratic choices made by the founders, even if they did not wholly grasp its potential. At the most basic level, copyright incentivizes creative expression by any citizen anywhere, and the American model largely fulfilled that traditional Republican principle that the market, not the government, decides what is successful.

The copyright questions presented in roughly 40 cases are difficult and novel. Moreover, the facts presented vary, and thus, the outcomes will vary, especially on questions of fair use. In the meantime, it is clear that at least some of the major AI developers are engaged in a campaign to appeal to the current administration to treat copyright rights much as it is treating other constitutional rights—as principles to trample in a march toward something very un-American.

DC Circuit Affirms Human Authorship Required for Copyright


In a decision that is unsurprising but important, the DC Circuit Court of Appeals affirmed that “authors,” as defined in the U.S. Copyright Act, are human beings and not machines that can autonomously generate works. I say unsurprising because nothing in history or statute should have led the court to any other conclusion, and indeed the opinion can be summed up thus: “…the text of multiple provisions of the statute indicates that authors must be humans, not machines.”

Dr. Thaler, a computer scientist, developed a generative AI (GAI) he calls Creativity Machine, which autonomously generated a visual work for which he applied for a claim of copyright with the U.S. Copyright Office. Thaler disclosed that the work was wholly created by the machine, and on the basis that copyright can only attach to works made by humans, the Office rejected the application. Thaler sued, arguing that the Office was asserting a policy not found in the statute or the constitutional foundation for copyright. He lost in the district court, and the appellate court has now affirmed that ruling. (See earlier posts.)

Specifically, the court cites several operative provisions of the Copyright Act that would be nonsensical if machines were “authors.” “Machines do not have property, traditional human lifespans, family members, domiciles, nationalities, mentes reae, or signatures,” the opinion states. This summary refers to the right to own any kind of property, duration of copyrights, inheritance of copyrights, jurisdictional enforcement of copyrights, incentive to create works, and the right and authority to transfer copyrights.

None of those rights or capabilities apply to non-humans, and non-humans do not have standing in court to adjudicate conflicts over such matters. Consequently, U.S. copyright law would unravel if machines were “authors,” which would, notably, moot Dr. Thaler’s claim that his GAI called Creativity Machine is legally the “author” of the visual work he sought to protect. “Numerous Copyright Act provisions both identify authors as human beings and define ‘machines’ as tools used by humans in the creative process rather than as creators themselves,” the opinion states. Imagine the opposite conclusion and Creativity Machine could be named as a plaintiff in an infringement suit. Chaos ensues, and not just for copyright.

As to Dr. Thaler’s theory that under the work made for hire (WMFH) doctrine, he could claim copyright in the work generated by the AI he owns, the court is clear that this misreads the principle. In plain terms, under WMFH, rights transferred to the hiring party must exist in the first place, but those rights can only be vested in a human being upon creation/fixation of a work. No human author means there are no rights to transfer to a hiring party.

Although the Thaler decision is not surprising, it is important because it reaffirms a core doctrine as both case law and policy evolve in response to GAI. By affirming the boundary that 100% machine-generated expression is not protected, the decision solidifies the framework in which courts do what they often do in copyright cases, namely to separate protected expression from unprotected elements in a given work.

The more compelling and trickier question as to what is protected and not protected when an “author” uses a generative “machine” as a tool is now active in the District Court for the District of Colorado. As discussed in this post, artist Jason Allen presents a plausible argument that he used Midjourney as a tool to create and fix his mental conception of a visual work of expression. Arguably, Allen v. Perlmutter will be the first case to write early guidance for the use of GAI to create works that may be protected. As such, that outcome just might be surprising and important.


Photo by: Designer491