The Generative AI Fair Use Defense Under Google Books

After the Supreme Court’s decision in AWF v. Goldsmith restored what many of us view as common sense to the fair use doctrine of transformativeness, the flurry of litigation against AI developers will test the same principle in a different light. As discussed on this blog and elsewhere, caselaw has produced two frameworks for considering whether the “purpose and character” of a use is transformative. One focuses on differences in expressive elements, like the use of Goldsmith’s photograph to make Warhol’s silkscreen; and the other considers a use made for a unique purpose, like the millions of scanned books used to produce the Google Books search tool.

In Warhol, the Court affirmed that transformative expression must contain some element of “critical bearing” (i.e., comment) upon the work(s) used, and this concept, tied to the different character of work, is distinguished from the use of copyrightable works to create a tool or product that may be considered transformative because it is novel and beneficial for society. Notwithstanding the possibility that generative AI may prove to be harmful to society, the copyright question of the moment is whether the use of many millions of protected works to “train” these models is transformative under the same reasoning applied in Authors Guild v. Google Books (2015).

Because the Google Books search tool could only be developed by inputting millions of digitized books into the database, the argument being made is that this is obviously analogous to ingesting millions of protected works for AI training. And certainly, no one could doubt that generative AIs are novel, even revolutionary. But this may be where the comparisons end under fair use factor one, which considers the purpose of a use, inherent to which is a “justification for the taking.”[1]

The factor one decision in Google Books turns substantially on the court’s finding that the search tool provides information about the works used. “…Google’s claim of transformative purpose for copying from the works of others is to provide otherwise unavailable information about the originals,” the opinion states. While Google Books “test[ed] the boundaries of fair use,” the court held that the search tool furthered the interests of copyright law by providing various new ways to research the contents of books that would otherwise be impossible. Although unstated (because it would have been absurd), the recipients of the information provided by Google Books were/are human beings. And especially if some of those human beings use the information obtained to produce and/or engage with expressive works, the finding of fair use fulfills copyright’s constitutional purpose to “promote progress.”

Generative AI developers may try to argue that the use of creative works for training serves an “informational” purpose, but unlike Google Books, the information obtained from the ingested works only “informs” the machine itself. A generative AI does not, for instance, provide the human user with new ways to learn about Renaissance painting (or point to Renaissance works) but instead trains itself how to make images that look like works from the Renaissance.[2] Setting aside the cultural debate about the value of such tools, the purpose of the generative AI is clearly distinguishable from the reasoning applied in Google Books.

As discussed in an earlier post, a consideration of AI under fair use should turn on the question of promoting “authorship,” lest the courts become distracted by the broadly innovative nature of these systems—especially for any purpose outside the scope of copyright.[3] In that post, I argued that generative AIs do not promote “authorship,” and I would die on that hill, if the developers’ expectation is that these tools will autonomously generate “creative” works without any human involvement.

For instance, if “singer/songwriter” Anna Indiana is a primitive example of what’s to come—and my understanding is that this is exactly what the AI models are designed to do—then the “purpose” of these systems is not to promote authorship, but to obliterate authorship by removing humans from the “creative” process. As such, the fair use defense cannot apply because without the element of authorship, the consideration is no longer a copyright matter.

On the other hand, as stated in my comments to the Copyright Office, it is conceivable that a human author might “collaborate” with an AI tool to produce a work that meets the “authorship” threshold. For instance, by using a set of prompts that articulate sufficient creative choices in the production of a visual work (or by uploading one’s own work and using an AI tool to modify it), one can make a reasonable argument that this constitutes “authorship” under copyright law. This is one potential purpose of generative AI, and one which could favor a finding of transformativeness under principles similar to those articulated in Google Books.

But Google Books did not present the court with so many unknown, relevant questions of fact.

The purpose of the Google Books search tool was clearly defined and fully developed when that case was decided in 2015. By contrast, fair use defenses of AI today are presented on behalf of technologies whose development is nascent and exponentially dynamic. Simply put, we do not know yet whether a particular generative AI will promote authorship or become a substitute for authorship—the former being favorable to a finding of fair use, the latter being fatal to such a finding. Here, proponents may argue that so long as there is a mix of uses, resulting in both authored and un-authored outputs, this is sufficient to find the purpose of a given AI transformative, but it seems likely that the current docket of cases will be decided before enough determinative facts can be known.

For now, it is worth remembering that sweeping statements alleging that generative AI training is “inherently fair use” are anathema to a doctrine that rejects such generalizations. Fair use remains a fact-intensive, case-by-case consideration, and one of the many difficulties with AI is that relevant facts are not only evolving, but they describe technologies unlike anything that has been examined under the fair use doctrine to date.

[1] Citing Campbell, informing both Google Books and Warhol.

[2] I recognize that this is an oversimplification of what the AI can do.

[3] i.e., AI’s potential applications in areas like medicine or security should be dismissed as irrelevant to a fair use consideration of generative AIs that make “creative” works.

Photo by: chepkoelena531

With AI, Big Tech is No Longer Pretending to Care

As reported by Insider last week, the VC firm Andreessen Horowitz (a16z) complains that potential copyright liability for AI developers could harm the interests of its investors. “Imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development,” the firm states, as quoted by Kali Hays. Sympathy for the billionaires was not forthcoming, as my friend Neil Turkewitz can attest based on the responses to his tweet on the topic…

More about the VCs’ comments below, but against this backdrop of millions of creators laughing at the raw hubris of Andreessen et al., it is worth watching how, or whether, the AI developers address the matter of indemnifying customers against potential liability for copyright infringement claims arising from use of their systems. Writing for TechCrunch, Kyle Wiggers observes that as these companies respond to investor pressure to attract enterprise customers, copyright infringement indemnity may become common. For now, the landscape reads like a patchwork of promises with a sub-patchwork of disclaimers and conditions.

Adobe, IBM, and Microsoft have made the strongest assurances that they will commit resources to defend customers against copyright infringement claims; other prominent AI developers like Stability AI and Midjourney have not yet adopted any such provisions; and Wiggers states that “Google offers some defense for customers against third-party allegations of IP infringement arising from its text- and image-generating models.” In practice, of course, the only real test to determine whether these clauses are meaningful (rather than just PR) is for a rightsholder to file a suit and see what happens. And that gets to the question of which parties are being protected, and why.

In 2015, Google announced it would pay legal fees for YouTubers whose videos were wrongly removed from the platform via the DMCA notice-and-takedown provision. In fact, Google did not mean all YouTubers but a few selected video creators, and I do not believe Google ever had to put its money where its mouth was (not that anything they pledged counted as “money” in their world).* Although indemnity clauses in Terms of Service are a different animal, there is a familiar ring this time in the AI developers’ limitations and restrictions—for instance to only indemnify enterprise customers.

The trend strikes me as maddening. First the AI developer “trains” its model by feeding it millions of creative works, all used without permission from the rightsholders. Next, the AI developer hopes to sell its system to enterprise users—businesses that will, in theory, no longer need to hire the same professional creators whose works were rustled to develop the AI. And finally, the AI developer will protect said business user against potential infringement claims by that same class of professional creators (at least until there are no more creators left). Maybe this isn’t quite how things will go, but in principle, it looks a lot like looting a neighborhood and then erecting legal barriers to prevent the residents from remedying the theft.

And that brings me back to Andreessen Horowitz, and the gall it takes to so frankly dismiss the rights of all creators as an inconvenient barrier to VC wealth. In its comments to the Copyright Office, a16z recited Psalm 1 from the Book of Tech-Bro, demanding our blind faith that what’s good for the tech sector is always good for the country. “[Investor] expectations have been a critical factor in the enormous investment of private capital into US-based AI companies. Undermining those expectations will jeopardize future investment, along with U.S. economic competitiveness and national security.”

After recovering from the spit take at manifesto-writing capitalists seeking federal protection for their private equity investments, the only sensible reply to the overstated reference to national security is BULLSHIT. If the future of U.S. national security depends on developing a for-profit generative AI to make music or paint pictures, we’re screwed. Fortunately, this is not the case. Defense Department AI strategy (good, bad, or otherwise) will proceed independent of AI’s role in creative works of expression. Accordingly, it is both revealing and ridiculous that Andreessen Horowitz would even mention national security in comments to the Copyright Office.

Notably, the quote above appears under a subhead asserting that using protected materials for machine learning is fair use. The paragraphs that follow cite no authority to support a fair use argument and, in fact, undermine that defense by coming very close to asserting that there is no basis for a claim of infringement. If non-infringement is the argument, then fair use should not be raised, and a16z’s failure to articulate a strong position in either direction leads one to reasonably conclude that its only argument is financial self-interest. Last I checked, the free market doesn’t guarantee success, and if your business model is based on a potential liability, that’s a problem with the model, i.e., a you problem.

With so many billions invested in generative AI, Big Tech’s longstanding clash with copyright law has finally pivoted from a lie about building new opportunities for individual creators to the unblushing truth that it views creators as obsolete relics dragging against their deterministic vision of the future. “Today, companies are aiming to remove artists and writers from the loop entirely — it turns out, even free labor was too expensive,” writes Eryk Salvaggio in a must-read essay. And if that’s how AI investors feel about human beings in the creative arts, we should question their investments in everything.

*UPDATE: Per comment by Neil Turkewitz, Google filed one suit in 2019 against one alleged abuser of the DMCA.

Where Are All the Trolls at the CCB?

A lot of world-shaking events have occurred since 2018, when the CASE Act was introduced for the purpose of creating a small-claim copyright alternative, now known as the Copyright Claims Board (CCB). After a pandemic, an attempted coup d’état, and other jaw-dropping moments, it’s easy to forget all the ululating noise produced by the Electronic Frontier Foundation, Fight for the Future, Public Knowledge, Mike Masnick, the Niskanen Center, Sen. Wyden, the Computer & Communications Industry Association, et al. to warn the public about the perils of the CCB. The loudest talking point in that cacophony was the unfounded prediction that the small-claim tribunal would be an ideal forum for copyright trolls. For example…

“The CASE Act would give copyright trolls a faster, cheaper way of coercing Internet users to fork over cash “settlements,” bypassing the safeguards against abuse that federal judges have labored to create.” – EFF, April 2018 –

A “copyright troll” is an attorney who consistently files questionable or unmeritorious claims with the intent to extract settlements from alleged copyright infringers. In response to predictions that the CCB would be a perfect venue for trolling, I and others highlighted the many safeguards in the CASE legislation that were written specifically to anticipate and prevent abuse of the tribunal. In fact, that EFF quote above was a double lie because safeguards against abusive or unmeritorious claims do not easily prevent trolling in federal court, which is why trolling happens in those venues, although not nearly so often as the anti-copyright hecklers like to claim.

CCB Safeguards Triggered for the First Time

As Jonathan Bailey describes in a recent post on his blog Plagiarism Today, the CCB has, for the first time, invoked its authority to bar an attorney from filing small claims for one year. To be clear, based on Bailey’s description, the attorney in question does not deserve the description “troll,” let alone the kind of predatory actor copyright hecklers refer to when they use that term.

Instead, this attorney triggered the safeguard provisions by filing several unmeritorious claims against Amazon, which was improperly named, and foreign resellers, which cannot be named in CCB claims. As Bailey notes, the effort is understandable because, “Many creators have complained that marketplaces like Amazon, Wish, Temu and so forth have become havens for infringement.”

My point here is not to comment upon or critique this one attorney’s intentions or errors, but to emphasize that the sanctions he activated at the CCB are the same safeguards written to prevent copyright trolls from even using the tribunal, let alone abusing it. As noted in this post, the CCB is a cost-prohibitive venue for the would-be troll due to the limited number of claims that may be filed in a single year, the potential fines for intentional abuse, and the possibility of being barred from the CCB for a year.

During the roughly two years between introduction and passage of the CASE Act, a typical response to the statutory safeguards was, “Well, we can’t trust the Copyright Office.” This familiar, dimwitted tactic is indistinguishable from those who say “We can’t trust the DOJ” in response to meritorious indictments against the former president. Meanwhile, the CCB, in demonstrating that it will enforce safeguards as the law requires, belies all those scary headlines predicting that sharing memes on social media would result in a tidal wave of $30,000 fines.

The anti-CASE messaging has since evaporated into the digital ether, of course, but at moments like this, I think it’s fair to say that every time these same hecklers predict anything about copyright law, they should be ignored. I don’t mean that their views should be heavily scrutinized. I mean ignored. They lie about basic facts. They use fearmongering as a primary tactic. They claim to represent interests they do not represent. And they battle chimeras to stay relevant and raise funds. On that last point, expect to see the EFF look for an opportunity to litigate the constitutionality of the CCB—an effort that will likely fail but, as I say, will make good material to promote with a “Donate Now” button.

The Illusion of More

Dissecting the digital utopia.
