The Generative AI Fair Use Defense Under Google Books

After the Supreme Court’s decision in AWF v. Goldsmith restored what many of us view as common sense to the fair use doctrine of transformativeness, the flurry of litigation against AI developers will test the same principle in a different light. As discussed on this blog and elsewhere, caselaw has produced two frameworks for considering whether the “purpose and character” of a use is transformative. One focuses on differences in expressive elements, like the use of Goldsmith’s photograph to make Warhol’s silkscreen; and the other considers a use made for a unique purpose, like the millions of scanned books used to produce the Google Books search tool.

In Warhol, the Court affirmed that transformative expression must contain some element of “critical bearing” (i.e., comment) upon the work(s) used, and this concept, tied to the different character of the work, is distinguished from the use of copyrightable works to create a tool or product that may be considered transformative because it is novel and beneficial for society. Notwithstanding the possibility that generative AI may prove to be harmful to society, the copyright question of the moment is whether the use of many millions of protected works to “train” these models is transformative under the same reasoning applied in Authors Guild v. Google (2015), the Google Books case.

Because the Google Books search tool could only be developed by inputting millions of digitized books into the database, the argument being made is that this is obviously analogous to ingesting millions of protected works for AI training. And certainly, no one could doubt that generative AIs are novel, even revolutionary. But this may be where the comparisons end under fair use factor one, which considers the purpose of a use, inherent to which is a “justification for the taking.”[1]

The factor one decision in Google Books turns substantially on the court’s finding that the search tool provides information about the works used. “…Google’s claim of transformative purpose for copying from the works of others is to provide otherwise unavailable information about the originals,” the opinion states. While Google Books “test[ed] the boundaries of fair use,” the court held that the search tool furthered the interests of copyright law by providing various new ways to research the contents of books that would otherwise be impossible. Although unstated (because it would have been absurd), the recipients of the information provided by Google Books were/are human beings. And especially if some of those human beings use the information obtained to produce and/or engage with expressive works, the finding of fair use fulfills copyright’s constitutional purpose to “promote progress.”

Generative AI developers may try to argue that the use of creative works for training serves an “informational” purpose, but unlike Google Books, the information obtained from the ingested works only “informs” the machine itself. A generative AI does not, for instance, provide the human user with new ways to learn about Renaissance painting (or point to Renaissance works) but instead trains itself how to make images that look like works from the Renaissance.[2] Setting aside the cultural debate about the value of such tools, the purpose of the generative AI is clearly distinguishable from the reasoning applied in Google Books.

As discussed in an earlier post, a consideration of AI under fair use should turn on the question of promoting “authorship,” lest the courts become distracted by the broadly innovative nature of these systems—especially for any purpose outside the scope of copyright.[3] In that post, I argued that generative AIs do not promote “authorship,” and I would die on that hill, if the developers’ expectation is that these tools will autonomously generate “creative” works without any human involvement.

For instance, if “singer/songwriter” Anna Indiana is a primitive example of what’s to come—and my understanding is that this is exactly what the AI models are designed to do—then the “purpose” of these systems is not to promote authorship, but to obliterate authorship by removing humans from the “creative” process. As such, the fair use defense cannot apply because without the element of authorship, the consideration is no longer a copyright matter.

On the other hand, as stated in my comments to the Copyright Office, it is conceivable that a human author might “collaborate” with an AI tool to produce a work that meets the “authorship” threshold. For instance, by using a set of prompts that articulate sufficient creative choices in the production of a visual work (or by uploading one’s own work and using an AI tool to modify it), one can make a reasonable argument that this constitutes “authorship” under copyright law. This is one potential purpose of generative AI, and one which could favor a finding of transformativeness under similar principles articulated in Google Books.

But Google Books did not present the court with so many unknown, relevant questions of fact.

The purpose of the Google Books search tool was clearly defined and fully developed when that case was decided in 2015. By contrast, fair use defenses of AI today are presented on behalf of technologies whose development is nascent and exponentially dynamic. Simply put, we do not know yet whether a particular generative AI will promote authorship or become a substitute for authorship—the former being favorable to a finding of fair use, the latter being fatal to such a finding. Here, proponents may argue that so long as there is a mix of uses, resulting in both authored and un-authored outputs, this is sufficient to find the purpose of a given AI transformative, but it seems likely that the current docket of cases will be decided before enough determinative facts can be known.

For now, it is worth remembering that sweeping statements alleging that generative AI training is “inherently fair use” are anathema to a doctrine that rejects such generalizations. Fair use remains a fact-intensive, case-by-case consideration, and one of the many difficulties with AI is that relevant facts are not only evolving, but they describe technologies unlike anything that has been examined under the fair use doctrine to date.


[1] Citing Campbell, informing both Google Books and Warhol.

[2] I recognize that this is an oversimplification of what the AI can do.

[3] i.e., AI’s potential applications in areas like medicine or security should be dismissed as irrelevant to a fair use consideration of generative AIs that make “creative” works.


Google Books & The Semantic Maze of Fair Use


This week the Supreme Court declined to consider the Authors Guild v Google case, which lets stand the Second Circuit Court ruling that Google’s use of scanned published works for its search tool Google Books constitutes a fair use.  Various pundits and advocates have hailed this as a victory for the fair use principle.  In fact, I saw a headline the other day on Facebook that began with the words “Fair Use Wins …”, and although the decision is unquestionably a win for Google, the fair use principle actually remains mired in a semantic confusion about which the high court might have at least provided some clarity.  It’s all about the word transformativeness.

The fair use doctrine was codified in the copyright law as part of the 1976 Act, and its original intent was to protect various types of expressions (commentary, parody, education, artistic remixes, reportage, etc.) that by necessity made limited and conditional uses of copyrighted works.  I’ve written longer posts about fair use doctrine in general, and won’t repeat all that here, but readers will remember that there are four interrelated factors to be considered* in assessing whether a use constitutes a fair use.  But in 1994, in the landmark Supreme Court case Campbell v Acuff-Rose Music, the fair use doctrine grew a new appendage called “transformativeness” that has, in the age of the internet, not only become something of a fifth factor that seems to override consideration of the other four, but also has not been clearly defined as a term of art in legal practice.

As I continue to learn from my attorney friends, some of the words we use in everyday language become terms of art in the legal world, which generally means that court rulings have shaped, narrowed, or expanded the dictionary definition of key terms.  For instance, based on the current ruling by a federal court, the word articles can only mean “physical objects” with regard to the International Trade Commission’s authority to prohibit the importation of illegal goods.  So, if Congress wants to grant that body the authority to restrict the importation of digital data for illegal purposes, they’re probably going to have to rewrite the law.  (More about that another time, perhaps.)

The concept of “transformativeness” in fair use parlance was introduced by Judge Pierre Leval in his paper “Toward a Fair Use Standard” published in the Harvard Law Review in 1990, and coincidentally it was Leval who wrote the decision in the Second Circuit’s ruling in Authors Guild v Google.  But even though the “father of transformativeness” himself has ruled in this case, there is still much confusion about the term and what it means when considering fair use. As Thomas Sydnor of the Center for Internet, Communications and Technology Policy at the American Enterprise Institute writes about the situation:

“As cases applying this judge-made “transformativeness”-based approach to fair use accumulate, that term becomes increasingly incoherent, inconsistent, and counterintuitive. Collectively, its incoherence(s) now threaten to turn what was once a productively flexible multi-factor balancing test into little more than a perfunctory recitation of factors ending in judicial ipse dixit – “because I said so.” Under such circumstances, rule of law cannot persist.”

Sydnor further points out that the word transform already exists in the 1976 Copyright Act in reference to the preparation of “derivative works,” which is another term of art to describe works such as spin-offs or adaptations into other media. These rights belong exclusively to the copyright owner of the original work and should not be confused with the more casual way we might use the word derivative to describe, or even criticize, a work that is mimicking some other work.  For instance, the above-mentioned Campbell case involves a work of parody that we might describe in common language as derivative, but not so in the context of copyright law.

Campbell v Acuff-Rose Music involved a new, expressive work, specifically 2 Live Crew’s raunchy parody of the song “Oh, Pretty Woman” co-written and originally performed by Roy Orbison.  The court held in Campbell that “the more transformative the new work, the less will be the significance of other factors.”  In this case, the court is referring to the extent to which 2 Live Crew “transformed” the original song to make a new song.  By contrast, though, Google does not “transform” any of the original works to create new expressions but instead uses the contents of the works to create a new search service called Google Books.

So, with these two rulings, we are looking at two significantly distinct definitions of the word transformativeness.  The first refers to modification of an expressive work in order to make a new expressive work.  The second implicitly refers to transformation of the external world (society) by the introduction of some new capacity (i.e. function) it did not have before.  This is particularly relevant because the language used by SCOTUS, asserting that “transformativeness” should “lessen the significance of the other factors,” can only rationally be applied—if the spirit of fair use doctrine is to be kept intact—to the first definition in which an original work is “transformed” to create a new, expressive work.  In the second usage of the word, in which the external world is assumed to be transformed by some new functional use, then “transformativeness” becomes too heavily weighted against the other factors, thus giving (for instance) a giant, wealthy service provider extraordinary latitude to define just about anything it does as socially “transformative.”

If the courts are going to apply this second definition of “transformativeness,” then it seems the consideration ought not to carry any more weight than the other factors because the second definition provides a basis for large-scale, corporate-funded uses of millions of works in a way that the first definition does not.  In other words, Google Books may be deemed a fair use in the end, but it is not sensible to give “transformativeness” the same weight there that it was given in Campbell.  As it stands, the courts appear to be giving the same weight to “transformativeness” while using two very different definitions of the word.

Semantically speaking, I would argue that transformative is not exactly the right word to use when one specifically wants to describe some measure of modification to an existing thing like a creative expression.  The term is problematic because it invites exactly the confusion we now have in the courts: transformative more properly describes the effects of an invention or expression on the external world (e.g. electricity was transformative in that it made modern society). While it would not be wrong in common parlance to describe, for instance, Jeff Buckley’s rendition of Leonard Cohen’s “Hallelujah” as “transformative,” even this usage would generally tend to convey that both song and listener are in some way transformed.  But in law, this is too vague.  This is why the attorneys refer to a term of art: a definition that is established within the language of the law that may or may not conform to everyday usage.  Sydnor points out that Leval himself provides little guidance in this regard when he quotes the judge thus:

“The word “transformative” cannot be taken too literally as a sufficient key to understanding the elements of fair use. It is rather a suggestive symbol for a complex thought….”

“[T]he word “transformative,” if interpreted too broadly, can also seem to authorize copying that should fall within the scope of an author’s derivative rights. Attempts to find a circumspect shorthand for a complex concept are best understood as suggestive of a general direction, rather than as definitive descriptions.”

Right. I’m no legal scholar, but I think the concept “transformative” is a troublemaker.

Because the precedent SCOTUS ruling in Campbell is based on the use of “transformativeness” to describe the modification of an expressive work, it would make sense to settle upon this definition and to seek another term for considering functional uses akin to Google Books. As CEO of Copyright Alliance Keith Kupferschmid writes in a post on the organization’s website:

“The fair use doctrine is an equitable doctrine, but in functional use cases it hasn’t worked that way because the transformative use test is ill equipped to effectively balance the competing interests at stake in these cases.  Fair use analysis should take into account not only the interests of owners and users but also the underlying policy objectives of the copyright law.  To account for these factors in a reasonable and balanced way, it is time for the courts to begin using a functional use test.”

Unfortunately for rights holders, the confusion about “transformativeness” that leaks into general consciousness results in a casual logic, which assumes that simply changing the context of a work, like placing a photograph on one’s Facebook page, is “transformative” enough to make a use fair.  Google Books is a misstep in that direction, and if this becomes the application of fair use, then that’s the ballgame.  There are no copyrights left. I can take your songs or images, put them on this blog, call it “transformative”, and get away with it.  That may be an attractive proposal to the internet industry, but it is far from the original intent of fair use doctrine in the copyright law, which was to protect expression, and it would have disastrous effects on the professional creative industry as we know it.


*Changed from original publication, which stated that the factors are considered by a three-judge panel.  As pointed out by an anonymous commenter, this is only true in an appellate court. It was a mistake I made in haste, owing to the fact that many famous fair use cases are famous because they’ve gone to higher courts.

Google Books is a good thing, but …

Given the way information tends to distort at lightning speed these days, particularly through the filter of tech v copyright referenced in my last post, I’m not surprised to read articles like this one by Ellen Duffer writing for Forbes on a thesis proposing reasons why Google Books is “good for publishers.” And it’s not that everything she says is incorrect so much as irrelevant, if the article is purposely meant to comment on the recent 2nd Circuit Court ruling in favor of Google in its ongoing litigation with The Authors Guild.

Not only does this lawsuit have nothing to do with publishers, the timing of Duffer’s article, essentially making an argument for the worthiness of Google Books, might lead readers to think this lengthy litigation has been all about stopping the project from moving forward. It hasn’t.  There is no need for Duffer or anyone else to extol the virtues of Google Books when the litigant authors generally agree that the search tool is a tremendously valuable resource with great social benefit. Hence, The Authors Guild has never filed for injunctive relief asking the court to order Google to stop what it’s doing. What the authors do want is compensation from Google for digitizing their books. As stated by Authors Guild President Roxana Robinson, “We aren’t challenging the concept of a search engine, just the seizure of copyrighted material. If Google is willing to compensate an author for using her work, they’re welcome to offer searches in it as much as they like.”

In order to create the Books search tool, Google has digitized over 20 million complete works.  Many of these are in the public domain; many are still in print and are still under copyrights owned by publishers; and many are works (in print or out) for which the copyrights are owned by the individual authors or their estates.  The public domain works are obviously fair game; but regarding the books still under copyright, Google has a negotiated contract with the publishers but no deal to compensate any of the authors.

The full story behind this division is a ten-year saga of attempted deals and lawsuits going back to the days when this project began as a partnership between Google and publishers, and then the libraries got involved; but the individual authors who own their own copyrights have never been paid, which makes them wholly involuntary contributors to this potentially profitable venture for Google.  (Please tell me nobody believes at this point that Google is doing this, or anything else, solely for the greater good. Can you say $400 billion market cap?)

Google has claimed that securing rights for individual works is too cumbersome, to which The Authors Guild’s Executive Director Mary Rasenberger responds, “Google has made much of how hard it is to clear authors’ rights. Our sister organization, the Authors Registry, can assure them it is not difficult. We can show them how it’s done, and with far less money than Google has at hand.”

To be sure, complaining about the scope of work required to clear these rights sounds a little fishy coming from the company that processes 20 petabytes of data every day, the organizer of the world’s information, the unrivaled leader in all things search and index, and the company that flaunts its ability to innovate at “Google scale.” It seems to me that if you’ve got both the resources and the chutzpah to want to be the first company to digitize every book on the planet, securing even a large number of rights should be a relatively minor function of the overall project.

Instead, it appears that if Google fought this hard and spent what must be millions in legal fees just to not pay the authors, their rationale is probably not about the money; and it’s not credibly because the process is too daunting for them.  Surely, Google hoped to prevail on a fair use defense—as it has to date—and to break new legal ground in its ongoing effort to reshape the fair use exception until it is so over-broad as to be almost meaningless.

Having said that, legal experts will disagree about the extent to which this most recent ruling really sets new and clear precedent, rather than introducing a new vagueness to the doctrine that will only be clarified through future litigation.  Either way, Google’s agenda seems transparent; and as much as we may like Google Books itself, the general public should not be too quick to assume that broadening fair use doctrine is automatically more democratic or will foster more innovation, particularly when the doctrinal change is being pushed so hard by such a powerful corporate entity. After all, Google has a pretty consistent track record for consolidating market share and for pushing boundaries in this country and abroad with regard to the rights and interests of individuals and small entities.

Whether Google pursues cases that weaken IP protections or privacy rights, or exerts the power of its monopsony position on platforms like YouTube, I think people have figured out that Google is just a business and should not be assumed to represent all that is good about the idealistic underpinnings of the Internet itself.  The company’s “empower the individual” rhetoric is just PR, and with the recent dropping of its founding motto “Don’t be evil,” we are reminded that this is all just business; and no business gets to nearly a half-trillion-dollar market cap without being at least a little evil to somebody.  Google Books sort of makes this point; it’s a good service supported by somewhat evil means in that it disenfranchises the most vulnerable individuals involved, when this is unnecessary in order to fulfill its otherwise worthy goals.

I also think the Books case serves to highlight a pattern consistent with Google’s game of steadily eroding the legal rights and/or bargaining power of individuals while trading on the illusion that it serves as an engine of individual rights and individual voices.  We’ve seen how the independent musical artists on YouTube have had the gateway drug of Content ID pulled from them if they choose not to sign the newly exploitative MusicKey contract. And with plans to launch the video subscription service YouTube Red, Google appears ready to employ similar hard-ball tactics with its most lucrative video content partners, offering them the choice of a lesser revenue-share deal or outright removal from the platform. (Lest anyone forget, it’s really TheirTube.)

If we combine the kind of pressure Google exerts on independent creators through its policy agendas with the company-store type terms it can dictate to individual creators, it’s easy to think of this strategy as the digital-age equivalent of union-busting during the late 19th and early 20th centuries.  Strip labor (in this case content creators) of both their rights and their negotiating power while consolidating market share in a technological paradigm that fosters natural monopolies. It may be the future, but it’s actually a very old story being written in ones and zeroes instead of coal and steel.