“Fair Use” is Not a Great Business Plan

Lately, we’ve seen several headlines and comments from tech giants saying that AI ventures simply cannot succeed if they are forced to contend with the copyrights in the billions of works they have scraped for the purpose of machine learning (ML). When these headlines are paired with the rampant assertions that ML is inherently fair use—a subject addressed in last Wednesday’s Senate Judiciary Committee (SJC) hearing on AI and journalism—one has to wonder about the business decisions being made before generative AI exploded last year.

In many posts on this blog, including at least a few written during “Fair Use Week,” I have repeated the caveat that “fair use” is not a magic phrase that makes infringement claims disappear. Usually, that advice is directed at small and independent users of works, suggesting they not listen to Big Tech and its network of academics and activists, who will not be on the hook for the small guy’s copyright infringement. I always assumed the big guys knew better, that they were merely chanting the “fair use” mantra as a rhetorical device in the blogosphere to promote the anti-copyright agenda. But maybe they don’t know better.

If I were an AI investor asking about potential liability, and the founders told me, “Don’t worry, what we’re doing is fair use,” my immediate response would be to ask whether there is sufficient funding for major litigation, to say nothing of predicting the outcome of that litigation. Because simply put, the party who conjures the term “fair use” has effectively assumed that a potential liability for copyright infringement exists. And if that assumption is a bad business decision, then that’s the founders’ problem, not a flaw in copyright law.

No matter what the critics say, or how hard certain academics try to alter its meaning, the courts are clear that fair use is an affirmative defense to a claim of copyright infringement, which means that building a business venture on an assumption of fair use is tantamount to assuming that lawsuits are coming. And if it’s a multi-billion-dollar venture that potentially infringes millions of works owned by major corporations, then the lawsuits are going to be big—perhaps even existential.

Do Not Expect Congress to Change Fair Use in Any Direction

Notably, as reported in Wired, Conde Nast CEO Roger Lynch stated at one point during questioning by the SJC last week, “If Congress could clarify that the use of our content, or other publisher content, for the training and output of AI models is not fair use, then the free market will take care of the rest,” to which Sen. Hawley replied that this seems reasonable. But I wonder about this exchange. While it is encouraging to find the senators more sympathetic with the news organizations than with the AI developers, I doubt (and would not even hope) that Congress is going to amend the law to explicitly state that ML is categorically never fair use.

Fair use comprises a history of judge-made law that was codified into statute as Section 107 of the 1976 revision of the U.S. Copyright Act. But the statute does not draw bright lines stating that X is always fair use and Y is never fair use, and for good reason: justice for all parties is best served by a court weighing the specific facts of a specific use of a specific work, or body of works. Hence, an attorney will tell you that fair use is a “fact intensive” consideration.

If Congress were to explicitly declare, for instance, that ML can never be fair use, this would be a significant departure from doctrine, and one that is preemptively unjust to the potential AI developer with a fact pattern that would favor a finding of fair use. As much as I find the major generative AI companies to be some combination of arrogant and useless, and as much as I scorn their generalizations to date about fair use, it would be wrong to endorse legislative revision of the fair use doctrine as a sound response.

In fact, if the court were to find fair use for ML in New York Times v. Open AI (and I doubt it will), and Congress sought to remedy that outcome, it would still not make sense to amend Section 107. If anything, news organizations and other copyright owners would likely seek a new section of the Copyright Act tailored to the nature of the new form of harm, which Big Tech would then blindly oppose with every available resource. For instance, it is possible that the Times would not currently be suing Open AI if the tech industry had not opposed the Journalism Competition and Preservation Act (JCPA), which would have temporarily exempted news organizations from antitrust barriers to collective bargaining for licensing their content.

Regardless, no party should be asking Congress to “clarify fair use” in response to AI. If the AI founders and investors made a bad bet on an ultimate finding of fair use, that’s tough noogies for them. But neither should content creators want Congress to open that particular can of worms and disturb the fair use case law. Of course, where Congress should intervene is to address harms caused by AI where no law currently applies. On that subject, the next post discusses the recently proposed No AI FRAUD Act.


Photo source by areporter.

Recent AI Copyright Lawsuits Are About More than Compensation for Authors

Last week, writer and broadcaster Andrew Keen invited me to his podcast Keen On to talk (of course) about artificial intelligence. When we got to the subject of the New York Times lawsuit against Open AI and Microsoft, I noted that 1) it is arguably the strongest copyright case presented to date against an AI developer; 2) it would likely result in a substantial licensing deal between the parties; and 3) it is hard to say what any of this means for journalism going forward. On that same subject, nonfiction authors Nicholas Basbanes and Nicholas Gage filed a class action suit against Open AI and Microsoft on January 5, just over a week after the Times suit was filed.

As discussed in other posts, although generative AI unequivocally poses a threat to authors and authorship, U.S. copyright law is, oddly enough, not quite designed to address the full scope of the social, economic, and cultural challenge of that threat. While this seems counterintuitive, the difficulty lies in the fact that copyright promotes authorship by protecting works against specific means of infringement, and the nail-biting question of the moment is whether “machine learning” (ML) with the use of protected works violates the reproduction right (§106(1)) of the Copyright Act.

Here, the Times case is strong because the news organization presents compelling, side-by-side evidence that its published stories are being output by ChatGPT almost verbatim. This is evidence that not only is reproduction occurring in the AI model, but that the outputs provided to users serve as a substitute for legal access to the Times’s material. The evidence of reproduction establishes a solid claim of infringement, while the evidence of substitution goes against Open AI’s putative fair use defense. In fact, it was the same circuit (the Second) which held that a news service called TVEyes was “slightly transformative” but that it made so much of Fox News’s material available, even in segments, that the substitutional purpose doomed its fair use defense.

Unlike the Times, the nonfiction book authors do not present side-by-side evidence of verbatim copying of their published writings, and this is consistent with some of the other class-action suits. These are the real nail-biter cases, in my view, because the plaintiffs’ cause is just, but their proof of copyright infringement is less demonstrable than that of the Times (or of the plaintiffs in Concord v. Anthropic, for that matter). But this focus on both The New York Times and nonfiction authors raises a serious question as to whether AI will exacerbate the already dismal state of information in the information age.

When the early work of this blog started in 2011, one of the issues of concern was the volume of mediocre, careless, or inaccurate reporting and commentary being promulgated under brands normally associated with quality journalism. Here, it must be said that the Gray Lady herself has not always been immune to the digital-age forces of volume and speed that can drive reporters and editors to engage the market on the lowest rungs. But if the stodgy algorithms of social media have animated a new era of yellow journalism, isn’t it reasonable to assume that certain generative AIs will make matters worse? The internet has already fostered more misinformation than a democratic society can safely endure.

If we consider the possible outcomes of the Times lawsuit, one would be that Open AI changes the model to avoid infringing reproduction. While this may satisfy from a copyright perspective, one wonders about the quality and/or purpose of the information being provided by a tool like ChatGPT.  The output of an LLM is the result of probability. The user asks a question (a prompt), and the AI responds that in all likelihood, based on the information fed into an algorithm, this is what you want to know.

It is no wonder the system to date reproduces material verbatim from a major news organization, but if it doesn’t do that, what should it do? Or what can it do that can be called “progress” with regard to news and information? Take a multi-faceted, extremely emotional topic like Israel and Palestine, train an AI on all the solid reporting, all the mediocre editorials, and the cacophony of opinions on social media, and the user of the LLM gets…what? Why would the results be more informative or thoughtful than the veteran journalist doing her best?

Why won’t an AI be worse than “recommendation algorithms?” If YouTube and Facebook foster confirmation bias and shepherd people onto the wild grazing fields of organically grown conspiracies, it seems rational and prudent to assume that an LLM will do the same thing more efficiently. Why have an old-school search engine point you toward a bogus article linking vaccines to autism when you can have a “dialogue” with an ersatz intelligence on the same topic?

Although the nonfiction book authors do not present the kind of evidence of copyright infringement the Times exhibits in its complaint, the facts presented about the authors’ investment of time, expertise, and money make a point that should be read as more than a mere plea for sympathy. This is not just about job loss for future historians but quite possibly about the loss of history itself. From the Basbanes et al. complaint:

The archive of primary research materials assembled by Mr. Basbanes in support of his work over a period of forty years, when acquired by Texas A&M University in 2015, filled 365 packing boxes with documents, transcriptions, drafts, field notebooks, photographic negatives, and the like, all acquired by Mr. Basbanes in pursuit of his literary activities, and at his expense and initiative.

It is more than a legal (i.e., fair use) question whether the purpose of a model like ChatGPT is to make new and relevant use of all that work, or whether its purpose is to supplant the historian and the reporter by “feeding off the sere remains of the past,”[1] until it eventually starves. In the former case, licensing and collaborating with authors and journalists seems reasonable; in the latter case, allowing certain generative AIs to die on the vine seems imperative.


[1] From Ralph Waldo Emerson’s speech at Harvard calling for an American literary independence, August 31, 1837.

Photo by: Antonio83

Things We Don’t Need: Generative AI

When I was planning to start The Illusion of More, I contemplated a category of posts under the heading We Don’t Need This. Although I abandoned the idea, I thought it might be an editorial framework for articles about innovations that really aren’t innovative, and the low-tech invention that originally inspired the idea was the kiddie-car/shopping-cart hybrid. In case you haven’t had the pleasure, this vehicle enables a small child to “drive” a plastic car attached to the basket one pushes through the supermarket. As the parent of a small child (at the time IOM was launched), I found this innovation was a terrible idea—one that demanded use the moment the child laid eyes upon it, but which mostly offered poor maneuverability through the aisles and unnecessary geometric struggle at check-out.

There is, of course, nothing connecting the kiddie-car/shopping-cart to generative AI except, in my view, the fact that we don’t need either one. Or at least, we don’t need most of what generative AI appears to be doing, and this is perhaps the most maddening aspect of the most prominent generative AI tools making the headlines—that they serve no purpose and, if we’re getting all IP about it, promote no progress. I’ve said it, and I’ll keep saying it:  we do not need computers to make artistic works.

This month, the Federal Trade Commission (FTC) issued a report describing its early findings about AI’s potential harms which may be addressable under the agency’s purview. Charged with enforcing prohibitions against unfair, non-competitive business practices and protecting consumers, the FTC hosted a roundtable discussion with members of the creative community to hear their concerns about both the development and public deployment of generative AIs. As the report states:

Various competition and consumer protection concerns may arise when AI is deployed in the creative professions. Conduct–such as training an AI tool on protected expression without the creator’s consent or selling output generated from such an AI tool, including by mimicking the creator’s writing style, vocal or instrumental performance, or likeness—may constitute an unfair method of competition or an unfair or deceptive practice.

In response to the report—specifically to the passage quoted above—three well-known copyright critics, Pamela Samuelson, Matthew Sag, and Christopher Sprigman (SS&S) criticized the FTC “both for its opacity and for the ways in which it may be interpreted (or misinterpreted) to chill innovation and restrict competition in the markets for AI technologies.” Before responding to that allegation, I must indulge in a little gallows humor and mention that the economic and global-security leader of the free world is in danger of shredding its Constitution, going full-tilt authoritarian, and spiraling into a deathroll of ignorance and cruelty. And yet, we’re going to talk about “chilling innovation” in generative AI as if it’s a matter of urgency. The world is in crisis, and billions have been invested to see who can do the best job getting a computer to write a poem or make a picture? Talk about whimpers instead of bangs.

There are two reasons that sentiment is not raw Ludditism. The first is that it does not dismiss all AI development in the creative industry as useless; and the second is that the “copyright stifles innovation” bullet point is a generalization that should never be uttered again—especially in light of its direct role in fostering the above-mentioned prospect of democracy’s collapse. We’ve heard all this before—specifically from SS&S and their colleagues in academia and the “digital rights” organizations. We’ve been told that copyright stifles the free and open internet, access to information, and the speech right.

But in addition to the fact that the premise itself was false, the grand social media experiment in the “democratization of everything” must be recognized as an abysmal failure, and its cheerleaders should muster the humility to stifle their tiresome and dangerous refrains in the context of AI. Social media companies and their friends in academia—and here, I must include President Obama’s Google-friendly administration—share considerable blame for the heedless, tech-enabled populism that has fostered so many social hazards, including a literal seditionist now leading one of America’s two political parties.

Notably, the FTC report does not mention copyright very much, and in fact, many of the creative professionals who participated in the discussions acknowledged that because they are not copyright owners (e.g., voice actors and screenwriters for hire were among the representatives), they do not currently have rights protecting them against generative AI that results in the kind of unfair outcomes the FTC is charged with mitigating. It would take too long a post to respond to all the critiques presented by SS&S, but I wanted to focus on this statement:

We are concerned especially about the suggestion in the FTC’s Comments that AI training might be a Section 5 violation where it “diminishes the value of [a creator’s] existing or future works.” A hallmark of competition is that it diminishes the returns that producers are likely to garner relative to a less competitive marketplace. This is just as likely to be true in markets for creative goods, such as novels and paintings, as it is in markets for ordinary tangible goods like automobiles and groceries. AI agents that produce outputs that are not substantially similar to any work on which the AI agent was trained, and are thus not infringing on any particular copyright owner’s rights, are lawful competition for the works on which they are trained.  Surely the FTC does not plan to have Section 5 displace the judgments of copyright law on what is and what is not lawful competition?

To summarize, that paragraph declares that it does not matter if generative AI displaces human authors, that in fact, it is a threshold we should be eager to cross. Notwithstanding the fact that two of the high-profile lawsuits present compelling evidence of substantially similar outputs,[1] the more concerning implication of that paragraph is that SS&S endorse the inevitability that generative AI will devalue human creators and/or eliminate them altogether. Moreover, calling this eventuality a form of “competition” reveals an unsettling perspective consistent with every anti-copyright paper I have ever read—namely, that the production of creative works is no different than the production of any other product or service.

I’ve said many times that copyright critics don’t understand artists, and here, the inapt word competition demonstrates why this axiom endures. For instance, publishers are in competition with one another to an extent, but authors are not—at least not in the sense that the concept applies in other industries—least of all Big Tech. No novelist, for instance, wants to hold the undivided and exclusive attention of all readers the way Meta wants eyeballs never to stray for long from its platforms. Artists thrive in a diverse market of other artists, consumers benefit as a result, and copyright is an engine of that diversity, not a barrier to it. Artists may feel competitive or jealous at times, or even behave in a competitive manner (because they’re human), but the reality is that they need one another to exist at a scale that is not comparable to other “businesses.” True to form, copyright critics like to cite the interdependence of authors to highlight copyright’s limitations but then ignore the same principle in support of tech giants swallowing all creative enterprise whole.

The primary concern expressed by SS&S appears to be that the FTC alleges that AI training with copyrighted works is an act of infringement. Unsurprisingly, this same trio submitted comments to the Copyright Office arguing that AI training with protected works is fair use, but as that very question is already presented in several court cases, I assume SS&S are primarily concerned with optics here. The trio states, “The FTC has no authority to determine what is and what is not copyright infringement, or what is or is not fair use. Under governing law, that is a judicial function.”

Exactly. And the question is now before the courts. So, what’s the problem? That the FTC should not even raise the issue? In tweets, Samuelson and Sprigman argue that the FTC’s report is one-sided, that it is too creator-focused and does not account for the testimony or opinions of the technology companies developing AI. But while I certainly agree that multistakeholder hearings and the like are the proper approach to developing new policy, it is impossible to tolerate a complaint about lack of balance coming from the anti-copyright crowd at all, and from these individuals in particular. For instance, readers may not remember the American Law Institute Restatement of Copyright, initiated by Samuelson and led by Sprigman, but critics of the project—some of the most prominent names in copyright scholarship—specifically cite the opacity of the restatement process and the deafness of its managers to the concerns and recommendations of their colleagues.

More broadly, it must be said that if, indeed, the FTC lately gave more attention to the creators than it did to the tech companies, then this was a long overdue anomaly. Between at least the mid-to-late 1990s and 2016, the tech companies were treated with kid gloves, handed the keys to Washington, and feted like the economic and democratic engines they claimed to be. After 2016, sentiment began to swing in the other direction, as many Americans began to see how disinformation plus data manipulation can become a wrecking ball for a whole society.

If Big Tech lost the previously undeserved benefit of the doubt, good. AI has the potential to exacerbate many of the same Web 2.0 harms at unprecedented speed and scale, and if the FTC, the USCO, the courts, or Congress look askance at the developers, then it is a mistrust well earned. And again, at least with regard to generative AI designed to make creative works, none of the parties empowered to write policy in this area should forget the bottom line:  that when it comes to producing creative work, we truly do not need generative AI.


[1] Concord et al. v. Anthropic and NYT v. Open AI, et al.

SEE ALSO: The Washington Post reported this month that Big Tech continues to significantly fund and influence academia in these policy areas.

Photo by: Jollier