Recent AI Copyright Lawsuits Are About More than Compensation for Authors


Last week, writer and broadcaster Andrew Keen invited me onto his podcast Keen On to talk (of course) about artificial intelligence. When we got to the subject of The New York Times lawsuit against OpenAI and Microsoft, I noted that 1) it is arguably the strongest copyright case presented to date against an AI developer; 2) it would likely result in a substantial licensing deal between the parties; and 3) it is hard to say what any of this means for journalism going forward. On the same subject, nonfiction authors Nicholas Basbanes and Nicholas Gage filed a class-action suit against OpenAI and Microsoft on January 5, just over a week after the Times suit was filed.

As discussed in other posts, although generative AI unequivocally poses a threat to authors and authorship, U.S. copyright law is, oddly enough, not quite designed to address the full scope of the social, economic, and cultural challenges that threat presents. This seems counterintuitive, but the difficulty lies in the fact that copyright promotes authorship by protecting works against specific means of infringement, and the nail-biting question of the moment is whether “machine learning” (ML) that uses protected works violates the reproduction right (§106(1)) of the Copyright Act.

Here, the Times case is strong because the news organization presents compelling side-by-side evidence that ChatGPT outputs its published stories almost verbatim. This is evidence not only that reproduction is occurring in the AI model, but also that the outputs provided to users serve as a substitute for legal access to the Times’s material. The evidence of reproduction establishes a solid claim of infringement, while the evidence of substitution undercuts OpenAI’s putative fair use defense. In fact, it was the same circuit (the Second) that held that a news service called TVEyes was “slightly transformative” but that it made so much of Fox News’s material available, even in segments, that its substitutional purpose doomed its fair use defense.

Unlike the Times, the nonfiction book authors do not present side-by-side evidence of verbatim copying of their published writings, which is consistent with some of the other class-action suits. These are the real nail-biter cases, in my view, because the plaintiffs’ cause is just, but their infringement claims are harder to demonstrate than those of the Times (or, for that matter, those in Concord v. Anthropic). But this focus on both The New York Times and nonfiction authors raises a serious question: will AI exacerbate the already dismal state of information in the information age?

When this blog started in 2011, one of the issues of concern was the volume of mediocre, careless, or inaccurate reporting and commentary being promulgated under brands normally associated with quality journalism. Here, it must be said that the Gray Lady herself has not always been immune to the digital-age forces of volume and speed that can drive reporters and editors to engage the market on its lowest rungs. But if the stodgy algorithms of social media have animated a new era of yellow journalism, isn’t it reasonable to assume that certain generative AIs will make matters worse? The internet has already fostered more misinformation than a democratic society can safely endure.

If we consider the possible outcomes of the Times lawsuit, one would be that OpenAI changes the model to avoid infringing reproduction. While this may satisfy copyright concerns, one wonders about the quality and/or purpose of the information provided by a tool like ChatGPT. The output of an LLM is the result of probability: the user asks a question (a prompt), and the AI responds, in effect, that based on the information fed into its algorithm, this is most likely what you want to know.

It is no wonder that the system has so far reproduced material verbatim from a major news organization, but if it doesn’t do that, what should it do? Or what can it do that could be called “progress” with regard to news and information? Take a multi-faceted, extremely emotional topic like Israel and Palestine, train an AI on all the solid reporting, all the mediocre editorials, and the cacophony of opinions on social media, and the user of the LLM gets…what? Why would the results be more informative or thoughtful than the veteran journalist doing her best?

Why won’t an AI be worse than “recommendation algorithms”? If YouTube and Facebook foster confirmation bias and shepherd people onto the wild grazing fields of organically grown conspiracies, it seems rational and prudent to assume that an LLM will do the same thing more efficiently. Why have an old-school search engine point you toward a bogus article linking vaccines to autism when you can have a “dialogue” with an ersatz intelligence on the same topic?

Although the nonfiction book authors do not present the kind of evidence of copyright infringement the Times exhibits in its complaint, the facts presented about the authors’ investment of time, expertise, and money make a point that should be read as more than a mere plea for sympathy. This is not just about job loss for future historians but quite possibly about the loss of history itself. From the Basbanes et al. complaint:

The archive of primary research materials assembled by Mr. Basbanes in support of his work over a period of forty years, when acquired by Texas A&M University in 2015, filled 365 packing boxes with documents, transcriptions, drafts, field notebooks, photographic negatives, and the like, all acquired by Mr. Basbanes in pursuit of his literary activities, and at his expense and initiative.

It is more than a legal (i.e., fair use) question whether the purpose of a model like ChatGPT is to make new and relevant use of all that work, or whether its purpose is to supplant the historian and the reporter by “feeding off the sere remains of the past”[1] until it eventually starves. In the former case, licensing and collaborating with authors and journalists seems reasonable; in the latter, allowing certain generative AIs to die on the vine seems imperative.


[1] From Ralph Waldo Emerson’s address “The American Scholar,” delivered at Harvard on August 31, 1837, calling for American literary independence.

Photo by: Antonio83
