Guarantee of Confusion: When AI Scrapes the News


That title riffs on the term of art in trademark law known as “likelihood of confusion.” It refers to a foundational test, which asks whether the average consumer will confuse a particular mark (words, design, or both) with a product or service that is not produced or distributed by the company associated with a known mark. Thus, beware the Rollex, Tilynol, or even the KleanEx. But when a real trademark is used to promote a defective product, confusion is certain—especially when the brand is a news producer.

In a lawsuit filed today by several major news publishers against an AI developer (Advance Local Media et al. v. Cohere Inc.), we see a good example of copyright and trademark combining to serve the public interest in contrast to the extensive harm that can be done by technology developers running roughshod over IP rights. Copyright incentivizes the investment in professional journalism needed to report reliable news, and trademark identifies the source of the news we choose to trust. I know readers will be inclined these days to criticize one news organization or another, but hold that thought.

The complaint filed in the District Court for the Southern District of New York names as plaintiffs several well-known news publishers (e.g., Condé Nast, Los Angeles Times, The Guardian) who allege that AI developer Cohere is liable for both copyright and trademark infringement. Cohere is valued at $5.5 billion. “Cohere’s primary product is its suite of LLMs referred to as the Command Family of models…these LLMs are trained on vast amounts of text and as a result can generate text-based, natural language responses to user queries,” the complaint states.

The Copyright Allegations

On copyright infringement, the publishers intend to show that Cohere violates their exclusive rights both when it inputs protected works to train the Command products and when it outputs verbatim or substantially similar works that are reproduced, distributed, and displayed to paying customers. The two counts of alleged trademark infringement stem from use of the publishers’ registered names in conjunction with erroneous material that may be “hallucinated” by the LLM. Clearly, anyone can recognize why this would be harmful to the reputation of the named source and broadly harmful to consumers who already struggle to validate information in this miasma we call the internet.

Notably, the Publishers stress the fact that Cohere markets itself on the reliability and timeliness of the information Command provides. Those benefits would be essential for its many commercial customers, but the company allegedly chose to achieve them through unlicensed use of the works produced by news organizations. “Cohere relies heavily on trusted journalism sources to shore up the authority of its responses. As Cohere’s CEO Aidan Gomez explained in a letter to employees and shareholders, Cohere believes that a ‘key differentiator’ for its models is the ability to receive ‘verifiable answers,’” the complaint states.

Further, to support the veracity of query results, Cohere relies on “retrieval augmented generation” (RAG), which an NVIDIA blog post describes thus: “Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. But to deliver authoritative answers — grounded in specific court proceedings or similar ones — the model needs to be provided that information.” This case law analogy is ironic in context because even at this very early stage, the copyright case law strongly suggests to this observer that Cohere should not have chosen the unlicensed path to build its products.

For example, a description from the complaint reminds me that the news summary product TVEyes was held to be infringing on less compelling evidence than the following: “The user can expand [the] Under the Hood [tool] to view the exact underlying documents on which Cohere relied to generate the response. Cohere refers to these sources as ‘snippets,’ but to be clear—these ‘snippets’ are generally the full text of every source on which the output was based.”

In fact, the allegations in this complaint cover so much familiar ground that it is hard to imagine how Cohere will raise a persuasive defense. For instance, just this week, I summarized the Delaware District Court finding that an AI search product built on comparatively limited copying of Westlaw’s headnotes was a market substitute for the protected works. What Cohere is allegedly doing with news articles is similar in purpose but entails far more extensive, unlicensed use of substantially more protected expression than in Thomson Reuters v. Ross.

The Trademark Allegations

With the RAG tool switched on, Command will apparently provide reliable news by copying, distributing, and displaying unlicensed copies of Publishers’ works. But with RAG switched off, its LLM might hallucinate and then attribute the resulting misinformation to one of the named plaintiffs. For instance, the complaint cites a Cohere “article” that confuses the 2023 massacre at the Nova Music Festival with a 2020 shooting in Nova Scotia; reports that a man murdered at the latter “returns to the scene” of the former; and then attributes this whole mess to The Guardian.

The Publishers allege that Cohere violates the Lanham Act on two counts, trademark infringement and false designation of origin, both of which seem highly plausible based on the facts presented. We shall see whether Cohere can present compelling facts to rebut the allegations, but otherwise, as to the questions of law in this case, I predict this one easily goes to the plaintiffs.

As mentioned above, I know some readers may scoff at the premise that quality journalism is consistently the hallmark of well-established news publishers today. And to be sure, one must occasionally check the math in various articles and editorials. But I maintain that Big Tech, through its predatory model of monetizing everything it does not create—plus our willingness to believe utter nonsense online—exerts a pressure on professional journalism that borders on an existential threat. Left unchecked, the AI shenanigans like those described in this lawsuit do more than violate IP law; they undermine the efforts of any reporter who is still trying to present reality.


Photo by AndreyPopov

David Newhoff
David is an author, communications professional, and copyright advocate. After more than 20 years providing creative services and consulting in corporate communications, he shifted his attention to law and policy, beginning with advocacy of copyright and the value of creative professionals to America’s economy, core principles, and culture.
