Gen AI & the Hubris of Data


In almost every discussion I’ve had with creators about generative AI (GAI), I have said that we should not overlook Big Tech’s capacity for exaggeration and total flops. It is possible that AI products will be the next Google Glass if cultural and/or economic forces reject their business models. For instance, last week, Digital Music News (DMN) reported a partnership between Amazon and the AI music product Suno for the next generation of Alexa+. DMN quotes Amazon’s Panos Panay, SVP of Devices and Services, thus:

Using Alexa’s integration with Suno, you can turn simple, creative requests into complete songs, including vocals, lyrics, and instrumentation. Looking to delight your partner with a personalized song for their birthday based on their love of cats, or surprise your kid by creating a rap using their favorite cartoon characters? Alexa+ has you covered.

The first time I read about Suno, it struck me as a gimmick that may not attract or sustain enough market interest to be profitable. Just the example cited above of making personalized birthday songs seems like the kind of thing a household can only do a few times before it gets stale. “Surprise your kid by creating a rap…” sounds like what the kids call “cringy.” But the broader question posed by Suno is whether consumers want “personalized” music, or whether the whole concept is just another hubristic statement about the power of data in the arts.

Many theorists and scholars have argued that consumer data either obviates the need for creators’ rights (copyrights) or justifies substantially limiting those rights. The general premise is that if consumer data informs creators about what audiences want, this insight lowers the risk of investing in production. Lowering that risk, say the theorists, implies rethinking copyright protection, or even rethinking the nature and value of creators, as Professors Sprigman and Raustiala proposed in a paper I critiqued in 2018.

As argued in that criticism and elsewhere on this blog, the goal of artists and creators is not necessarily to give audiences what they want. While one cannot dispute the market value of certain “formulas,” there is substantial evidence that when producers strive too hard to meet audience expectations, audiences are often disappointed. In short, risk is inherent to creative expression and audience experience.

In every medium and every genre, consumers want to be surprised by artists, and shifting modes of expression reflect artists’ personal responses to contemporary events. In general, the most successful (i.e., meaningful) works are the ones we didn’t know we wanted until we had them. And once these works become part of the vernacular of our lives, we cannot imagine living without them.

By contrast, theories about the power of data as a predictor of creative success are founded in a techno-centric arrogance that, to me, is exemplified in a product like Suno. The idea that the consumer wants music to be tailored from a few instructions (“Alexa, make me a punk rock song about a guy who lost his job.”) is typical of the kind of “innovation” many technologists develop by ignoring the fundamental reasons we enjoy music in the first place.

As explored in this post about opera, I agree that music, and other expressive media, can be replicated by an AI to provoke emotional responses in human observers. Simply put, if a composer knows that minor chords have a certain effect on the Western listener, then an AI can follow the same rule to produce a “melancholy” tune. But the science of music and human psychology only explains our instinctive, animal-like responses to combinations of sounds while leaving out the rest of the experience.

We cherish our playlists for reasons that transcend the sounds’ effects on our brains, i.e., transcend mere taste. We relate and return to artists or their messages; we store and recall memories in the songs we replay; and we connect to friends and family through songs we have in common. Suno, outputting a bespoke song like a tepid cocktail, cannot provide any of that. On the contrary, it omits all those aspects of music that make us care about it, suggesting that its outputs are indeed gimmicks destined to become as dull as they are disposable when the short-lived novelty wears off. At least that’s my prediction.

There is, of course, a more insidious question worth asking, namely whether a product like Suno, especially when paired with Amazon, is less significant as a custom jukebox than it is as a new surveillance device. The use of personal data to micro-target and manipulate people and alter the course of major world events is not science fiction anymore. In that light, is it not conceivable that, say, 100 million people expressing their sentiments to an AI “music composer” will add color to data that will only exacerbate surveillance capitalism? That would be one hell of a way to pervert music.


Photo by: Cm2012

Are Creators Aligned on Artificial Intelligence?


One of many questions raised by the adoption of generative AI (GAI) tools is whether creators are willing to demonstrate a degree of solidarity on the matter, i.e., to apply the principle we generally call fair trade. If Creator A uses a GAI that might be harmful to Creator B in a different field, and so on, will most creators take this broader perspective in a group effort to demand ethical uses of GAI? Moreover, this question becomes intertwined with copyright because the use of GAI is a subject of evolving legal doctrine, meaning that creators who want to produce commercial content outside their core talents should be aware that the material produced may not be protectable under the law.

Two simple examples would be the self-published book author who might use an AI voice app to produce an audiobook, and the documentary filmmaker who might use an AI music generator to produce a soundtrack for a film. In both examples, creators in other fields (voice actors and composers, respectively) are potentially harmed by the development and use of these AI tools. This raises two questions: 1) will the author and filmmaker take that consideration into account; and 2) will the sound recordings in either case be protected by copyright?

In the case of the author using AI in lieu of hiring a narrator to produce the audiobook, I predict that under current doctrine, the sound recording would not be protected by copyright law because there is no human performance captured in that recording. Thus, remedies for any piracy of the audiobook would rely solely on the protection of the underlying literary work, which is effective—but if the sound recording is also protected and registered, that would be two works infringed instead of one.

This increases the potential damages for infringement, which puts the author/owner in a stronger position if she needs to take legal action. By this example, authors’ interests may be seen as aligned with those of professional book narrators. Hiring a narrator will not only achieve better quality in the reading, but capturing the human performance is also a basis for copyright attaching to the sound recording.

Similar considerations would apply to the filmmaker with the GAI soundtrack, although there may be other factors that provide the AI music with some protection we don’t find with the AI audiobook. One factor that may become relevant is whether the filmmaker can show that he exerted sufficient creative control over the final sounds. If so, he may be able to defend a claim of copyright in the soundtrack, but we are likely several years and a few lawsuits away from clear guidance on this question.

Another consideration with the soundtrack may be the Copyright Office’s current view that material using assistive AI “within a larger work” is protected. Creators should be careful about interpreting that broad language because constituent works that stand alone—and this would apply to a soundtrack for a film—would logically not be independently protected.

Of course, there are many GAI products that allow one type of creator to avoid hiring another type of creator for a given project. Some of this is inevitable, and it is not necessarily unethical or bad for creative culture. That said, even with ethically trained and ethically used AI tools, the copyright considerations should be weighed by the individual creator (i.e., do they care about protecting what might not be protectable?), but also collectively by all creators contributing to a new ecosystem.

Since 1978 in the U.S., the default is automatic copyright protection, even if most rights are never enforced. But as GAI is used to produce a lot of material that is not protected, it is hard to predict what effect this might have on copyright overall. The human authorship principle, which is even older than the automatic protection introduced by the 1976 Act, creates a new tension for creators who may wish to combine GAI and human-authored work. As a response to that tension, it would be a mistake in my view to overwrite the “human spark” doctrine and simply protect any material that “walks and talks” like a creative work. This isn’t just an emotional appeal to anthropocentrism but rather a conviction that copyright would become meaningless, even unconstitutional, by eroding the incentive rationale for its existence.

Regardless of the theoretical questions addressed in this post, I believe that as a practical matter, creators should think carefully about how and when to use GAI for various projects. As an ethical consideration, perhaps if you’re opposed to “scraping” in your industry, then opposing it in others is the right view to take. But as a business consideration, if what you’re making is meant to have commercial value, AI-generated might mean not protected by copyright—and that means even if you spend money and time on it, it isn’t yours.

Guarantee of Confusion: When AI Scrapes the News


That title riffs on the term of art in trademark law known as “likelihood of confusion.” It refers to a foundational test, which asks whether the average consumer will confuse a particular mark (words, design, or both) with a product or service that is not produced or distributed by the company associated with a known mark. Thus, beware the Rollex, Tilynol, or even the KleanEx. But when a real trademark is used to promote a defective product, confusion is certain—especially when the brand is a news producer.

In a lawsuit filed today by several major news publishers against an AI developer (Advance Local Media et al. v. Cohere Inc.), we see a good example of copyright and trademark combining to serve the public interest in contrast to the extensive harm that can be done by technology developers running roughshod over IP rights. Copyright incentivizes the investment in professional journalism needed to report reliable news, and trademark identifies the source of the news we choose to trust. I know readers will be inclined these days to criticize one news organization or another, but hold that thought.

The complaint filed in the District Court for the Southern District of New York names as plaintiffs several well-known news publishers (e.g., Condé Nast, Los Angeles Times, The Guardian) who allege that AI developer Cohere is liable for both copyright and trademark infringement. Valued at $5.5 billion, “Cohere’s primary product is its suite of LLMs referred to as the Command Family of models…these LLMs are trained on vast amounts of text and as a result can generate text-based, natural language responses to user queries,” the complaint states.

The Copyright Allegations

On copyright infringement, the publishers intend to show that Cohere violates their exclusive rights both when it inputs protected works to train the Command products and when it outputs verbatim or substantially similar works that are reproduced, distributed, and displayed to paying customers. The two counts of alleged trademark infringement stem from use of the publishers’ registered names in conjunction with erroneous material that may be “hallucinated” by the LLM. Clearly, anyone can recognize why this would be harmful to the reputation of the named source and broadly harmful to consumers who already struggle to validate information in this miasma we call the internet.

Notably, the Publishers stress the fact that Cohere markets itself on the reliability and timeliness of the information Command provides. These benefits would be essential for its many commercial customers, but the company allegedly chose to deliver them through unlicensed use of the works produced by news organizations. “Cohere relies heavily on trusted journalism sources to shore up the authority of its responses. As Cohere’s CEO Aidan Gomez explained in a letter to employees and shareholders, Cohere believes that a ‘key differentiator’ for its models is the ability to receive ‘verifiable answers,’” the complaint states.

Further, to support the veracity of query results, Cohere relies on “retrieval augmented generation” (RAG), which an NVIDIA blog post describes thus: “Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. But to deliver authoritative answers — grounded in specific court proceedings or similar ones — the model needs to be provided that information.” This case law analogy is ironic in context because even at this very early stage, the copyright case law strongly suggests to this observer that Cohere should not have chosen the unlicensed path to build its products.

For example, a description from the complaint reminds me that the news summary product TVEyes was held to be infringing on less compelling evidence than the following: “The user can expand [the] Under the Hood [tool] to view the exact underlying documents on which Cohere relied to generate the response. Cohere refers to these sources as ‘snippets,’ but to be clear—these ‘snippets’ are generally the full text of every source on which the output was based.”

In fact, the allegations in this complaint imply so much familiar ground that it is hard to imagine how Cohere will raise a persuasive defense. For instance, just this week, I summarized the Delaware District Court finding that comparatively limited copying of Westlaw’s headnotes for an AI search product was considered a market substitute for the protected works. What Cohere is allegedly doing with news articles is similar in purpose but entails far more extensive, unlicensed use of substantially more protected expression than in Thomson Reuters v. Ross.

The Trademark Allegations

With the RAG tool switched on, Command will apparently provide reliable news by copying, distributing, and displaying unlicensed copies of Publishers’ works. But with RAG switched off, its LLM might hallucinate and then attribute the resulting misinformation to one of the named plaintiffs. For instance, the complaint cites a Cohere “article” that confuses the 2023 massacre at the Nova Music Festival with a 2020 shooting in Nova Scotia; reports that a man murdered at the latter “returns to the scene” of the former; and then attributes this whole mess to The Guardian.

The Publishers allege two violations of the Lanham Act, trademark infringement and false designation of origin, both of which seem highly plausible based on the facts presented. We shall see whether Cohere can present compelling facts to rebut the allegations, but otherwise, as to the questions of law in this case, I predict this one easily goes to the plaintiffs.

As mentioned above, I know some readers may scoff at the premise that quality journalism is consistently the hallmark of well-established news publishers today. And to be sure, one must occasionally check the math in various articles and editorials. But I maintain that Big Tech, through its predatory model of monetizing everything it does not create—plus our willingness to believe utter nonsense online—exerts a pressure on professional journalism that borders on an existential threat. Left unchecked, the AI shenanigans like those described in this lawsuit do more than violate IP law; they undermine the efforts of any reporter who is still trying to present reality.


Photo by AndreyPopov