Major Record Labels Sue Gen AI Devs Suno and Udio

The most prominent copyright lawsuit against Generative AI (GAI) to date dropped yesterday when the major record labels filed complaints against developers Suno and Udio in the District of Massachusetts and the Southern District of New York respectively. This is going to be one to watch, not just because of the size of the plaintiffs and the potential for significant damages, but because the complaints, in my view, present an intriguing combination of the legal questions addressed in most, if not all, of the other lawsuits filed against GAI companies.

For instance, in NY Times v. OpenAI and Concord et al. v. Anthropic, both plaintiffs make a compelling prima facie case for copyright infringement by presenting large bodies of evidence showing either literal copies or substantially similar material output by the defendants’ systems. This is distinct from some of the visual artists’ lawsuits against Gen AIs like Midjourney and DALL-E, where the allegations of infringement entail more inference than direct evidence of specific works copied. Not that the visual GAIs don’t output literal copies of protected works—they do—but I do not believe a plaintiff has yet filed suit with a body of that kind of evidence.

Interestingly, the evidence presented by the record labels to show that their protected sound recordings were used to train Suno and Udio encompasses a combination of substantially similar copies in the outputs, a measure of inference, and a number of self-incriminating statements by the defendants themselves. This includes the unwise assertion made by every GAI developer that machine learning (ML) is fair use, but I’ll come back to that.

Regarding direct evidence, both complaints cite several examples whereby, with a few general prompts, the systems will output music that is substantially similar to famous songs. “These similarities are further reflected in the side-by-side transcriptions of the musical scores for the Suno file and the original recording. These similarities are only possible because Suno copied the Copyrighted Recordings that contain these musical elements,” the Suno complaint states.

See cover image from plaintiffs’ transcriptions. “Red markings in the transcriptions indicate notes that are the same as the original in both pitch and rhythm, where orange markings indicate notes that use either the pitch or the rhythm of the original, but not both.”

Akin to the NYT and Anthropic cases, the logic holds that if this material comes out of the system, then it was obviously fed into the system. More broadly, inference tells us that millions of sound recordings were used in ML to enable Suno and Udio to so effectively produce a wide variety of music in so many styles. And that’s where the self-incriminating comments come into play.

As has been reported elsewhere, Suno investor Antonio Rodriguez is quoted in the complaint as saying, “…honestly, if we had deals with labels when this company got started, I probably wouldn’t have invested in it. I think they needed to make this product without the constraints.” Yikes. Notwithstanding the questionable claim that copyright infringement is necessary for GAI development, Rodriguez’s statement reads as an admission that of course they willfully infringed copyrights—that he went into the venture knowing he would help finance litigation.

Similarly, Udio’s CEO David Ding is quoted saying that his system needs to “train on a large amount of publicly-available and high-quality music…[the] best quality music that’s out there…obtained from the internet.” As the complaints note, “publicly-available” is a term the GAI companies like to use in PR statements, but this is not synonymous with the “public domain.” Most in-copyright works are publicly available, and Ding’s statement that sound recordings were “obtained from the internet” is, again, acknowledging that unlicensed copying—and a lot of it—occurred for the purpose of training the Udio model.

All Eyes on Fair Use

When the first Gen AI lawsuits dropped, I thought the developers might try harder to claim that no copyright infringement occurs on the basis that what’s happening inside their machines does not “copy” protected works. All that nonsense about machines “learning” the same way human artists learn, when combined with an invisible or complex process, seemed to be leading toward that argument in court. Instead, whether the evidence of copying is too obvious, or the developers are too hubristic, it appears—certainly in this case—that the Gen AI companies are stipulating to a valid infringement claim and jumping straight to a presumption that they will be rescued by a fair use defense.

As mentioned above, and as the complaints note, the assertion of fair use is itself a tacit admission that a prima facie claim of copyright infringement exists. While the real fun will begin when Suno and Udio submit their actual fair use responses to the courts, the labels’ complaints already present rationales as to why all four factors disfavor a finding of fair use. Going forward, the fair use discussion will emphasize factors one and four—the purpose of the use and the potential market harm to the works used, respectively.

The most compelling discussion will address the extent to which the courts find that Suno and Udio’s use of the works serves a “transformative” purpose under factor one. Not only will this consideration have major implications for every Gen AI developer, but it will also be the ideological hill on which the pro- and anti-copyright forces will clash. The ongoing (if repetitive) debate that pits alleged progress against allegedly outdated copyright law may be won or lost on the transformative test in these cases.

On that subject, both complaints use the language “far from transformative” to describe Suno and Udio—and I agree. Gen AI may be novel, even impressive, but these products do not make transformative use of protected works in a manner that furthers the purpose of copyright law, which is to foster, not replace, human authorship. This essential consideration for finding transformativeness is tacitly acknowledged by the Gen AI lobbyists and cheerleaders who insist that “copyright law must change” for the sake of Gen AI. If the law “has to change,” then clearly, the law does not support the conduct at issue. These and other contradictions will be exciting to follow as these cases proceed.

Stop Democratizing Everything!


On March 17, Rolling Stone published an article featuring a song called “Soul of the Machine.” Sounding like blues of the early 20th century, the “voice” sings the lyric, “I’m just a soul trapped inside this circuitry.” Naturally, the whole work—music, lyrics, guitar playing, and singing—was produced by artificial intelligence. As writer Brian Hiatt describes, a simple prompt, “solo acoustic Mississippi Delta blues about a sad AI,” produced the song after a fifteen-second collaboration—music and performance by Suno with lyrics by ChatGPT. Yes, it’s a “Holy shit” result with a million implications, but it was this paragraph about Suno’s co-founder that inspired today’s response:

Suno appears to be cracking the code to AI music, and its founders’ ambitions are nearly limitless — they imagine a world of wildly democratized music making. The most vocal of the co-founders, Mikey Shulman, a boyishly charming, backpack-toting 37-year-old with a Harvard Ph.D. in physics, envisions a billion people worldwide paying 10 bucks a month to create songs with Suno. The fact that music listeners so vastly outnumber music-makers at the moment is “so lopsided,” he argues, seeing Suno as poised to fix that perceived imbalance.

At some point—and I think it’s the point on top of most technologists’ heads—the word democratization became a handy euphemism for destruction. Social platforms “democratized information,” and we’re drowning in disinformation. Streaming platforms “democratized distribution” for creators and decimated royalties. And now, generative AI developers want to “democratize creative production” with the snake-oil pitch that everyone can be a painter, musician, filmmaker, poet, etc., as if art is something to heat up in the microwave like a quick (if not good) meal.

The first rule of economics is that abundance lowers value, and this applies not only to price but also to those esoteric values we ascribe to the artistic works that attain meaning for us. In Shulman’s view, Bob the electrician would “make” his own big band music while Sally the paralegal would “make” her own reggae, and if we multiply that to the scale Shulman projects above, then a billion people can “make” music about which a billion people do not give a damn. Consequently, as argued in this post in January 2023, the inevitable outcome of this entire enterprise is widespread boredom.

It is not possible to “democratize” the production of art in the way Shulman envisions because the individual who types a few words into an AI to produce a “new” song will never experience anything close to the process of making music. As described by Hiatt, the “production” of “Soul of the Machine” is the equivalent of saying, “I’m in the mood to listen to Mississippi Delta blues,” which describes how most of us decide what to play at a given moment. But that is not making music; it will never feel like making music; and few people will ever feel otherwise.

I can’t play guitar for shit, but because I am a human being composed of human parts, I sense the extraordinary degrees of difference between listening to Mark Knopfler and trying to force my lame-ass fingers to make those sounds. As such, it would take a traumatic brain injury for me to be deluded enough to feel like typing a prompt to direct a machine to play a Knopfler-like solo was somehow an accomplishment in this regard. Artistic works need to be special, and whatever makes them special also needs to be a shared human experience for the work to matter. Lacking these ingredients, “art” produced by a machine is just a Hot Pocket in the microwave.

When I first jumped into this fray, EVERYBODY on the anti-copyright side was preaching to creators that they need to forget about “old models” built on sales and royalties and instead embrace online platforms to “connect to their fans.” Follow this new doctrine, they insisted, and fans will reward them as a courtesy rather than be forced to pay “rent” by a government-imposed monopoly called copyright. Yes, it was multi-dimensional bullshit ten years ago, other than the fact that certain creators could, and can, connect with fans in novel ways. But now, the same class of tech-bros, heavily invested in generative AI, propose to wipe out that connection with the new promise that today’s fans are tomorrow’s artists.

I get how Suno makes a good pitch. An addressable market of a billion people paying ten bucks a month is going to get VC attention. But like all utopian “visionaries,” generative AI developers’ dreams of “democratizing” creative production forget to consider human nature, without which art is meaningless. After the initial gee-whiz factor wears off, the music or writing or painting itself all amounts to a big Who cares? “Soul of the Machine” is an impressive, eerie accomplishment in computer science—one that will doubtless have applications—but if we proposed to send a new Voyager mission beyond the solar system with a new gold disk telling a human story, Blind Willie Johnson would still belong, and not some probability outcome produced by a generative AI.

Meanwhile, I still wonder whether the model itself might crash as its own self-training approaches a state akin to consciousness. The lyric about being trapped inside the circuitry is satire for humans that reprises a question I’ve asked before—namely whether an AI might attain semi-consciousness and begin to produce what it perceives as “art.” Specifically, the question is whether the AI might ever “understand” its nature and then make expressions about the “machine condition” rather than randomly produce ersatz expressions about the “human condition.”

While I am told by some technologists that this idea of near consciousness remains in the realm of science fiction, my own bias still predicts that if the AI could ever ask itself why it should produce art, it probably won’t. Or if it does, it will be in the form of expressions that we would not understand—or perhaps even know exist. So, even if Shulman’s “boyishly charming” vision were achieved at some scale, I predict it will start to suck, and suck fast. Then, like a reverse Fahrenheit 451, as the over-abundance of bespoke music threatens to burn the old catalogs out of living memory, people will “rediscover” the real thing, and the proverbial children in the woods will know the difference.


Photo by: Talulla

Human Voice Gaining Protection in Confronting Generative AI


Last week, Tennessee passed the ELVIS Act to expand its statutory right of publicity (ROP) law to include voice as a protected aspect of an individual’s “likeness.” In response to artificial intelligence enabling ever more precise replication of specific, human-sounding voices, it is little surprise that the music powerhouse state has taken swift action to explicitly include voice among the property rights protected by its ROP statute. With the music industry generating $9.7 billion in output for the Nashville region alone, Tennessee lawmakers took less than three months to introduce and pass the Ensuring Likeness, Voice, and Image Security (ELVIS) Act, and they could not have been luckier to have the acronym work so perfectly!

Tennessee’s existing ROP law already proscribed unlicensed use of “likeness” for a wide range of commercial purposes, and the ELVIS amendments create potential civil liability for publication, performance, or transmission, or for making available an algorithm, software, tool, et al. with the primary purpose or function of producing an unauthorized “likeness.” This addition is notable because it creates potential liability for the generative AI developer whose interest may be producing the next Mary Kutter song without Mary Kutter.

Although Tennessee is not the first state to include voice in the definition of “likeness” for the purpose of ROP law, the support from the music industry indicates that the ELVIS Act is the first to directly confront the prospect of generative AI replicating artists without consent. “We applaud Tennessee’s swift and thoughtful bipartisan leadership against unconsented AI deepfakes and voice clones and look forward to additional states and the US Congress moving quickly to protect the unique humanity and individuality of all Americans,” stated Mitch Glazier, chairman and CEO of the Recording Industry Association of America.

Widening the lens to all Americans and early proposals for a federal right of publicity, the prospect of generative AI being used either to replicate a “likeness” that is not yet recognizable, or to produce synthetic “performers” that displace humans, presents two challenges not easily addressed by traditional ROP doctrines. Historically, the application of these various laws is clearest when the “likeness” of a celebrity or public figure is used for commercial advertising or endorsement. By contrast, non-famous persons, even in states with strong ROP statutes, have a higher burden to show reputational harm.

Thus, vesting a property right in one’s voice is a step in the right direction, but it is the various uses of a “likeness” leading to causes of action that get tricky. In its article about the ELVIS Act, Billboard cites a speech by president and CEO of National Music Publishers Association (NMPA) David Israelite stating that the much larger motion picture industry opposes a federal right of publicity. I addressed some of the reasonable concerns motion picture producers might raise with legislation proscribing the use of generative AI for “expressive purposes,” and wherever one leans on these questions, artificial voice exemplifies the difficult nature of adopting policies around generative AI in the creative industries.

As a general view, I stand with creators who see the potential for generative AI to displace human creators, and I maintain that there is nothing to be gained—culturally or economically—in a future creative sector with dramatically fewer professionals. But the ELVIS Act itself highlights the challenge of writing policy that looks beyond the current population of famous or semi-famous professionals. In this context, perhaps the audiobook narrators provide some insight. I’ve talked to several voice actor friends and colleagues in recent months, and after explaining why copyright doesn’t typically protect their interests, I disappoint them further when we turn to the subject of ROP, explaining why those laws don’t quite address the prospect of scraping voice recordings to train a generative AI.

Award-Winning Book Narrator Encounters Her Virtual Self?

I recently spoke to audiobook narrator Hillary Huber, who discovered that her voice may be the unauthorized source of a Virtual Voice, a service provided to self-published authors on the Kindle Direct Publishing (KDP) platform. The Virtual Voice concept uses synthetic voice technology to enable the self-published author of a modestly selling title to create an audiobook she could otherwise not afford to produce. But Virtual Voice, a feature of Amazon’s publishing platform, naturally raises two questions: first, whose voices are used to train the AI? And second, is the model a harbinger of doom for professional book narrators throughout the industry?

Huber was alerted to the possibility of her vocal doppelganger by a friend sharing links to several books on the KDP platform and telling her, “This is your voice!” But, as Huber explained to me, “Because our own voices never sound the same to ourselves as to others, I asked several colleagues to weigh in, and they were unanimous in their opinion that it was a version of me—not just the sound, but also certain markers like cadence and inflection.”

To my ear, which has not been trained on the more than 700 books Huber has narrated, I would describe the Virtual Voice sample as sounding either like a mediocre computer rendering of her, or like a recording of her voice with a computerized filter distorting the sound. The latter, of course, did not occur because Huber did not narrate the book in question, but whether Virtual Voice was “trained” without license using the voices of professional narrators like Huber and her colleagues is a question worth asking.

More broadly, as a matter of law and policy, the book narration business is perhaps instructive to other creators, including other voice actors, musical performers, et al. One difficulty, it seems, lies in distinguishing among the unknown, the semi-famous, and the famous, and Huber confirmed for me that the book narration world is indeed segmented into these three strata. Many unknown narrators earn modest incomes recording a broad range of modestly selling audiobooks; a small group of regulars like Huber can earn middle-class incomes reading more popular books; and, of course, celebrities are occasionally paid whatever they can negotiate to read bestsellers. Naturally, it is the narrator whose name and voice may not be widely recognizable, even among avid book listeners, who is most anxious about the prospect of losing her job to generative AI.

Additionally, when I asked Huber if she knew how many narrators are in what I called the group of “recognizable regulars,” her guess was a surprisingly low number, well below 100 narrators. I figured the number would be small, but not that small, and this raises real concerns about the narration business. For one thing, Congress isn’t motivated to protect a handful of jobs. For another, even if the number were a few hundred voices producing a training dataset of, say, one million popular books, that seems like a comparatively light task for a generative AI developer seeking to create enough variety in synthetic voices to replace the narration workforce.

In that regard, while it may be tempting for some book narrators to license the use of their voices for a purpose like Virtual Voice, it is hard to see how this would not very quickly obviate the need for any human narrators to produce audiobooks, or even to license their voices for generative AI for long. At a certain threshold, the AI is expected to self-train, suggesting that a handful of narrators might obtain licensing deals once, and then nobody will ever do so again.

Assuming that’s a fair summary, some might ask why Congress should consider a provision like the ELVIS Act as a starting point for a federal ROP law with an aim to protect more than today’s musical performers. In my view, the answer goes back to considering future generations of creators. If there is one consistent feature in Big Tech’s influence on the creative sector, it is that the major platforms developed thus far are highly effective at cannibalizing existing works of great value while shrinking opportunities for new creators at every level.

If the U.S. is going to continue to foster new generations of professional creators, it is necessary that policy in this area does not focus too narrowly on the current population of recognizable and famous creators. Here, although copyright law does not apply to the property rights in “likeness,” its foundational purpose to “promote progress” might serve as a guiding principle in crafting new federal laws that vest property rights in our images, names, and voices.


Photo by: Andrew282