Human Voice Gaining Protection in Confronting Generative AI


Last week, Tennessee passed the ELVIS Act to expand its statutory right of publicity (ROP) law to include voice as a protected aspect of an individual’s “likeness.” With artificial intelligence enabling ever more precise replication of specific, human-sounding voices, it is little surprise that the music powerhouse state has taken swift action to explicitly include voice among the property rights protected by its ROP statute. With the music industry contributing $9.7 billion in output to the Nashville region alone, Tennessee lawmakers took less than three months to introduce and pass the Ensuring Likeness, Voice, and Image Security (ELVIS) Act, and they could not have been luckier to have the acronym work so perfectly!

Tennessee’s existing ROP law already proscribed unlicensed use of “likeness” for a wide range of commercial purposes, and the ELVIS amendments create a civil cause of action for the publication, performance, or transmission of an unauthorized “likeness,” or for making available an algorithm, software, tool, etc. with the primary purpose or function of producing one. This addition is notable because it creates potential liability for the generative AI developer whose interest may be in producing the next Mary Kutter song without Mary Kutter.

Although Tennessee is not the first state to include voice in the definition of “likeness” for the purpose of ROP law, the support from the music industry indicates that the ELVIS Act is the first to directly confront the prospect of generative AI replicating artists without consent. “We applaud Tennessee’s swift and thoughtful bipartisan leadership against unconsented AI deepfakes and voice clones and look forward to additional states and the US Congress moving quickly to protect the unique humanity and individuality of all Americans,” stated Mitch Glazier, chairman and CEO of the Recording Industry Association of America.

Widening the lens to all Americans and to early proposals for a federal right of publicity, the prospect of generative AI being used either to replicate a “likeness” that is not yet recognizable or to produce synthetic “performers” that displace humans presents two challenges not easily addressed by traditional ROP doctrines. Historically, the application of these various laws is clearest when the “likeness” of a celebrity or public figure is used for commercial advertising or endorsement. By contrast, non-famous persons, even in states with strong ROP statutes, bear a higher burden to show reputational harm.

Thus, vesting a property right in one’s voice is a step in the right direction, but it is the various uses of a “likeness” giving rise to causes of action that get tricky. In its article about the ELVIS Act, Billboard cites a speech by David Israelite, president and CEO of the National Music Publishers’ Association (NMPA), stating that the much larger motion picture industry opposes a federal right of publicity. I addressed some of the reasonable concerns motion picture producers might raise with legislation proscribing the use of generative AI for “expressive purposes,” and wherever one leans on these questions, artificial voice exemplifies the difficulty of adopting policies around generative AI in the creative industries.

As a general view, I stand with creators who see the potential for generative AI to displace human creators, and I maintain that there is nothing to be gained, culturally or economically, in a future creative sector with dramatically fewer professionals. But the ELVIS Act itself highlights the challenge of writing policy that looks beyond the current population of famous or semi-famous professionals. In this context, perhaps audiobook narrators provide some insight. I’ve talked to several voice actor friends and colleagues in recent months, and after I explain why copyright doesn’t typically protect their interests, we turn to the subject of ROP, where I disappoint them further by explaining why those laws don’t quite address the prospect of scraping voice recordings to train a generative AI.

Award-Winning Book Narrator Encounters Her Virtual Self?

I recently spoke to audiobook narrator Hillary Huber, who discovered that her voice may be the unauthorized source of a Virtual Voice, a service provided to self-published authors on the Kindle Direct Publishing (KDP) platform. The Virtual Voice concept uses synthetic voice technology to enable the self-published author of a modestly selling title to create an audiobook she could otherwise not afford to produce. But Virtual Voice, a feature of Amazon’s publishing platform, naturally raises two questions: first, whose voices are used to train the AI? And second, is the model a harbinger of doom for professional book narrators throughout the industry?

Huber was alerted to the possibility of her vocal doppelganger by a friend who shared links to several books on the KDP platform, telling her, “This is your voice!” But, as Huber explained to me, “Because our own voices never sound the same to ourselves as to others, I asked several colleagues to weigh in, and they were unanimous in their opinion that it was a version of me—not just the sound, but also certain markers like cadence and inflection.”

My ear has not been trained on the more than 700 books Huber has narrated, but I would describe the Virtual Voice sample as sounding either like a mediocre computer rendering of her or like a recording of her voice with a computerized filter distorting the sound. The latter, of course, did not occur because Huber did not narrate the book in question, but whether Virtual Voice was “trained” without license on the voices of professional narrators like Huber and her colleagues is a question worth asking.

More broadly, as a matter of law and policy, the book narration business is perhaps instructive for other creators, including other voice actors, musical performers, and so on. One difficulty, it seems, lies in distinguishing among the unknown, the semi-famous, and the famous, and Huber confirmed for me that the book narration world is indeed segmented into these three strata. Many unknown narrators earn modest incomes recording a broad range of modestly selling audiobooks; a small group of regulars like Huber can earn middle-class incomes reading more popular books; and, of course, celebrities are occasionally paid whatever they can negotiate to read bestsellers. Naturally, it is the narrator whose name and voice may not be widely recognizable, even among avid book listeners, who is most anxious about the prospect of losing her job to generative AI.

Additionally, when I asked Huber if she knew how many narrators are in the group I called the “recognizable regulars,” her guess was a surprisingly low number, well below 100. I figured the number would be small, but not that small, and this raises real concerns about the narration business. For one thing, Congress isn’t motivated to protect a handful of jobs. For another, even if the number were a few hundred voices narrating a training dataset of, say, one million popular books, it would seem a comparatively light task for a generative AI developer to create enough variety in synthetic voices to replace the narration workforce.

In that regard, while it may be tempting for some book narrators to license the use of their voices for a purpose like Virtual Voice, it is hard to see how this would not very quickly obviate the need for human narrators to produce audiobooks, or even to license their voices for generative AI for long. At a certain threshold, the AI is expected to self-train, suggesting that a handful of narrators might obtain licensing deals once, and then nobody will ever do so again.

Assuming that’s a fair summary, some might ask why Congress should consider a provision like the ELVIS Act as a starting point for a federal ROP law with an aim to protect more than today’s musical performers. In my view, the answer goes back to considering future generations of creators. If there is one consistent feature in Big Tech’s influence on the creative sector, it is that the major platforms developed thus far are highly effective at cannibalizing existing works of great value while shrinking opportunities for new creators at every level.

If the U.S. is going to continue to foster new generations of professional creators, it is necessary that policy in this area does not focus too narrowly on the current population of recognizable and famous creators. Here, although copyright law does not apply to the property rights in “likeness,” its foundational purpose to “promote progress” might serve as a guiding principle in crafting new federal laws that vest property rights in our images, names, and voices.


Photo by: Andrew282

Recent AI Copyright Lawsuits Are About More than Compensation for Authors

Last week, writer and broadcaster Andrew Keen invited me on his podcast Keen On to talk (of course) about artificial intelligence. When we got to the subject of the New York Times lawsuit against OpenAI and Microsoft, I noted 1) that it is arguably the strongest copyright case presented to date against an AI developer; 2) that it would likely result in a substantial licensing deal between the parties; and 3) that it is hard to say what any of this means for journalism going forward. On that same subject, nonfiction authors Nicholas Basbanes and Nicholas Gage filed a class action suit against OpenAI and Microsoft on January 5, just over a week after the Times suit was filed.

As discussed in other posts, although generative AI unequivocally poses a threat to authors and authorship, U.S. copyright law is, oddly enough, not quite designed to address the full scope of the social, economic, and cultural challenge of that threat. While this seems counterintuitive, the difficulty lies in the fact that copyright promotes authorship by protecting works against specific means of infringement, and the nail-biting question of the moment is whether “machine learning” (ML) with the use of protected works violates the reproduction right (§106(1)) of the Copyright Act.

Here, the Times case is strong because the news organization presents compelling, side-by-side evidence that its published stories are being output by ChatGPT almost verbatim. This is evidence not only that reproduction is occurring in the AI model but also that the outputs provided to users serve as a substitute for legal access to the Times’s material. The evidence of reproduction establishes a solid claim of infringement, while the evidence of substitution weighs against OpenAI’s putative fair use defense. In fact, it was the same circuit (the Second) that held that a news service called TVEyes was “somewhat transformative” but that it made so much of Fox News’s material available, even in segments, that the substitutional purpose doomed its fair use defense.

Unlike the Times, the nonfiction book authors do not present side-by-side evidence of verbatim copying of their published writings, and this is consistent with some of the other class action suits. These are the real nail-biter cases, in my view, because the plaintiffs’ cause is just, but their proof of copyright infringement is less demonstrable than the Times’s (or, for that matter, than the evidence in Concord v. Anthropic). But this focus on both The New York Times and nonfiction authors raises a serious question as to whether AI will exacerbate the already dismal state of information in the information age.

When this blog began in 2011, one of the issues of concern was the volume of mediocre, careless, or inaccurate reporting and commentary being promulgated under brands normally associated with quality journalism. Here, it must be said that the Gray Lady herself has not always been immune to the digital-age forces of volume and speed that can drive reporters and editors to engage the market on the lowest rungs. But if the stodgy algorithms of social media have animated a new era of yellow journalism, isn’t it reasonable to assume that certain generative AIs will make matters worse? The internet has already fostered more misinformation than a democratic society can safely endure.

If we consider the possible outcomes of the Times lawsuit, one would be that OpenAI changes the model to avoid infringing reproduction. While this may satisfy from a copyright perspective, one wonders about the quality and/or purpose of the information being provided by a tool like ChatGPT. The output of an LLM is the result of probability: the user asks a question (a prompt), and the AI responds with what, based on the information fed into its algorithm, is statistically most likely to be the answer the user wants.
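To make “the result of probability” concrete, here is a toy sketch in Python, with invented words and probabilities that do not represent any actual model’s internals: a language model assigns probabilities to candidate next words and samples among them, one word at a time.

```python
import random

# Toy illustration only: invented probabilities for the word that might
# follow the prompt "The capital of Tennessee is". A real LLM computes a
# distribution over tens of thousands of tokens, using billions of learned
# parameters, and repeats this step for every word it generates.
next_word_probs = {
    "Nashville": 0.92,   # strongly associated with the prompt in training data
    "Memphis": 0.05,     # plausible but incorrect association
    "Knoxville": 0.03,   # rarer association
}

def sample_next_word(probs):
    """Choose one candidate word at random, weighted by its probability."""
    words = list(probs.keys())
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

print(sample_next_word(next_word_probs))  # usually, but not always, "Nashville"
```

The point, for present purposes, is that nothing in this loop distinguishes solid reporting from the cacophony of opinion; the model returns whatever is statistically likely given what it was fed.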

It is no wonder the system to date reproduces material verbatim from a major news organization, but if it doesn’t do that, what should it do? Or what can it do that can be called “progress” with regard to news and information? Take a multi-faceted, extremely emotional topic like Israel and Palestine, train an AI on all the solid reporting, all the mediocre editorials, and the cacophony of opinions on social media, and the user of the LLM gets…what? Why would the results be more informative or thoughtful than the work of a veteran journalist doing her best?

Why won’t an AI be worse than “recommendation algorithms”? If YouTube and Facebook foster confirmation bias and shepherd people onto the wild grazing fields of organically grown conspiracies, it seems rational and prudent to assume that an LLM will do the same thing more efficiently. Why have an old-school search engine point you toward a bogus article linking vaccines to autism when you can have a “dialogue” with an ersatz intelligence on the same topic?

Although the nonfiction book authors do not present the kind of evidence of copyright infringement the Times exhibits in its complaint, the facts presented about the authors’ investment of time, expertise, and money make a point that should be read as more than a mere plea for sympathy. This is not just about job loss for future historians but quite possibly about the loss of history itself. From the Basbanes et al. complaint:

The archive of primary research materials assembled by Mr. Basbanes in support of his work over a period of forty years, when acquired by Texas A&M University in 2015, filled 365 packing boxes with documents, transcriptions, drafts, field notebooks, photographic negatives, and the like, all acquired by Mr. Basbanes in pursuit of his literary activities, and at his expense and initiative.

It is more than a legal (i.e., fair use) question whether the purpose of a model like ChatGPT is to make new and relevant use of all that work, or whether its purpose is to supplant the historian and the reporter by “feeding off the sere remains of the past,”[1] until it eventually starves. In the former case, licensing and collaborating with authors and journalists seems reasonable; in the latter case, allowing certain generative AIs to die on the vine seems imperative.


[1] From Ralph Waldo Emerson’s speech at Harvard calling for an American literary independence, August 31, 1837.

Photo by: Antonio83

Generative AI Is a Lot Like a Video Tape Recorder, No?

In my last post, I focused on the hypothetical fair use defense of generative AI under the principles articulated in the Google Books decision of 2015. In this post, I want to address another claim that has arisen, both on social media and in comments to the Copyright Office, namely that generative AI companies should be shielded against secondary liability for copyright infringement under the “Sony Safe Harbor.”

This refers to the 1984 Supreme Court decision in Sony v. Universal (the “Sony Betamax” case), which held that the video tape recorder (VTR) was legal based on two interrelated findings: 1) that consumers’ “time-shifting” of televised material was a fair use; and 2) that the VTR was therefore capable of substantially non-infringing uses. Thus, although some parties would inevitably use the VTR for infringing purposes, Sony Corporation could not be held liable for contributory infringement in such instances.

Clearly, there are some bright, shining distinctions between the VTR and a generative AI. The VTR was not designed by inputting millions of AV works into a computer model, and its purpose was not to generate “new” AV works. Instead, those obsolete machines performed two very basic functions: they made videotape copies of AV material, and they displayed copies of AV material for a specific type of personal use.[1] As noted in the post about Google Books, the Court in Sony also had a fully developed product and a clearly defined purpose in the VTR. Again, the same cannot be said of a given generative AI, whose purpose remains an open question.

I believe the novelty (and even the uncertainty) of the AI’s purpose is fatal to the argument that generative AI companies are necessarily shielded by the “Sony Safe Harbor.” This is because, in Sony, the anticipation of substantially non-infringing use rests on the novel “time-shifting” notion introduced into the fact-intensive fair use finding. In other words, “time-shifting” was a principle specific to the technology at issue, and no analogous concept lurks anywhere in the purpose of a given AI, let alone all AIs still in development. Imagine if Sony Corp. had walked into court with a box of assembled electronic parts, declared that it was not quite sure what the box could or would do yet (though it might distribute homemade copies into the market!), but that it would very much like fair use and liability rulings in its favor.

Non-Infringing Use Under Different Rationales

To be clear, it is plausible, even reasonable, to expect that the majority of outputs by a generative AI are, or will be, non-infringing. In fact, I believe this is one of the pitfalls of hoping that copyright can address the presumed threat of AI outputs: the substantial similarity test, the basis for finding that Work A infringes Work B, is thrown into a doctrinal tailspin. For example, when a person knowingly copies a work, this fosters a strong claim of infringement, while independent creation is a non-infringing act. And then there are shades among willful infringement, innocent infringement, and non-infringement, depending on the facts of a particular case.

In addition to copyright’s limiting doctrines, which allow myriad “similar” works to coexist without legal conflict, I predict that generative AI has the potential to warp the evidentiary foundations necessary to prove infringement under a substantial similarity test. If that is correct, it may be one rationale for predicting widespread non-infringing use, but it is highly distinguishable from the foundations of the “Sony Safe Harbor.” Meanwhile, the consideration of secondary liability (as with fair use) depends substantially on the purpose of the technology at issue, and that purpose remains unclear.

The mundane, mechanical VTR only potentially threatened the “making available” rights in works produced and owned by creators. This is not remotely comparable to a computer model “trained” on millions of protected works for the purpose of enabling it to produce new “works.” To paraphrase my brief comments to the Copyright Office, if a particular work goes into the machine and a potentially infringing copy of that work comes out of the machine, I do not believe there is any authority that broadly shields the developer from liability.

With that example in mind, though, it is worth noting that a code-based service, unlike a physical electronic device, can be revised concurrently with its delivery to the market. Thus, unlike Sony and its Betamax, the AI developer looking to limit liability for copyright infringement has the opportunity (dare we say obligation?) to make every effort to design and continually update a system to avoid copyright infringement. This may entail licensing the materials used to “train” a generative AI and/or ongoing tweaking of the algorithm to avoid infringing outputs, as sketched below. Either way, if the developers don’t want to build these kinds of safeguards for the most revolutionary tech of 2023, surely they cannot be allowed to hide behind a liability shield established in 1984 for a box now collecting dust in the attic.
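What such “tweaking” might look like is an open engineering question, but as a minimal sketch, one safeguard could be a post-generation filter that flags output overlapping too heavily with protected text. The function names, the n-gram approach, and the thresholds below are all invented for illustration; this does not describe any actual developer’s practice, and real filtering pipelines would be far more sophisticated.

```python
def ngrams(text, n=8):
    """Return the set of n-word sequences in a text (case-insensitive)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def looks_like_verbatim_copy(output, protected_texts, n=8, threshold=0.3):
    """Flag output when a large share of its n-grams appears in any protected text.

    The values of n and threshold are illustrative choices, not legal standards.
    """
    output_grams = ngrams(output, n)
    if not output_grams:
        return False  # output too short to evaluate
    for source in protected_texts:
        overlap = len(output_grams & ngrams(source, n)) / len(output_grams)
        if overlap >= threshold:
            return True  # block or regenerate this output
    return False
```

Notably, a filter like this addresses only near-verbatim reproduction, which is precisely why the harder questions in these cases concern training and market substitution rather than copying alone.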


[1] They also frustrated many consumers who tried to set the clocks, but that’s another matter.

Photo by: Tamer_Soliman