Human Voice Gaining Protection in Confronting Generative AI

Last week, Tennessee passed the ELVIS Act to expand its statutory right of publicity (ROP) law to include voice as a protected aspect of an individual’s “likeness.” With artificial intelligence enabling more precise replication of specific, human-sounding voices, it is little surprise that the music powerhouse state has taken swift action to explicitly include voice among the property rights protected by its ROP statute. With the music industry generating $9.7 billion in output for the Nashville region alone, Tennessee lawmakers took less than three months to introduce and pass the Ensuring Likeness, Voice, and Image Security (ELVIS) Act, and they could not have been luckier to have the acronym work so perfectly!

Tennessee’s existing ROP law already proscribed unlicensed use of “likeness” for a wide range of commercial purposes, and the ELVIS amendments create potential civil liability for publication, performance, or transmission, or for making available an algorithm, software, tool, etc. with the primary purpose or function of producing an unauthorized “likeness.” This addition is notable because it creates potential liability for the generative AI developer whose interest may be producing the next Mary Kutter song without Mary Kutter.

Although Tennessee is not the first state to include voice in the definition of “likeness” for the purpose of ROP law, support from the music industry indicates that the ELVIS Act is the first to directly confront the prospect of generative AI replicating artists without consent. “We applaud Tennessee’s swift and thoughtful bipartisan leadership against unconsented AI deepfakes and voice clones and look forward to additional states and the US Congress moving quickly to protect the unique humanity and individuality of all Americans,” stated Mitch Glazier, chairman and CEO of the Recording Industry Association of America.

If we widen the lens to all Americans and to early proposals for a federal right of publicity, two challenges are not easily addressed by traditional ROP doctrines: generative AI being used to replicate a “likeness” that is not yet recognizable, and generative AI being used to produce synthetic “performers” to displace humans. Historically, the application of these various laws is clearest when the “likeness” of a celebrity or public figure is used for commercial advertising or endorsement. For instance, non-famous persons, even in states with strong ROP statutes, have a higher burden to show reputational harm.

Thus, vesting a property right in one’s voice is a step in the right direction, but it is the various uses of a “likeness” leading to causes of action that get tricky. In its article about the ELVIS Act, Billboard cites a speech by president and CEO of National Music Publishers Association (NMPA) David Israelite stating that the much larger motion picture industry opposes a federal right of publicity. I addressed some of the reasonable concerns motion picture producers might raise with legislation proscribing the use of generative AI for “expressive purposes,” and wherever one leans on these questions, artificial voice exemplifies the difficult nature of adopting policies around generative AI in the creative industries.

As a general view, I stand with creators who see the potential for generative AI to displace human creators and maintain that there is nothing to be gained, culturally or economically, in a future creative sector with dramatically fewer professionals. But the ELVIS Act itself highlights the challenge of writing policy that looks beyond the current population of famous or semi-famous professionals. In this context, perhaps the audiobook narrators provide some insight. I’ve talked to several voice actor friends and colleagues in recent months. After explaining why copyright doesn’t typically protect their interests, I turn to the subject of ROP and disappoint them further, explaining why those laws don’t quite address the prospect of scraping voice recordings to train a generative AI.

Award-Winning Book Narrator Encounters Her Virtual Self?

I recently spoke to audiobook narrator Hillary Huber, who discovered that her voice may be the unauthorized source of a Virtual Voice, a service provided to self-published authors on the Kindle Direct Publishing (KDP) platform. The Virtual Voice concept uses synthetic voice technology to enable the self-published author of a modestly selling title to create an audiobook she could otherwise not afford to produce. But Virtual Voice, a feature of Amazon Publishing, naturally raises two questions: first, whose voices are used to train the AI? And second, is the model a harbinger of doom for professional book narrators throughout the industry?

Huber was alerted to the possibility of her vocal doppelganger by a friend sharing links to several books on the KDP platform and telling her, “This is your voice!” But, as Huber explained to me, “Because our own voices never sound the same to ourselves as to others, I asked several colleagues to weigh in, and they were unanimous in their opinion that it was a version of me—not just the sound, but also certain markers like cadence and inflection.”

To my ear, which has not been trained on the more than 700 books Huber has narrated, I would describe the Virtual Voice sample as sounding either like a mediocre computer rendering of her, or like a recording of her voice with a computerized filter distorting the sound. The latter, of course, did not occur because Huber did not narrate the book in question, but whether Virtual Voice was “trained” without license using the voices of professional narrators like Huber and her colleagues is a question worth asking.

More broadly, as a matter of law and policy, the book narration business is perhaps instructive to other creators, including other voice actors, musical performers, et al. One difficulty, it seems, lies in distinguishing among the unknown, the semi-famous, and the famous, and Huber confirmed for me that the book narration world is indeed segmented into these three strata. Many unknown narrators earn modest incomes recording a broad range of modestly selling audiobooks; a small group of regulars like Huber can earn middle-class incomes reading more popular books; and, of course, celebrities are occasionally paid whatever they can negotiate to read bestsellers. Naturally, it is the narrator whose name and voice may not be widely recognizable, even among avid book listeners, who is most anxious about the prospect of losing her job to generative AI.

Additionally, when I asked Huber if she knew how many narrators are in the group I called the “recognizable regulars,” her guess was a surprisingly low number, well below 100 narrators. I figured the number would be small, but not that small, and this raises real concerns about the narration business. For one thing, Congress isn’t motivated to protect a handful of jobs. For another, even if the number were a few hundred voices producing a training dataset of, say, one million popular books, it seems like a comparatively light task for a generative AI developer to create enough variety in synthetic voices to replace the narration workforce.

In that regard, while it may be tempting for some book narrators to license the use of their voices for a purpose like Virtual Voice, it is impossible to see how this does not very quickly obviate the need for any human narrators to produce audiobooks, or even license their voices for generative AI for long. At a certain threshold, the AI is expected to self-train, suggesting that a handful of narrators might obtain licensing deals one time and then nobody will ever do so again.

Assuming that’s a fair summary, some might ask why Congress should consider a provision like the ELVIS Act as a starting point for a federal ROP law with an aim to protect more than today’s musical performers. In my view, the answer goes back to considering future generations of creators. If there is one consistent feature in Big Tech’s influence on the creative sector, it is that the major platforms developed thus far are highly effective at cannibalizing existing works of great value while shrinking opportunities for new creators at every level.

If the U.S. is going to continue to foster new generations of professional creators, it is necessary that policy in this area does not focus too narrowly on the current population of recognizable and famous creators. Here, although copyright law does not apply to the property rights in “likeness,” its foundational purpose to “promote progress” might serve as a guiding principle in crafting new federal laws that vest property rights in our images, names, and voices.

Photo by: Andrew282

In Suit With Publishers, Audible’s Defenses Raise Questions

Last Monday, the world’s largest distributor of audiobooks, Audible, had intended to launch a new service called Caption, a feature that uses voice-to-text transcription technology to display the text of an audiobook on a user’s screen in sync with the narration. In late August, seven major publishers* filed suit against Audible, alleging that the unlicensed Caption feature amounts to copyright infringement of the underlying literary works. The Publishers requested a preliminary injunction to prevent Audible from launching Caption pending further proceedings.

According to Audible, the customer who wants to use Caption would request a transcription of the audiobook, which is then made available about thirty minutes after the request. The customer is then able to read the book in caption form (no more than 15-20 words at a time) while listening to the narration, and he can also tap on selected words to link to dictionary or Wikipedia references. The captions generated are imperfect (94% accurate), not unlike the syntactical or spelling flaws one sees in closed captioning on television.

Audible states that it intends to store a requested transcript for a period of 90 days on its servers, and if no other requests for the same transcript are made in that timeframe, the file will be deleted. All this transcribing, deleting, and re-transcribing looks a lot like a wasted effort designed primarily to circumvent a claim of direct copyright infringement, but perhaps more on that detail in a future post.

For now, if Caption sounds generally like a useful “enhancement” to the audiobook experience, this is more or less the perception Audible is counting on in the response it filed on September 12 to the lawsuit. The company’s brief states that Caption “was created to encourage deeper and better understanding of audiobooks for users who have chosen to have an audio-first experience.” More particularly, Audible places considerable emphasis on “struggling readers”; and although the potential educational value of Caption is not entirely dismissible, Audible has no intention of restricting its roll-out to students, or any identifiable “struggling” class of readers. It hopes to offer Caption with nearly every book in its library, except those works the transcription software would be unable to render with 90+% accuracy. Finnegans Wake?

Because Audible is a subsidiary of Amazon, and Amazon is one of the most predatory companies on Earth, the courts, book authors, and the public should take a skeptical, if not outright jaundiced, view of Audible’s implication that its primary motive is to improve reading and literacy. That ambition may be central to Audible’s founding, but Papa Amazon has a rather dismal track record for supporting the interests or rights of any individuals in its relentless pursuit of global distribution dominance.

The Lawsuit

Simply put, the Publishers make clear that they licensed their audiobooks to Audible for distribution only and, therefore, the Caption feature amounts to an unlicensed, distributed-text version of a book. Not only do the Publishers predict Caption may become a substitute for an eBook, they further note that Caption may quickly displace existing, legal technologies like Immersion Reading and Whispersync, both of which enable users to link eBooks to audiobooks so that the words in the former are highlighted for reading along with the narration in the latter.

In its defense brief, Audible responds that the Publishers exaggerate the potential harm of Caption, which Audible claims is too limited in both form and function to be perceived by users as a viable substitute for any kind of book-reading experience. Audible also asserts that, at most, the Publishers have a breach-of-contract claim that does not implicate copyright law. But just in case the court disagrees with that argument, Audible asserts that Caption is a “quintessential fair use,” a claim that rests primarily on the implication that Caption is “transformative” in its ability to help reverse downward trends in American reading.

Breach-of-Contract Defense Misrepresents Copyright Law

“Each Plaintiff granted Audible a license to its copyrighted works, and yet now alleges that Audible Captions infringes those licensed works. But the law is clear: by agreeing to those licenses, Plaintiffs waived their right to sue for copyright infringement as a result of licensed conduct. Thus, this Court need not reach the copyright issues presented here.”

Notice how words to the effect of “to distribute plaintiff’s sound recordings” are missing from that first sentence? Audible is probably not being careless in this statement so much as it is being too clever by half, using language that is too broad to accurately describe the nature of its agreement with the Publishers. As stated, Audible licensed the right to distribute sound recordings belonging to the Publishers and nothing more. Consequently, its claim that the Publishers’ only remedy is to be found in contract law hinges on a misreading of copyright practice.

Copyright is not a single right, but a “bundle of rights,” which the author/owner may exploit or not as she chooses under a variety of license agreements. For instance, the author may choose to license the translation of her novel to a specific publisher she trusts; or she may separately refuse to allow sequels to a story she feels should not be serialized. These are two distinct examples of licensing options, both protected by the same statutory right to “prepare derivative works.”

In Audible’s claim, it seems that by omission and obfuscation, they hope to convince the court, at this preliminary stage, that their license to distribute sound recordings extends to a right to transcribe those recordings into captions simply because the contracts do not specifically prohibit this conduct. This unusual claim reads to me like a strategy to get the court to deny the Publishers’ request for a preliminary injunction, which the court would certainly do, if it agreed that the complaint is limited to a contract dispute. This would then allow Audible to enjoy the PR benefits of launching and promoting Caption while, presumably, negotiating with the Publishers in the matter. But it is frankly hard to imagine how the court will find this argument tenable, let alone persuasive.

Defendant asks the court to reject out of hand the plaintiffs’ assertion that the Caption feature constitutes unlicensed reproduction, display, and distribution of a book’s text, three rights enumerated in the copyright statute. So, unless the court can find a rationale that Caption does not cause reproduction, display, and distribution of these works, it seems unlikely it will concur with Audible’s view that its conduct does not implicate a copyright complaint that warrants further proceedings.

While it is possible to breach a license agreement in a manner that does not result in copyright infringement, such an interpretation in Audible would seem anathema to the way licensing usually works. When a contract is written to grant a limited license, the copyright owner does not need to add a clause itemizing all other possible uses of the underlying work as being specifically prohibited. More typically, the contract will clearly describe what is being granted followed by a concluding statement to the effect that “all other rights are reserved.”

On that subject, the Caption feature demonstrates the fact that technological innovations can yield potential uses of copyrighted works that will not be anticipated at the time a contract is executed. Despite this, the author does not abandon his right a priori to license a potential use that has not yet been invented or introduced to the market; and his rights cannot be abrogated wholesale in the name of “innovation.”

This is one reason authors should hope the court proceeds with tremendous caution in this case—if not in response to what Caption appears to be at present, then with an awareness of what Audible/Amazon could have in store in the near future. With that in mind, it is worth examining the underpinning of Audible’s fair use defense—namely that Caption can be a valuable tool for “struggling readers.”

Is Caption Fair Use?

Contrary to the “not copyright” defense, the court could find Audible’s fair use claim somewhat more persuasive insofar as Caption does appear to share certain qualities with Google Books—at least in its present form. The fair use claim rests principally on the grounds that Caption is “transformative” (under the first factor analysis) as an educational enhancement to audiobook listening; and that it is not a market substitute (under the fourth factor analysis) for either electronic or printed books.

Kevin Madigan at CPIP writes that Caption is not at all transformative because there is nothing particularly innovative about “turning” a book into readable text. “Audible is reproducing the text of a literary work for the purpose of reading—whether for education or for entertainment—and that is the exact purpose of the underlying works of authorship,” he writes. This point is beyond dispute.

Nevertheless, the court may be somewhat persuaded by a comparison to Google Books, which was held to be both transformative and non-substitutive in a finding this same court called “pushing the boundaries of fair use.” There are reasons to find that Caption crosses those boundaries.

Fair Use Factor One – Can Caption “Transform” Reading?

By alluding in its brief to broad trends in American reading habits, Audible seems to imply that Caption is an antidote to some rather dismaying data. For instance, the brief notes, “36% of 8th graders are reading at a ‘proficient’ or ‘advanced’ level while 24% are below ‘basic’ level …” Further, Audible observes, “One third of teens reported not reading any books for pleasure in 2016; yet they reported spending on average four to six hours per day online, texting, and on social media.”

These statistics are sobering to be sure; and as the parent of a high-schooler and middle-schooler trying to encourage his kids to enjoy reading despite all those electronic distractions, I can relate. But with that said, it is hardly conclusive that more technological gadgets are a solution to the problem—a problem that, according to Audible’s own citation, is partly fostered by the omnipresence of tech toys in the first place. So, it is conceivable that Audible is overstating Caption’s general value in order to seem a bit more “transformative” than it is.

It is certainly plausible that readers who struggle—either because of physical barriers, cultural-economic barriers, or plain bad habits—could achieve reading comprehension benefits from using Caption. But this possibility, for which there is not enough data, does not inherently support Audible’s “transformative” argument as a rationale to make nearly every book in its library available in Caption form to every customer worldwide. That is a lot of market to cede to one company without license.

Moreover, Audible’s implication that Caption might reverse reading trends at scale actually supports the Publishers’ position that the feature is not a “transformative” use so much as it is potentially a new way of reading. If this became true, it would only underscore the fact that authors and publishers have a vested interest in that future; and at the same time, Audible’s implication that it might bring reading back actually undermines its non-substitutive claim under the fourth prong of the fair use analysis.

Fair Use Factor Four — Caption Is Not a Substitute?

Unlike Google Books, Caption makes the full text of a book available, so the court should be wary and cognizant of the likelihood that, with minor technological improvements and/or shifts in market dynamics, Caption could conceivably become an unlicensed market substitute for eBooks. So, authors should be very concerned about a fair use precedent in this case—if not for Caption in its nascent form—then for the next iteration of a Caption-like service that could become the new reading for many consumers.

Again, we ignore at our peril that Audible is a subsidiary of Amazon; and it is not the least bit unfair to imagine how a seemingly innocuous feature like Caption can be a springboard for expanding Amazon’s already outsized influence in publishing and elsewhere. If the court finds that Caption is fair use today, and Audible actually did grow the reading market—as it implies that it can—we begin to see very familiar territory as yet another tech giant positions itself as a monopsony. Does anyone really believe that Amazon would not become to book writers what Spotify is to songwriters? Really?

In light of Big Tech’s track record so far, this is hardly an alarmist point of view, and anyone who actually cares about writing or reading books can be forgiven a healthy dollop of skepticism about the professed good deeds of any of these companies. As the New York Times recently reported, Amazon sells foreign-made books that are so poorly produced that they do not even contain accurate reproductions of the text. Citing George Orwell’s works, David Streitfeld notes that the books he acquired include “… straightforward counterfeits, like the edition of his memoir ‘Down and Out in Paris and London’ that was edited for high school students. The author’s estate said it did not give permission for the book, printed by Amazon’s self-publishing subsidiary.” So forgive me a raised eyebrow when a subsidiary of this company says it wants to save literature.

While it is certainly not in the authors’ or publishers’ interests to prevent changes in the way people might read in years to come, if indeed changes are on the horizon, these parties must remain the primary stakeholders in that future. Consequently, if and when the court considers the fourth fair use factor in this case, authors, publishers, and readers should hope that it underlines the statutory mandate to consider potential market harm, because there is little evidence to date that Amazon will not exploit any opportunity in its efforts to become the worldwide distributor of everything.

* Chronicle, Hachette, HarperCollins, Macmillan, Penguin Random House, Scholastic, Simon & Schuster.

The Illusion of More

Dissecting the digital utopia.
