AI Machine Learning: Remedies Other Than Copyright Law?

In my last post, I discussed some of the allegations that “machine learning” (ML) with the use of copyrighted works constitutes mass infringement. Citing the class action lawsuits Andersen and Tremblay, I predicted that if the courts do not find that ML unavoidably violates the reproduction right (§106(1)), copyright law may not offer much relief to the creators of the works used for AI development. As of last week, it remains to be seen whether we’ll get to that question after Judge Orrick of the Northern District of California stated that he is tentatively prepared to dismiss the suit with leave to amend the complaint. The judge did indicate that a claim of direct infringement could survive, but we’ll have to see what comes of an amended complaint.

As mentioned in the last post, if the court does not find a valid claim of copyright infringement, the other allegations will likely fail as a result. Nevertheless, though the state allegations may be moot in the class cases filed thus far, I intend in this post to look at whether any non-copyright remedies offer much hope for creators. For instance, the Andersen complaint alleges violations of statutory and common law rights of publicity and violations of statutory unfair practice prohibitions in the State of California.

Right of Publicity and Works “in the style of…”

One of the most concerning aspects of generative AI is that it allows a user to prompt the system to make a work “in the style of [named artist].” Karla Ortiz, in her testimony to the Senate Judiciary Committee on July 12, stated, “[Artist] Greg Rutkowski, had his name used as a prompt between Midjourney, Stability AI and the porn generator Unstable Diffusion, about 400,000 times as of December 2022. (And these are on the lower side of estimates).” This is a deeply personal assault on the identity, work, and potential livelihood of an artist who has spent years mastering his craft and developing that distinctive style which can now be mimicked by a computer. But as a matter of doctrine, copyright does not protect style. So, can state laws like the right of publicity (ROP) offer any relief?

Only half the states in the U.S. have statutory rights of publicity, though most states recognize a common law ROP, which may prove more expansive in litigation. The California statute, considered one of the strongest, prohibits the use, without consent, of a person’s likeness, name, voice, or signature for commercial purposes. In particular, advertising a product, service, or viewpoint so that a representation of the individual implies that person’s endorsement is a paradigmatic violation of ROP and may even implicate the individual’s own speech rights. But does prompting a generative AI to create an image “in the style of [artist]” implicate the artist’s ROP? Maybe and sometimes.

Because a user can prompt the production of a visual work “in the style of Greg Rutkowski,” one obvious implication is that there will be hundreds or thousands of “Rutkowskis” in the world which the artist did not create. If any of those AI-generated works are substantially similar to an existing work he did create, then he may have claims of copyright infringement—and potentially too many to contemplate addressing. But what about the works that do not look substantially similar to anything in the artist’s portfolio but, to the observer, do look like “new Rutkowskis”?

In the fine-art trade, the duty to validate provenance and authenticity (i.e., not commit forgery) should offer some protection for those artists who sell their work in galleries and the like. But in the commercial market, if a new vodka brand wants images in the style of, say, Molly Crabapple for its ad campaign but doesn’t want to hire Molly for the job, it could use generative AI to make something in her style. No question this implies that artists will lose gigs, but whether ROP offers a remedy to Crabapple herself in this hypothetical is questionable.

To begin, artwork in the style of the artist is not a “likeness.” Thus, Crabapple would have to prove that observers seeing the ads would perceive the images as her work and that the images, therefore, result in an unlicensed endorsement in violation of ROP. That can be a high bar to clear, let alone to clear repeatedly in what could amount to multiple complaints by just one artist—and potentially in multiple states! Although a violation of ROP can stand alone without an underlying violation of some other law, it is a fact-intensive, case-by-case consideration, and it is therefore hard to imagine how it can support allegations of harm to a plaintiff class as alleged in Andersen.

Moreover, some states only recognize celebrity ROP and not average citizen ROP. So, in the case of the visual artist, what is the threshold where he or she is famous enough to be considered a celebrity? Crabapple, Rutkowski, et al. are very well known in the art world, but they’re not movie-star well known to the general public. So, what constitutes “celebrity” in this context? We don’t know. The specific problems caused by generative AI are brand new.

New Federal ROP Law?

In testimony before the SJC along with Karla Ortiz, General Counsel for Universal Music Group Jeffrey Harleston broached the subject of adopting a federal right of publicity, and the idea was at least entertained by some of the senators. A federal ROP could theoretically address some of the new harms to artists caused by generative AI. Not only would a federal statute provide a uniform, national framework, but the new law would be written with an understanding of AI and its potential harms as a foundation of legislative intent. Further, it was raised in committee that ROP should apply to everybody and not just celebrities.

The Motion Picture Association (MPA) filed comments in response to this discussion, focused largely on the fact that, historically, ROP has applied to commercial/promotional uses but not to expressive ones. The MPA is right to point to some tricky considerations that would need to shape a new federal ROP in order to strike a balance between protecting creators and performers from disenfranchisement by AI-generated replicas and allowing use of the technology to create expressions that are protected by the First Amendment.

In fact, in my book, I allude to a hypothetical future biopic about Carrie Fisher that (with the family’s permission, of course) might dramatize scenes using AI replicas. Whether this use of the technology would be an engaging choice in lieu of casting an actress to play young Carrie is a question of aesthetics and culture, but not a question that can or should be addressed as a matter of law. Suffice it to say, the contours of a prospective new federal ROP are complex enough to be the subject of future posts.

Unfair Competition

Unlike an ROP complaint, unfair competition does not stand alone as an allegation. In general, these laws bar businesses from gaining unfair advantage by engaging in some form of prohibited conduct. In the Andersen et al. class action suits, the underlying conduct is alleged to be copyright infringement, which allegedly makes the AI developers unfair competitors with the plaintiff class of artists. This state allegation would seem to have merit if the court finds the developers liable for violation of §106(1) of the Copyright Act, and unlike ROP, I can see it surviving as a complaint for a whole class—i.e., as unfair competition against all artists. That would be encouraging. But if, for instance, the court finds that not all the named plaintiffs have standing (i.e., do not have registered works in suit), it’s hard to say what this does to the unfair competition complaint as argued.

What About Trademark?

It is tempting to wonder whether certain creators can find protection in trademark law. But in addition to the cost of registering and maintaining a trademark, only certain artists would be able to make effective use of this form of intellectual property. Trademark would protect only the use of the artist’s name in commerce, and since it is already illegal to trade in forgeries, registering one’s name as a trademark may be redundant and offer little protection against the use of AI to produce “in the style of” works.

Relatedly, the Federal Trade Commission (FTC) may have a role to play in protecting consumers against fraud stemming from the uses of generative AI. Presumably, the agency could respond to, or prophylactically seek to protect consumers from, forgery at scale. Not unlike the rampant proliferation of counterfeit products sold via e-commerce sites, generative AI certainly presents the opportunity for some party to start generating mass forgeries of popular artists and selling those in the millions. As such, measures to restrict the use of artists’ names in generative AI may belong in the FTC’s wheelhouse.

In both this post and the last, my intent is not to advocate on behalf of the AI developers. Far from it. Instead, I am trying to kick the tires of existing law to ask whether the law is sufficient to the task of protecting authors of creative works. Because, overall, I’m not sure it is, though it is also essential to note that every type of work has different implications (i.e., voice actors’ rights are more likely to sound in ROP than visual artists’ rights).

One thing is certain. Generative AI is not comparable to the printing press, camera, phonograph, or any more recent changes to production and distribution enabled by digital technology. Debate about AI must be sequestered from discussions about technologies of the past because few, if any, of those revolutions are instructive to the moment. There is no doubt that AI implies new regulation in medicine, finance, IT, security, and just about everywhere else it will invade; and there is no reason why Congress cannot adopt the same posture in order to protect America’s creative culture and economy.


Image source by: idaakerblom

Training AI With Protected Works: Is Copyright Law Designed to Respond?


Many creators feel very strongly that “training” AI models with unlicensed, copyrighted works is unjust—not least because generative AIs built on their creativity will put some creators out of business while enriching more tech moguls. It is both insult and injury to see one’s work used, without consideration, to underwrite the mechanism of one’s own obsolescence. But regardless of how we may feel about the practice of “machine learning” (ML) with unlicensed material, it remains to be seen whether and where current law provides any remedies. I’ll try to consider that topic in this post and the next, beginning with the allegation that ML is mass copyright infringement.

Four class action lawsuits against generative AI developers have been filed thus far in the District Court for the Northern District of California, all by the same law firm. Because all the complaints are similar, I will stick to the two that were filed first. In Andersen et al. v. Stability AI et al., a class of visual artists is suing Stability AI and Midjourney;[1] and in Tremblay et al. v. OpenAI, a class of book authors is suing OpenAI over the development of ChatGPT.[2] Both complaints allege direct and vicarious copyright infringement as well as unlawful removal of copyright management information (CMI). Both complaints also contain counts for violation of the derivative works right (§106(2)), and based on that theory, the Andersen complaint alleges unlawful making available of said derivative works in violation of §106(3), (4), and (5). The complaints also contain state law allegations, but I will discuss those in the next post.

Reproduction and the Battle of Analogies

The question of whether ML with copyrighted works constitutes an act of mass infringement will turn on the factual consideration as to whether any copying occurs in violation of the reproduction right (§106(1)). In Andersen and Tremblay, there is considerable focus on the potential of a generative AI to output an infringing work based on its training corpus. For instance, if the work of Karla Ortiz (one of the named plaintiffs in Andersen) is part of the ingested materials, then the assumption is that the AI model has the potential to produce a copy of an existing Ortiz work or a work that is substantially similar to an Ortiz work.

The reproduction inquiry may be different for each model and each type of work used for input. In Andersen, the complaint states, “Because a trained diffusion model can produce a copy of any of its Training Images—which could number in the billions—the diffusion model can be considered an alternative way of storing a copy of those images.” By contrast, the Tremblay complaint alleges that copying occurs, but it does not specifically describe how the ChatGPT training process entails reproduction. “During training, the large language model copies each piece of text in the training dataset and extracts expressive information from it,” the complaint states.

If the AI system produces any copies of any of its training materials, this is evidence that the system violates the reproduction right. Prompt the generator to make an image of Dr. Strange, and if Dr. Strange comes out, then nobody can doubt that Dr. Strange is a latent copy in the system and that this potential to copy is sufficient evidence of infringement at the input stage. Alternatively, if the system can only produce work “in the style of” Karla Ortiz, this raises different issues (and very serious concerns) but may not be considered sufficient evidence of “reproduction” in the input process. But the courts need not look at outputs, or even potential outputs, to find violation of the reproduction right.

It has been held (specifically in the 9th Circuit)[3] that even storing a copy in random access memory (RAM) is sufficient to find a violation of the reproduction right. The AI developers will seek to prove that their systems do not copy the works ingested in any sense, or that if they do, they copy only non-protected (i.e., factual) elements of the works. Using anthropomorphic words like observe, learn, study, etc. to describe ML, the argument from the developers will be that these models are designed to obtain information about the works but not copy the works anywhere in the system. Input an illustration, for example, and what the system allegedly stores are millions of data points about line weights, composition, colors, shading, etc. Then, combined with billions of other data points from billions of other works, the model generates probability algorithms which are then used to produce new visual works when users prompt the system with instructions.

AI developers like to compare “training” their models to the learning a human artist does when she experiences or studies works other than her own. In addition to being a reductive and dehumanizing analogy for the ways in which artists teach themselves a craft, this line of reasoning may be seen by the courts as smoke and mirrors. The factual question is whether the system retains a copy long enough to be perceived by the machine, which has been held to be violative of §106(1). Long-term storage of a copy is not required, and my understanding is that making a “more than fleeting” copy is unavoidable in any computer system—i.e., that there is no such thing as ingestion without reproduction.

Proving reproduction will be the whole ballgame insofar as litigation can address whether feeding a corpus of protected works into an AI system is a violation of law. We shall see what the courts make of the facts presented, but without a finding of reproduction, the other copyright complaints likely fall. For instance, removal of CMI is not a stand-alone violation. Section 1202 of the DMCA states that removal is a violation if the party doing the removing knows or has reasonable grounds to know “that it will induce, enable, facilitate, or conceal an infringement of any right under this title.” Therefore, there must be a colorable claim of infringement for the CMI allegation to survive.

Derivative Works Allegations

Both the Andersen and Tremblay complaints allege that the AIs produce unlicensed derivative works in violation of §106(2), though the arguments are different in each case. In Andersen, the allegation arises from the premise that the system cannot produce anything outside the limitations of its data set composed of protected works. “The resulting image [output] is necessarily a derivative work, because it is generated exclusively from a combination of the conditioning data and the latent images, all of which are copies of copyrighted images.…a latent diffusion system…can never exceed the limitations of its Training Images.”

It’s an interesting theory, but I’m not sure anything in copyright law can support the argument that all potential outputs of the generative AI are unauthorized derivatives of the total corpus of works in the training set. To find an infringing derivative of a visual work (typically one image) requires a substantial similarity inquiry comparing a specific original with the follow-on work to determine what has been copied and whether that copying renders the second work a derivative of the first. This is difficult enough in the world of humans intentionally using a single visual work to produce a different visual work (see Warhol v. Goldsmith!). So, it seems highly speculative to ask a court to find generally that billions of images output are, as a matter of law, derivatives of the billions of images input. I’m not certain the court has anywhere to look for guidance to consider this reading of the derivative works right.

If this derivative works theory is tough with images, it would be even harder with text—i.e., to allege that the textual outputs are derivatives of all the textual inputs is akin to saying that every book written is a derivative of every book read. This echoes a popular sentiment among the anti-copyright crowd that no work is “original,” a premise that should not be given any legal weight, even in the service of trying to protect creators from AI developers. 

In Tremblay, the allegation is not that the individual outputs of ChatGPT are derivatives of the corpus of books used in training, but that the entire model is a single derivative work of its corpus. “Because the OpenAI Language Models cannot function without the expressive information extracted from Plaintiffs’ works (and others) and retained inside them, the OpenAI Language Models are themselves infringing derivative works, made without Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act,” the complaint states. [Emphasis added]

Again, claiming that the entire LLM is a single derivative work of the millions of literary works fed into the system would seem to strain the derivative works right beyond the limit where any court can venture. In fact, this allegation could potentially bolster the inevitable fair use defense the AI developers will be arguing—namely that the finding of “transformative use” in Google Books favors fair use of the corpus of work used in ML.  

Fair Use & Google Books

Notably, these cases are brought in California, controlled by the Ninth Circuit and, therefore, not bound by the Second Circuit decision in Google Books, which many believe to be the strongest precedent favoring fair use for the AI developers. The comparison is a natural one. Google scanned whole books into a system to create a unique tool for searching the contents of books without providing any whole-copy substitutes for legally obtained copies. The court, noting that its decision “pushed the boundaries of fair use,” found under factor one that Google Books is “transformative” for its utility and found under factor four that it did not pose a threat to the market for the books used.

What the AI developers will try to argue under Google Books is that 1) their systems are highly “transformative” because they use protected works to create novel (even revolutionary) applications; and 2) their systems are designed to avoid outputting any copies that would serve as substitutes for the works in the data set. It is conceivable that courts or juries would find the comparison compelling, though the aforementioned capacity of a given AI to output Dr. Strange means that, unlike Google Books, the visual AI system at issue does make substitutes available and, therefore, the precedent is inapt.

By contrast, ChatGPT or another text-based application could have a stronger defense under Google Books if it is not possible, for instance, to have the system output an entire in-copyright literary work. The Tremblay complaint refers to the output of summaries, which is evidence that a whole book was ingested, but a summary is generally not an infringement and is certainly not a substitutional copy.

Meanwhile, other considerations should perhaps militate against finding fair use for generative AI model training. For instance, Google Books is a research tool for humans to learn about books written by other humans, including humans who write more books. Generative AIs are not necessarily comparable. For instance, Stable Diffusion does not provide a user with any information about an ingested work, and it poses an unprecedented threat to professional visual artists unlike any technology that has come before. Thus, the courts should consider the sui generis purpose of the generative AI at issue when citing Google Books or any other precedent to consider fair use.

In a May post, I proposed that unless the generative AI at issue can show that it promotes authorship, the court should decline to consider a fair use defense. To clarify, in Campbell, the Supreme Court states, “The fair use doctrine thus ‘permits [and requires] courts to avoid rigid application of the copyright statute when, on occasion, it would stifle the very creativity which that law is designed to foster.’”[4] Until generative AI changed the landscape, there was no need to affirm that “the very creativity” fostered by copyright means “human creativity.” But today, that distinction is necessary. Although generative AI can produce volumes of “creative” material, only those works which can be protected by copyright are works of authorship. And just as it is indecent to exploit an artist’s work to build a machine that might end her career, it would be absurd to allow fair use (a component of copyright law) to defend a technology that would potentially annihilate copyright’s purpose.

Of course, that’s one man’s opinion, and one that would apply to some, but not all, works derived by generative AI. As these tools develop, and their uses are explored by various types of creators, there are examples, both in practice and in theory, where we can find that generative AI does foster new authorship. This gets into the complicated question of copyrightability of works that humans create with some AI used in the process, and because this is itself a new discussion, it is difficult to say which generative AIs, if any, can be said to “promote the progress” of authorship as a matter of law.

Legal experts, both pro- and anti-copyright, will comment upon the strengths and weaknesses of Andersen, Tremblay, and the other suits represented by the one firm that has taken the lead on these lawsuits. But even where these cases may be flawed, they can provide some insight into the question posed by this essay: is copyright law an answer to the potential hazards of generative AI? I suspect that a fundamental difficulty arises because generative AI poses an existential threat to the future of authors, and some of the injustices and cultural calamities inherent to that threat may not be remedied (or entirely remedied) by the principles of copyright. Remedies sounding in other areas of law could loom larger, especially for certain types of creators, and that will be the subject of the next post.


[1] DeviantArt is also a named defendant, sued for breach of contract for providing works to Stability for ingestion.

[2] The same firm is now representing Sarah Silverman and another class of book authors, though the complaint is essentially the same as Tremblay.

[3] MAI Systems Corp. v. Peak Computer, Inc., 991 F.2d 511 (9th Cir. 1993).

[4] Citing Stewart v. Abend (1990).


AI, Search, & Section 230

On May 18, the Supreme Court delivered opinions in Gonzalez v. Google and Twitter v. Taamneh, a pair of interrelated cases in which both plaintiffs sought to hold online platforms liable for hosting material meant to inspire acts of terrorism. Because the Court unanimously found in Taamneh that there was no basis in anti-terrorism law for liability (and therefore no claim for relief), it then declined to address the Section 230 question in Gonzalez, which was whether Google’s “recommendation algorithm” is sufficient to find contributory liability for the inciteful material being recommended.

Properly read, Section 230 shields OSPs from “publisher liability” but not from “distributor liability.” A distributor of allegedly harmful material may be liable when it knows, or has reason to know, the nature of the material and either affirmatively chooses to distribute it or willfully turns a blind eye to the potential harm and does nothing to stop it. Unfortunately, ever since 230 became law in 1996, the courts have generally read the law as a blanket shield for any OSP distributing any kind of material as long as it was uploaded by a user of the site and not by the site operators.

Plaintiff Gonzalez alleged that Google’s “recommendation” algorithm, designed to promote content based on the system’s interpretations of user behavior, played a crucial role in pushing ISIS propaganda toward the parties who eventually committed the 2015 Paris attacks that resulted in the death of Nohemi Gonzalez. Plaintiffs argued that “targeted recommendations” are not properly shielded by Section 230, and to the extent one can read the tea leaves in oral arguments, justices as opposite as Thomas and Jackson may be sympathetic to this view.

For further reading in “Strange Bedfellows,” the amicus brief in Gonzalez filed by Senator Hawley echoes many of the same legal arguments in the brief filed by the Cyber Civil Rights Initiative. Also, Senators Hawley and Blumenthal are at least publicly in sync on the need to correct the errors in Section 230. “Reform is coming,” Sen. Blumenthal declared in March. All of which is to say that there appears to be both bipartisan and multi-stakeholder consensus building around the idea that platforms can and should be held accountable for promoting harmful material.

Does AI-Enhanced Search Imply Liability?

Notably, one prong of Google’s defense in Gonzalez was that “recommendation” is analogous to search and that delivering search results cannot rise to the level of contributory liability. Whether the Court would agree with this comparison under full examination in a viable case remains an open question. But assuming the Court would not have sided with Google, what might it make of Google’s new Search Generative Experience (SGE)? Still in trial phase for users who choose to enable it, the AI-driven SGE could be the new mode of search, or (if it totally sucks) could tank Google’s core business. As James Vincent writes for The Verge:

… it’s the dynamics of AI — producing cheap content based on others’ work — that is underwriting this change, and if Google goes ahead with its current AI search experience, the effects would be difficult to predict. Potentially, it would damage whole swathes of the web that most of us find useful — from product reviews to recipe blogs, hobbyist homepages, news outlets, and wikis. Sites could protect themselves by locking down entry and charging for access, but this would also be a huge reordering of the web’s economy. In the end, Google might kill the ecosystem that created its value, or change it so irrevocably that its own existence is threatened. 

Hard to predict for sure, and I will not make the attempt. There are, of course, many potential hazards with AI-enhanced search, not the least being more virulent mutations of garbage results (as if misinformation needs any help). But in a Section 230 context, would the deployment of SGE as Google’s new search model increase the likelihood of its liability under the same legal arguments presented in Gonzalez? The “recommendation” algorithm is a form of AI, and if that level of platform influence could be sufficient to find liability, then presumably a more robust use of AI could result in a stronger allegation of liability.

On June 14, Senators Hawley and Blumenthal introduced a two-page bill that would make Section 230 immunity unavailable for service providers “if the conduct underlying the claim or charge involves the use or provision of generative artificial intelligence by the interactive computer service.” Presumably, this bill can be seen as performative along with other announcements from Congress that AI has their attention, with various Members promising not to be fooled again into allowing Big Tech to regulate itself. There’s a lot of “We’re on it” messaging coming from the Hill about AI, and we’ll see what comes of it.

In the meantime, perhaps there is something to the Hawley bill in light of the considerations in Gonzalez and the imminent release of SGE. At first, I sneered at the amendment because generative AI is primarily a tool of production, and Section 230 immunity has little or nothing to do with production. It doesn’t matter whether the harmful material at issue is produced with Midjourney or a box of crayons. But if a generative AI serves as the engine for a new mode of search (i.e., recommendation), then the language in the Hawley/Blumenthal amendment would seem to obviate the need to litigate the question presented in Gonzalez. Congress would be declaring that Google is not automatically shielded from liability.

Considering that we are far from resolving the damage done by the “democratization of information,” it’s tough to feel sanguine about the prospect of AI making search better rather than suck faster. On the other hand, if the adoption of AI in certain core functions of online platforms is a basis for Congress resetting the terms of liability, then perhaps service providers will discover a renewed interest in the original intent of Section 230—an incentive to remove harmful material, not to keep it online and monetize it.


Photo source by: sinenkiy