Pass the TikTok Legislation. And then…

TikTok legislation

“At what point then is the approach of danger to be expected? I answer, if it ever reach us, it must spring up amongst us. It cannot come from abroad. If destruction be our lot, we must ourselves be its author and finisher. As a nation of freemen, we must live through all time, or die by suicide.” – Abraham Lincoln, The Lyceum Address, 1838 –

Lincoln’s famous observation that only Americans can truly destroy America speaks to the fragility of the Republic, which the founders knew could only endure so long as the people generally keep faith with certain core principles. Watching those principles assaulted by a far-right populism, which has presently swallowed the Republican Party, it is natural to read Lincoln as prophetic, and it is hard to imagine any foreign influence being more dangerous. On the other hand, when Lincoln said, “It cannot come from abroad,” he could hardly have imagined a time when 170 million young Americans would carry a pocket surveillance device loaded with software under the control of a foreign adversary.

Following the 352-65 vote by the House to force TikTok to divest itself of all ties to the Chinese Communist Party (CCP), opinions about the bill have questioned both its necessity and its viability, though not with good reason. Although rashly described as a “ban,” the effect of H.R. 7521 would be to force a sale of the platform by parent company ByteDance to an owner without ties to the CCP. To that end, I agree with independent musician Blake Morgan, who endorses the TikTok legislation as both a national security and an anti-piracy measure. In an editorial for IP Watchdog, Morgan writes:

The vast majority of music on TikTok generates virtually no revenue for the musicians who made it, and even more music on the platform is completely unlicensed (stolen), copied (stolen via AI), or pirated (stolen). Simply put, TikTok is trying to build a music-based business without paying music makers fair value for the music. That’s why Universal Music Group has already pulled out of TikTok. That’s why the National Music Publishers’ Association has already announced it won’t renew its license with the company. So, TikTok poses “a clear and present danger” to American music, too.

The music piracy alone is reason to force the platform to operate within the reach of U.S. law. But with regard to the national security threat, it is notable that unless one is in the intelligence community, or a Member of Congress receiving a security briefing, one is left to rely upon a core principle that social media in general has eroded: trust. I do not endorse the whataboutist view that because TikTok is not alone in causing havoc, this legislation is moot; but the story does highlight those hazards of social media that make it difficult to convince many Americans that TikTok is a threat of any kind.

Joseph V. Amodio, writing for Tanium, states that TikTok is distinguishable from other platforms thus:

TikTok stands out in its power to manipulate: While videos from any app can go viral, TikTok’s infection ability is unique, given the practice of “heating,” where TikTok staff can supercharge distribution of hand-picked videos. This has huge implications for fair competition and free trade. Just imagine how they can siphon profits by amplifying your competitors’ posts or cooling down your own viral campaigns.

Whether the goal of data manipulation is to pull the levers on enterprise, as Amodio indicates, or to influence young voters on policy matters, how does one convince 170 million users, many of them 18- to 29-year-olds, that said manipulation is both occurring and should be seen as an attack? If an act of cyberwarfare entails hacking the Pentagon or shutting down part of the power grid, enough Americans can probably recognize such events as attacks in a traditional sense. Likewise, the prospect of malicious software injected into millions of mobile devices might be understood as a threat.

But what if the weapon is an insidious propaganda tool used to manipulate the opinions of millions of citizens? Who is going to be trusted to identify that as a sustained attack on the United States? Some portion of the TikTok demographic will not believe that China (or Russia) is an adversary in the first place, which is arguably evidence itself of social media’s power to influence.

Even if the delivery platform is owned by Meta, serving “ads” purchased by foreign operatives with the same objective to sow discord, no individual wants to believe he’s being manipulated. More insidiously, even when one tries to apply critical thinking, the effort itself is often countered by teams of data manipulators flooding the zone, creating the illusion of more “information” tilting bias in one direction or another. This was true before parties like China and Russia upped their cyber game and before they could add artificial intelligence to the toolset.

As a practical example at the heart of the TikTok story, how does the moderate, who would rather not hyper-politicize national security, take the contemporary Republican seriously in his professed opposition to TikTok’s capacity to “manipulate” Americans? For instance, Rep. Ralph Norman of South Carolina writes, “…if you’ve spent 5 minutes exploring TikTok, you should have recognized the addictive nature of this platform. It is designed for one purpose: to control your attention. Their algorithm quickly figures out what kind of videos you’re likely to watch, and then feed you similar videos to keep you fixated.”

Fine. But one could swap “TikTok” for “Trump” and make the same general argument, including that his self-interested rhetoric about NATO, disrespect for the Constitution, etc. all comprise a threat to national security. What would Lincoln say to his legacy party about this tangled interplay between foreign and domestic forces, both hostile to American interests, and both weaponizing disinformation through addictive and manipulative platforms?

In this context, it is important to note that Trumpism is a symptom of populism—a trend that is no less prevalent on the left than on the right, perhaps especially among 18- to 29-year-olds. The difference, for the moment, is that the left has not found its own cult-like figure, who might also undermine core principles, albeit in a different style than Trump. The rise of populism in the U.S. and other democracies is a direct result of social media’s tendency to factionalize hearts and minds, which is precisely what a foreign adversary wants to achieve. TikTok may be a shrewdly named time bomb delivered to over half the U.S. population and, as such, should be defused. But assuming that task can be accomplished, the existential question remains as to whether we can quarantine the most virulent effects of all social platforms or “die by suicide.”

Training AI With Protected Works: Is Copyright Law Designed to Respond?

generative ai

Many creators feel very strongly that “training” AI models with unlicensed, copyrighted works is unjust—not least because generative AIs built on their creative work will put some creators out of business while enriching more tech moguls. It is both insult and injury to see one’s work used, without consideration, to underwrite the mechanism of one’s own obsolescence. But regardless of how we may feel about the practice of “machine learning” (ML) with unlicensed material, it remains to be seen whether and where current law provides any remedies. I will consider that topic in this post and the next, beginning with the allegation that ML is mass copyright infringement.

Four class action lawsuits against generative AI developers have been filed thus far in the District Court for the Northern District of California, all by the same law firm. Because the complaints are similar, I will stick to the two that were filed first. In Andersen et al. v. Stability AI et al., a class of visual artists is suing Stability AI and Midjourney;[1] and in Tremblay et al. v. OpenAI, a class of book authors is suing OpenAI over the development of ChatGPT.[2] Both complaints allege direct and vicarious copyright infringement as well as unlawful removal of copyright management information (CMI). Both complaints also contain counts for violation of the derivative works right (§106(2)), and based on that theory, the Andersen complaint alleges unlawful making available of said derivative works in violation of §106(3), (4), and (5). The complaints also contain state law allegations, but I will discuss those in the next post.

Reproduction and the Battle of Analogies

The question of whether ML with copyrighted works constitutes an act of mass infringement will turn on the factual question of whether any copying occurs in violation of the reproduction right (§106(1)). In Andersen and Tremblay, there is considerable focus on the potential of a generative AI to output an infringing work based on its training corpus. For instance, if the work of Karla Ortiz (one of the named plaintiffs in Andersen) is part of the ingested materials, then the assumption is that the AI model has the potential to produce a copy of an existing Ortiz work or a work that is substantially similar to an Ortiz work.

The reproduction inquiry may be different for each model and each type of work used for input. In Andersen, the complaint states, “Because a trained diffusion model can produce a copy of any of its Training Images—which could number in the billions—the diffusion model can be considered an alternative way of storing a copy of those images.” By contrast, the Tremblay complaint alleges that copying occurs, but it does not specifically describe how the ChatGPT training process entails reproduction. “During training, the large language model copies each piece of text in the training dataset and extracts expressive information from it,” the complaint states.

If the AI system produces any copies of any of its training materials, this is evidence that the system violates the reproduction right. Prompt the generator to make an image of Dr. Strange, and if Dr. Strange comes out, then nobody can doubt that Dr. Strange is a latent copy in the system and that this potential to copy is sufficient evidence of infringement at the input stage. Alternatively, if the system can only produce work “in the style of” Karla Ortiz, this raises different issues (and very serious concerns) but may not be considered sufficient evidence of “reproduction” in the input process. But the courts need not look at outputs, or even potential outputs, to find violation of the reproduction right.

It has been held (specifically in the 9th Circuit)[3] that even storing a copy in random access memory (RAM) is sufficient to find a violation of the reproduction right. The AI developers will seek to prove that their systems do not copy the works ingested in any sense, or that if they do, they copy only non-protected (i.e., factual) elements of the works. Using anthropomorphic words like observe, learn, study, etc. to describe ML, the argument from the developers will be that these models are designed to obtain information about the works but not copy the works anywhere in the system. Input an illustration, for example, and what the system allegedly stores are millions of data points about line weights, composition, colors, shading, etc. Then, combined with billions of other data points from billions of other works, the model generates probability algorithms which are then used to produce new visual works when users prompt the system with instructions.

AI developers like to compare “training” their models to the learning a human artist does when she experiences or studies works other than her own. In addition to being a reductive and dehumanizing analogy for the ways in which artists teach themselves a craft, this line of reasoning may be seen by the courts as smoke and mirrors. The factual question is whether the system retains a copy long enough to be perceived by the machine, which has been held to be violative of §106(1). Long-term storage of a copy is not required, and my understanding is that making a “more than fleeting” copy is unavoidable in any computer system—i.e., that there is no such thing as ingestion without reproduction.

Proving reproduction will be the whole ballgame insofar as litigation can address whether feeding a corpus of protected works into an AI system is a violation of law. We shall see what the courts make of the facts presented, but without a finding of reproduction, the other copyright claims likely fall. For instance, removal of CMI is not a stand-alone violation. Section 1202 of the DMCA states that removal is a violation if the party doing the removing knows or has reasonable grounds to know “that it will induce, enable, facilitate, or conceal an infringement of any right under this title.” Therefore, there must be a colorable claim of infringement for the CMI allegation to survive.

Derivative Works Allegations

Both the Andersen and Tremblay complaints allege that the AIs produce unlicensed derivative works in violation of §106(2), though the arguments are different in each case. In Andersen, the allegation arises from the premise that the system cannot produce anything outside the limitations of its data set composed of protected works. “The resulting image [output] is necessarily a derivative work, because it is generated exclusively from a combination of the conditioning data and the latent images, all of which are copies of copyrighted images.…a latent diffusion system…can never exceed the limitations of its Training Images.”

It’s an interesting theory, but I’m not sure anything in copyright law can support the argument that all potential outputs of the generative AI are unauthorized derivatives of the total corpus of works in the training set. To find an infringing derivative of a visual work (typically one image) requires a substantial similarity inquiry comparing a specific original with the follow-on work to determine what has been copied and whether that copying renders the second work a derivative of the first. This is difficult enough in the world of humans intentionally using a single visual work to produce a different visual work (see Goldsmith v. Warhol!!). So, it seems highly speculative to ask a court to find generally that billions of images output are, as a matter of law, derivatives of the billions of images input. I’m not certain the court has anywhere to look for guidance to consider this reading of the derivative works right.

If this derivative works theory is tough with images, it would be even harder with text—i.e., to allege that the textual outputs are derivatives of all the textual inputs is akin to saying that every book written is a derivative of every book read. This echoes a popular sentiment among the anti-copyright crowd that no work is “original,” a premise that should not be given any legal weight, even in the service of trying to protect creators from AI developers. 

In Tremblay, the allegation is not that the individual outputs of ChatGPT are derivatives of the corpus of books used in training, but that the entire model is a single derivative work of its corpus. “Because the OpenAI Language Models cannot function without the expressive information extracted from Plaintiffs’ works (and others) and retained inside them, the OpenAI Language Models are themselves infringing derivative works, made without Plaintiffs’ permission and in violation of their exclusive rights under the Copyright Act,” the complaint states. [Emphasis added]

Again, claiming that the entire LLM is a single derivative work of the millions of literary works fed into the system would seem to strain the derivative works right beyond the limit where any court can venture. In fact, this allegation could potentially bolster the inevitable fair use defense the AI developers will be arguing—namely that the finding of “transformative use” in Google Books favors fair use of the corpus of work used in ML.  

Fair Use & Google Books

Notably, these cases are brought in California, controlled by the Ninth Circuit and, therefore, not bound by the Second Circuit decision in Google Books, which many believe to be the strongest precedent favoring fair use for the AI developers. The comparison is a natural one. Google scanned whole books into a system to create a unique tool for searching the contents of books without providing any whole-copy substitutes for legally obtained copies. The court, noting that its decision “pushed the boundaries of fair use,” found under factor one that Google Books is “transformative” for its utility and found under factor four that it did not pose a threat to the market for the books used.

What the AI developers will try to argue under Google Books is that 1) their systems are highly “transformative” because they use protected works to create novel (even revolutionary) applications; and 2) their systems are designed to avoid outputting any copies that would serve as substitutes for the works in the data set. It is conceivable that courts or juries would find the comparison compelling, though the aforementioned capacity of a given AI to output Dr. Strange means that, unlike Google Books, the visual AI system at issue does make substitutes available and, therefore, the precedent is inapt.

By contrast, ChatGPT or another text-based application could have a stronger defense under Google Books if it is not possible, for instance, to have the system output an entire in-copyright literary work. The Tremblay complaint refers to the output of summaries, which is evidence that a whole book was ingested, but a summary is not generally an infringement and is certainly not a substitutional copy.

Meanwhile, other considerations should perhaps militate against finding fair use for generative AI model training. For instance, Google Books is a research tool for humans to learn about books written by other humans, including humans who write more books. Generative AIs are not necessarily comparable. For instance, Stable Diffusion does not provide a user with any information about an ingested work, and it poses an unprecedented threat to professional visual artists unlike any technology that has come before. Thus, the courts should consider the sui generis purpose of the generative AI at issue when citing Google Books or any other precedent to consider fair use.

In a May post, I proposed that unless the generative AI at issue can show that it promotes authorship, the court should decline to consider a fair use defense. To clarify, in Campbell, the Supreme Court states, “The fair use doctrine thus ‘permits [and requires] courts to avoid rigid application of the copyright statute when, on occasion, it would stifle the very creativity which that law is designed to foster.’”[4] Until generative AI changed the landscape, there was no need to affirm that “the very creativity” fostered by copyright means “human creativity.” But today, that distinction is necessary. Although generative AI can produce volumes of “creative” material, only those works which can be protected by copyright are works of authorship. And just as it is indecent to exploit an artist’s work to build a machine that might end her career, it would be absurd to allow fair use (a component of copyright law) to defend a technology that would potentially annihilate copyright’s purpose.

Of course, that’s one man’s opinion, and one that would apply to some, but not all, works derived by generative AI. As these tools develop, and their uses are explored by various types of creators, there are examples, both in practice and in theory, where we can find that generative AI does foster new authorship. This gets into the complicated question of copyrightability of works that humans create with some AI used in the process, and because this is itself a new discussion, it is difficult to say which generative AIs, if any, can be said to “promote the progress” of authorship as a matter of law.

Legal experts, both pro- and anti-copyright, will comment upon the strengths and weaknesses of Andersen, Tremblay, and the other cases represented by the one firm that has taken the lead on these lawsuits. But even where these cases may be flawed, they can provide some insight into the question posed by this essay: is copyright law an answer to the potential hazards of generative AI? I suspect that a fundamental difficulty arises because generative AI poses an existential threat to the future of authors, and some of the injustices and cultural calamities inherent to that threat may not be remedied (or entirely remedied) by the principles of copyright. Remedies sounding in other areas of law could loom larger, especially for certain types of creators, and that will be the subject of the next post.


[1] DeviantArt is also a named defendant, sued for breach of contract for providing works to Stability for ingestion.

[2] The same firm is now representing Sarah Silverman and another class of book authors, though the complaint is essentially the same as Tremblay.

[3] MAI Systems Corp. v. Peak Computer, Inc., 991 F.2d 511 (9th Cir. 1993).

[4] Citing Stewart v. Abend (1990).

Image by: idaakerblom

DCA Reports High Incidence of Credit Card Fraud on Pirate Sites

Digital Citizens Alliance (DCA) released a new report yesterday with the eye-popping statistic that 72% of Americans who subscribe to pirate media sites have experienced credit card fraud, compared to an 18% prevalence of credit card fraud among those who do not subscribe to pirate sites. These data are based on a survey of 2,030 Americans, of whom 1 in 3 reported watching some pirated content in the last year, and 1 in 10 reported subscribing to a pirate streaming service. The report, titled Giving Pirate Site Operators Credit, states …

… piracy was once primarily a headache for content creators, users of these sites now face significant risks. Piracy subscription services make an estimated $1 billion a year providing services to at least nine million U.S. households.

DCA’s findings indicate that around 6.5 million Americans who choose to access movies, TV shows, and games in this black market have been targeted for credit card fraud as a direct result of their subscriptions. And although I say the stat is “eye-popping,” given the environment we’re talking about, perhaps the real surprise is that the rate of unauthorized credit card charges in this network isn’t closer to 100%. After all, it’s one thing when hackers steal credit card data from legit retailers et al., but subscribing to a pirate site is cutting out the middleman and giving credit card info directly to a network of hackers.

The shift to high-quality streaming a little over ten years ago created an opportunity for pirates to launch new platforms offering low-price subscriptions to “everything” because, of course, none of the material they’re streaming is legally obtained but is stored on pirate servers around the world. Just as other DCA reports have shown that among the hidden costs of this all-you-can-eat offer is a high probability of infection with life-altering malware, the likelihood of unauthorized charges to a credit card is apparently even greater. “Combined with our previous research highlighting the risks associated with free piracy apps and services, the situation becomes even clearer. The pursuit of pirated content is an inherently risky behavior that threatens the devices, wallets, and privacy of consumers,” says DCA executive director Tom Galvin in a press release accompanying the new study.

DCA Research: Subscriptions Trigger Fraud Within Eleven Days

Prior to conducting its survey of American consumers, DCA researchers subscribed to 20 pirate sites using a new credit card obtained for the experiment. In less than two weeks, fraudulent charges began to appear from China, Singapore, Hong Kong, and Lithuania, and within three months, DCA’s card was targeted with $1,495 in executed and attempted unauthorized transactions. The largest attempted transaction was $850, which was stopped by fraud protection, and the largest approved charge was $244.78. Given the implied cost to credit card services of providing protection against such transactions, DCA’s first recommended remedy—that payment processors terminate relationships with known pirate sites—seems like a no-brainer.

DCA also recommends that the Federal Trade Commission “take piracy more seriously” and prioritize warning Americans about the risks associated with pirate sites; it recommends more consumer protection group outreach on this issue; and it recommends that law enforcement more aggressively investigate pirate site operators, now armed with the 2020 amendment to the U.S. Copyright Act which elevated large-scale piracy by means of streaming from a misdemeanor to a felony. “Given that the piracy ecosystem is now a $2 billion industry, the Department of Justice should use that authority to target piracy operators,” the report states.

Personally, I would be curious to know something about the thinking of 9 million Americans who want cheap media streaming so badly that they’re willing to tolerate the high risk of credit card fraud and/or a dangerous malware attack. Of course, to DCA’s point, perhaps the majority of these subscribers don’t know how risky accessing these sites can be.


Photo source by: Wichayada57844