Big Tech Tells Trump Admin that Copyright is a Barrier to AI Development

Last week, in response to the Executive Order referred to as the “AI Action Plan,” various stakeholders submitted comments to the Office of Science and Technology Policy (OSTP). OpenAI, for its part, submitted one of the finest examples of tech-bro bombast we have seen in some time. Not even Google’s comments, which name copyright, privacy, and patents as barriers to AI development, come close to OpenAI’s for serving up so much high-octane, tech-utopian gibberish, including this gem in the preamble:

As our CEO Sam Altman has written, we are at the doorstep of the next leap in prosperity: the Intelligence Age. But we must ensure that people have freedom of intelligence, by which we mean the freedom to access and benefit from AGI, protected from both autocratic powers that would take people’s freedoms away, and layers of laws and bureaucracy that would prevent our realizing them.

Fewer than half of all Americans trust either the current administration or Big Tech when it comes to “freedoms” or “intelligence,” but does anyone believe that AI development inexorably leads to the kind of prosperity OpenAI projects in its comments? Like most technologies, AI can be used for good or evil. In theory, it can be used to diagnose and treat disease, but in practice, it could be used to “solve” disease by more efficiently automating denial of treatment. It can be used to enhance or improve productive work, but it might be used to shed jobs across multiple sectors without considering the implications of doing so.

“Innovation” is a meaningless word until it is defined by the values and principles of the innovators and/or the government with which the industry partners. In OpenAI’s effort to distinguish American AI development from that of the People’s Republic of China (PRC), it recommends, at least in its comments on copyright, that we should emulate the anti-democratic, piratical conduct of this adversary. It even goes so far as to allege without foundation that machine learning (ML) with unlicensed copyrighted works is a matter of national security.

Under the heading “Freedom to Learn,” OpenAI’s comments about copyright—especially the emphasis on fair use doctrine—are incoherent to the point that one wonders whom the company is addressing. But before speculating about that question, here are a few quotes with responses:

American copyright law, including the longstanding fair use doctrine, protects the transformative uses of existing works, ensuring that innovators have a balanced and predictable framework for experimentation and entrepreneurship.

The judge-made fair use doctrine applies a four-factor test, of which one part of the first factor considers whether a “transformative use” has been made of a protected work. There is no direct precedent applicable to mass copying of creative works for the purpose of ML to build artificial intelligence, which is why about thirty active lawsuits present this novel question to the courts. Further, because fair use is a case-by-case, affirmative defense to a claim of infringement, it is the opposite of the “predictable framework” that OpenAI claims to want.

This approach has underpinned American success through earlier phases of technological progress and is even more critical to continued American leadership on AI in the wake of recent events in the PRC.

This says, “American innovation is great, but the Chinese kicked our asses with DeepSeek, and we’re grumpy about it.” Kudos to OpenAI for playing to the audience, but it is incoherent as a statement about the fair use defense “underpinning American success.” The core copyright industries account for an estimated 7.66% of U.S. GDP, and this proven prosperity should not be radically disturbed for the sake of undefined “innovation,” some of which will inevitably flop.

As for history, American copyright law has typically adapted to technological change by ensuring the protection of authors’ rights from the exigencies of technology developers. In the best cases, this fosters a symbiotic relationship between new technology and creators, but that is not what OpenAI advocates here. Instead, OpenAI says, “American creators be damned. AI is too important to worry about their rights.”

OpenAI’s models are trained to not replicate works for consumption by the public. Instead, they learn from the works and extract patterns, linguistic structures, and contextual insights. This means our AI model training aligns with the core objectives of copyright and the fair use doctrine, using existing works to create something wholly new and different without eroding the commercial value of those existing works.

This attempt to litigate questions of fact and law in comments to the OSTP is as contradictory as it is misplaced. First, it asserts that OpenAI’s ML process does not violate any copyright rights and is, therefore, non-infringing. But that assertion conflicts with the inapt argument that their ML is exempted under factors one and four of the fair use test. Where there is no basis for a claim of infringement, there is no rationale for arguing a fair use defense.

Applying the fair use doctrine to AI is not only a matter of American competitiveness—it’s a matter of national security. Given concerted state support for critical industries and infrastructure projects, there’s little doubt that the PRC’s AI developers will enjoy unfettered access to data—including copyrighted data—that will improve their models. If the PRC’s developers have unfettered access to data and American companies are left without fair use access, the race for AI is effectively over.

Here, OpenAI argues that American policy should emulate the PRC by disregarding the rights of creators, thereby disqualifying any claim by Altman & Co. to promote democratic values. Further, OpenAI not only invents the term “fair use access” but also erroneously implies that U.S. national security operations need the “freedom to learn” from unlicensed creative works in order to do their jobs.

For Whose Eyes?

The combination of misstatements and emphasis on fair use prompts the question of what policy OpenAI hopes to achieve. If OpenAI et al. want a statutory exception for ML, any rational petition to Congress for that change to the Copyright Act would not address fair use or suggest amendment to that part of the statute. Instead, we must assume that this message is aimed at the courts, which will decide whether and to what extent ML is exempted by fair use, including in cases where OpenAI is a defendant.

Presumably, one hope is to say the words “national security” enough times that 1) some party in the administration echoes this talking point; and/or 2) the courts feel reluctant to rule against AI developers on copyright infringement claims. In either case, AI is not one product. Development of security-related products or AI agents for the intelligence community does not rely upon the development of those generative AI models that are built substantially on ingesting millions of creative works without license for the purpose of producing artificial “creative” works.

More broadly, it is a tad rich to say that copyright rights are a barrier in the AI arms race while DOGE is assigned to hack its way through educational funding and shed experts in nearly every field. If America loses to China in this contest, it will most likely be attributable to our national retreat from excellence and to a culture in which people refuse to see the difference between a Ford F-150 and a plastic piece of shit. If that’s the kind of public/private environment in which Americans are going to develop AI, don’t blame the artists and their copyright rights when it fails.


Photo by pylypchukinnastock358

Are Creators Aligned on Artificial Intelligence?

One of many challenges with adoption of generative AI (GAI) tools is whether creators are willing to demonstrate a degree of solidarity on the matter, i.e., to apply the principle we generally call fair trade. If Creator A uses a GAI that might be harmful to Creator B in a different field, and so on, will most creators take this broader perspective in a group effort to demand ethical uses of GAI? Moreover, this question becomes intertwined with copyright because the use of GAI is a subject of evolving legal doctrine, meaning that creators who want to produce commercial content outside their core talents should be aware that the material produced may not be protectable under the law.

Two simple examples would be the self-published book author who might use an AI voice app to produce an audiobook, and the documentary filmmaker who might use an AI music generator to produce a soundtrack for a film. In both examples, creators in other fields (voice actors and composers, respectively) are potentially harmed by the development and use of these AI tools, which raises two questions: 1) will the author and filmmaker take that consideration into account; and 2) will the sound recordings in either case be protected by copyright?

In the case of the author using AI in lieu of hiring a narrator to produce the audiobook, I predict that under current doctrine, the sound recording would not be protected by copyright law because there is no human performance captured in that recording. Thus, remedies for any piracy of the audiobook would rely solely on the protection of the underlying literary work, which is effective—but if the sound recording is also protected and registered, that would be two works infringed instead of one.

This increases the potential damages for infringement, which puts the author/owner in a stronger position if she needs to take legal action. By this example, authors’ interests may be seen as aligned with those of professional book narrators. Hiring a narrator will not only achieve better quality in the reading, but capturing the human performance is also a basis for copyright attaching to the sound recording.

Similar considerations would apply to the filmmaker with the GAI soundtrack, although there may be other factors that provide the AI music with some protection we don’t find with the AI audiobook. One factor that may become relevant is whether the filmmaker can show that he exerted sufficient creative control over the final sounds. If so, he may be able to defend a claim of copyright in the soundtrack, but we are likely several years and a few lawsuits away from clear guidance on this question.

Another consideration with the soundtrack may be the Copyright Office’s current view that material using assistive AI “within a larger work” is protected. Creators should be careful about interpreting that broad language because constituent works that stand alone—and this would apply to a soundtrack for a film—would logically not be independently protected.

Of course, there are many GAI products that allow one type of creator to avoid hiring another type of creator for a given project. Some of this is inevitable, and it is not necessarily unethical or bad for creative culture. That said, even with ethically trained and ethically used AI tools, the copyright considerations should be weighed by the individual creator (i.e., do they care about protecting what might not be protectable?), but also collectively by all creators contributing to a new ecosystem.

Since 1978 in the U.S., the default is automatic copyright protection, even if most rights are never enforced. But as GAI is used to produce a lot of material that is not protected, it is hard to predict what effect this might have on copyright overall. The human authorship principle, which is even older than the automatic protection introduced by the 1976 Act, creates a new tension for creators who may wish to combine GAI and human-authored work. As a response to that tension, it would be a mistake in my view to overwrite the “human spark” doctrine and simply protect any material that “walks and talks” like a creative work. This isn’t just an emotional appeal to anthropocentrism but rather a conviction that copyright would become meaningless, even unconstitutional, by eroding the incentive rationale for its existence.

Regardless of the theoretical questions addressed in this post, I believe that as a practical matter, creators should think carefully about how and when to use GAI for various projects. As an ethical consideration, perhaps if you’re opposed to “scraping” in your industry, then opposing it in others is the right view to take. But as a business consideration, if what you’re making is meant to have commercial value, AI-generated might mean not protected by copyright—and that means even if you spend money and time on it, it isn’t yours.

Guarantee of Confusion: When AI Scrapes the News

That title riffs on the term of art in trademark law known as “likelihood of confusion.” It refers to a foundational test, which asks whether the average consumer will confuse a particular mark (words, design, or both) with a product or service that is not produced or distributed by the company associated with a known mark. Thus, beware the Rollex, Tilynol, or even the KleanEx. But when a real trademark is used to promote a defective product, confusion is certain—especially when the brand is a news producer.

In a lawsuit filed today by several major news publishers against an AI developer (Advance Local Media et al. v. Cohere Inc.), we see a good example of copyright and trademark combining to serve the public interest in contrast to the extensive harm that can be done by technology developers running roughshod over IP rights. Copyright incentivizes the investment in professional journalism needed to report reliable news, and trademark identifies the source of the news we choose to trust. I know readers will be inclined these days to criticize one news organization or another, but hold that thought.

The complaint filed in the District Court for the Southern District of New York names as plaintiffs several well-known news publishers (e.g., Condé Nast, Los Angeles Times, The Guardian) who allege that AI developer Cohere is liable for both copyright and trademark infringement. The company is valued at $5.5 billion. “Cohere’s primary product is its suite of LLMs referred to as the Command Family of models…these LLMs are trained on vast amounts of text and as a result can generate text-based, natural language responses to user queries,” the complaint states.

The Copyright Allegations

On copyright infringement, the publishers intend to show that Cohere violates their exclusive rights both when it inputs protected works to train the Command products and when it outputs verbatim or substantially similar works that are reproduced, distributed, and displayed to paying customers. The two counts of alleged trademark infringement stem from use of the publishers’ registered names in conjunction with erroneous material that may be “hallucinated” by the LLM. Clearly, anyone can recognize why this would be harmful to the reputation of the named source and broadly harmful to consumers who already struggle to validate information in this miasma we call the internet.

Notably, the Publishers stress the fact that Cohere markets itself on the reliability and timeliness of the information Command provides. These benefits would be essential for its many commercial customers, but the company allegedly chose to deliver them through unlicensed use of the works produced by news organizations. “Cohere relies heavily on trusted journalism sources to shore up the authority of its responses. As Cohere’s CEO Aidan Gomez explained in a letter to employees and shareholders, Cohere believes that a ‘key differentiator’ for its models is the ability to receive ‘verifiable answers,’” the complaint states.

Further, to support the veracity of query results, Cohere relies on “retrieval augmented generation” (RAG), which an NVIDIA blog post describes thus: “Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. But to deliver authoritative answers — grounded in specific court proceedings or similar ones — the model needs to be provided that information.” This case law analogy is ironic in context because even at this very early stage, the copyright case law strongly suggests to this observer that Cohere should not have chosen the unlicensed path to build its products.

For example, a description from the complaint reminds me that the news summary product TVEyes was held to be infringing on less compelling evidence than the following: “The user can expand [the] Under the Hood [tool] to view the exact underlying documents on which Cohere relied to generate the response. Cohere refers to these sources as ‘snippets,’ but to be clear—these ‘snippets’ are generally the full text of every source on which the output was based.”

In fact, the allegations in this complaint imply so much familiar ground that it is hard to imagine how Cohere will raise a persuasive defense. For instance, just this week, I summarized the Delaware District Court finding that comparatively limited copying of Westlaw’s headnotes for an AI search product was considered a market substitute for the protected works. What Cohere is allegedly doing with news articles is similar in purpose but entails far more extensive, unlicensed use of substantially more protected expression than in Thomson Reuters v. Ross.

The Trademark Allegations

With the RAG tool switched on, Command will apparently provide reliable news by copying, distributing, and displaying unlicensed copies of Publishers’ works. But with RAG switched off, its LLM might hallucinate and then attribute the resulting misinformation to one of the named plaintiffs. For instance, the complaint cites a Cohere “article” that confuses the 2023 massacre at the Nova Music Festival with a 2020 shooting in Nova Scotia; reports that a man murdered at the latter “returns to the scene” of the former; and then attributes this whole mess to The Guardian.

The Publishers allege two counts under the Lanham Act, trademark infringement and false designation of origin, both of which seem highly plausible based on the facts presented. We shall see whether Cohere can present compelling facts to rebut the allegations, but otherwise, as to the questions of law in this case, I predict this one easily goes to the plaintiffs.

As mentioned above, I know some readers may scoff at the premise that quality journalism is consistently the hallmark of well-established news publishers today. And to be sure, one must occasionally check the math in various articles and editorials. But I maintain that Big Tech, through its predatory model of monetizing everything it does not create, combined with our willingness to believe utter nonsense online, exerts a pressure on professional journalism that borders on an existential threat. Left unchecked, AI shenanigans like those described in this lawsuit do more than violate IP law; they undermine the efforts of any reporter who is still trying to present reality.


Photo by AndreyPopov