Public Knowledge Post on AI & Fair Use Misses the Mark

Patrick Gallaher at Public Knowledge recently posted an article about AI training with protected works, proposing to distinguish between piracy and fair use. Not to begin on a pedantic note, but the article is subtitled “Words Matter” because it claims that piracy is a provocative, non-legal term, so I have to respond by saying this is wrong. Although we think of “piracy” today as enterprises like The Pirate Bay, courts have often used the term “piracy” to mean “copyright infringement.” For instance, the seminal fair use case Folsom v. Marsh (1841) uses the word thirteen times, as in this quote:

“….it is as clear, that if he thus cites the most important parts of the work, with a view, not to criticise, but to supersede the use of the original work, and substitute the review for it, such a use will be deemed in law a piracy.”

So, Gallaher is making a semantic fuss over nothing. If a contemporary court holds that AI training with protected works is copyright infringement, then this conduct may both legally and colloquially be called piracy.

As to the substance of the post, Gallaher asserts that AI training is inherently fair use, which is too broad a claim. The fair use doctrine defies generalization, and the facts in one case involving a particular AI and one type of work may have limited influence on the result of a case involving a different AI and different type of work. Or to put that another way, the incomplete fair use inquiry conducted in Bartz v. Anthropic, involving a class of literary works, likely predicts almost nothing about the eventual outcome in UMG et al. v. Udio or Disney et al. v. Midjourney, involving sound recordings and visual works respectively.

Gallaher states that AI training is transformative under fair use factor one (the purpose of the use). Indeed many articles of this nature rely on the assumption that this finding should be obvious and should carry the weight of the fair use analysis. “Copying for training is transformative: it uses the works for a fundamentally different purpose from the original, much like indexing websites for search engines or scanning books for text analysis,” he writes. And that’s all he writes about one of the most vexing doctrines in fair use weighing the most challenging technology ever confronted by copyright law.

Of course, even in one sentence, Gallaher manages to hide (or expose) the distinction that the purpose of many GAI products is to produce works without authors. This fact is highly distinguishable from the two analogies he cites and, as the courts will surely recognize, presents a novel challenge to the constitutional intent of copyright law. This is a consistent fallacy with every article of this nature—claiming that AI is the most revolutionary tech in history, but despite this novelty, we have ample case law to conclude that training is fair use.

Perhaps the courts will not wholly agree with my view that a purpose which does not serve the goals of copyright cannot favor fair use, but in Kadrey v. Meta, Judge Chhabria stated, “Courts can’t stick their heads in the sand to an obvious way that a new technology might severely harm the incentive to create, just because the issue has not come up before.”

Although that sentence prefaces a consideration of market dilution under factor four, the words “harm the incentive to create” allude directly to copyright’s core purpose and, so, implicate the purpose of the GAI to “create” in lieu of authors. And that goes to the question of transformativeness. So, no, it is not enough to say that a use which serves a different purpose is per se transformative, especially when that different purpose is to do exactly what creators do and, in the process, moot the utility of copyright law.

Notably, Gallaher masks the substitutional purpose of GAI by referring to it in general as technology that serves a “public good” and which provides “broad benefits.” The plain fact, though, is that we do not know this to be true. The mere fact that a product is new, is being widely adopted, and/or has investors chomping at the bit is not evidence that its purpose is categorically beneficial. Far from it. We are already flooded with AI products causing serious harm, triggering liability claims for negligence and wrongful death, and launching emotional Senate hearings.

In this regard, I have argued that the courts have no factual basis for even defining the purpose of AI training. Although we should not talk about AI as a monolith, the counterpoint to that principle is that it’s generally the same process ingesting the same creative works, whether the AI product is used for scientific research, military applications, medical diagnosis, CSAM, social engineering attacks, or addicting children to establish dangerous “friendships” with machines.

Even if the courts are unwilling to apply such a broad sweep of uncertainty in a copyright context, it is sufficient to say that we have little reason to assume that AI is generally beneficial in the world of creative and cultural production. And whether or not the folks at Public Knowledge know it, the courts are at liberty to look beyond the four factors in weighing fair use, especially when they are presented with considerations that have little or no precedent.

It is important to keep in mind that on fair use factor one, the often unwieldy transformative doctrine splits into two distinct branches of case law. The traditional purpose of fair use, dating back to English courts, is to allow new creative expression to flourish, particularly expression that comments upon the work being used. Fair use cases of this nature most often address one user of one work for one clear purpose.

The more contemporary branch of factor one considerations entails mass use of protected works for a technological purpose that can strain against the fair use doctrine. Simply put, fair use was not developed or codified into statute to provide raw materials for technological products, and as discussed in other posts, when the Second Circuit allowed scanning millions of books for Google Books, it stated that the case “tests the boundaries of fair use.” GAI products, whether used for good or ill, lie well outside those boundaries.

Articles like Gallaher’s are not really making a copyright argument but are instead drawing readers to conclude that copyright owners should be required to subsidize AI development whether they like it or not. Other than assuming that Public Knowledge is still a PR firm for Big Tech, I don’t know why an organization with that name takes such a position when countless parents, educators, artists, lawmakers, and medical experts are insisting upon guardrails and oversight for AI in recognition of social harm already being done. This same sober approach must apply to copyright rights and, at the very least, foster a licensing regime that avoids undermining foundational IP principles.

Image source: H9images

Librarian Panel Advocates Naïve Changes to Copyright Law

In April 1787, as James Madison was limbering up his philosophical muscles ahead of the Constitutional Convention, Thomas Jefferson shipped him several crates from Paris filled with books comprising what one might call the Enlightenment in a Box. I mention this footnote of American history only to observe that every book Madison received—indeed every book that ever influenced an American Framer—is in the public domain, and, thanks to the digital age, more widely and affordably available than at any point in the history of Western civilization. Additionally, millions of works produced between 1789 and the Copyright Act of 1909 are likewise in the public domain and, if these have survived in some form, they are also likely available in various digital archives. And the list goes on.

Yet despite this extraordinary age of access—an era some would reasonably compare to the proliferation of the press—ignorance is in no short supply in the democratic world. Indeed, a highly creative form of ignorance—the conspiracy theory—seems to be galloping without rest along the “information superhighway,” and it remains to be seen whether Hell follows with its multitude of riders. All of which is to affirm what should be obvious even to a casual observer: that more access to information is not the antidote to misinformation.

Nevertheless, on March 24, Public Knowledge hosted an online event that was ostensibly aimed at combatting both misinformation and injustice. And, not at all surprisingly, the substance of the panel discussion alleged that the bugbear preventing the misinformed from becoming the informed is copyright law. Not so subtly titled Burying Information – Big Tech & Access to Information, one promotional tweet about the event read:

This powerhouse panel will discuss fighting #misinformation w/information through tools like #CDL, & how technologist (sic) can create inclusive, empowering tools to provide access to information for disadvantaged & marginalized communities.

The powerhouse included Brewster Kahle, founder of the Internet Archive; Michelle Wu, author of a concept called Controlled Digital Lending (CDL); Heather Joseph, Executive Director at SPARC;[1] and moderator Amanda Levendowski, professor at Georgetown Law Center.

The discussion led off with brief remarks by Senator Ron Wyden, who expressed his love for libraries, his belief that more good information is the cure for disinformation, and his view that copyright needs to change in order to provide equitable access for all Americans to the aforementioned good information. It was probably not a coincidence that the event was held on the one-year anniversary of the Internet Archive launching what it called the National Emergency Library (NEL), for which it is now being sued by four major publishers.

Controlled Digital Lending

CDL, the central topic of the conversation, is a legal theory (emphasis on theory) asserting that libraries should be allowed to scan the physical books they own and then loan digital copies, one consumer at a time, for each physical copy they have in their collections. So, if a library has four physical copies of a book, it can loan up to four at a time in any combination (e.g. four digital or one digital and three physical, and so on).

The two main rationales presented for CDL are, first, that digitizing a physical collection preserves the collection and makes it accessible in an emergency (Wu conceived the idea when the library where she worked was flooded) and, second, at least according to the panel, that the cost of licensing eBooks from publishers is too high and, therefore, makes poor use of libraries’ limited resources. The publishers “won’t sell eBooks, but will only license them,” the panel unanimously complained, further asserting that the unreasonably high cost of licensing results in a reduction of diversity in material and limited access for the most vulnerable members of society.

If the preservation argument for digitizing a collection sounds reasonable, it is. And that’s why Section 108 of the copyright law already provides a carveout for libraries to digitize books for preservation purposes. So, if libraries are not doing this, it isn’t because the law prohibits it. Relatedly, digitizing books costs money, and to my knowledge, there is one major enterprise in the business of digitizing books for libraries. It’s called the Internet Archive. Just sayin’.

As for the argument that CDL is a necessary workaround to the publishers’ “extortionate” eBook licensing regimes, this complaint rings a little hollow, and I would love to see hard data to support that claim. I access a mid-size library system that loans eBooks and filmed entertainment through third-party licensing vendors, and the system itself does not appear to be failing or suffering more than the usual ups and downs experienced by libraries.

But more telling perhaps is that the overall tone of the panel conveyed a resentment toward licensing eBooks at any price. Indeed, the group was unanimous in describing the codification of CDL into law as a “first step” toward more substantial, and ongoing, amendment to copyright. Or if Brewster Kahle had his way, the abrogation of copyright altogether. He is an anti-copyright ideologue, who alleged during the event that the lawsuit publishers filed against the Internet Archive was an effort to kill the concept of CDL in the proverbial cradle, but he left out the fact that what triggered the litigation was IA’s decision to make 1.3 million books available without controls of any kind.

More importantly, as Michelle Wu proclaimed, encoding CDL into law should be considered a step toward amending §109 of the copyright act to encompass “digital first sale,” which happens to be a market-devastating proposal for a lot more than books (see posts here and here). Suffice it to say that incorporating “digital first sale” into the copyright law (a proposal which has been rejected by Congress and the USCO after about twenty years of advocacy, by the way) would thrash the market for authors of creative works, who have already seen revenues dry up due to multiple effects of digital technologies and industry practices.

More Information is Not the Antidote to Misinformation

I too love libraries. I agree with Heather Joseph’s comment that everyone who appreciates what these institutions do has a “love my librarian” story. But I got the sense from some of the rhetoric in the discussion that librarians may be feeling a bit ignored (i.e. less relevant) in the digital age; and if that is correct, the focus on copyright and the major book publishers is a misguided response. Some statistics indicate that reading among Americans has been trending downward for years.[2] One source tells us that Millennials read more than any other generation, but both they and the Boomers substantially prefer print books to eBooks.[3] So, what does that tell us about the urgent need for CDL? I don’t know either, but the point is that it is not sufficient to allude to a “problem” without evidence when seeking a legislative “fix.”

Meanwhile, anyone who says that reading materials overall are too expensive (and therefore copyright must change) is simply ignoring evidence. The cost of new book buying is roughly on par with the cost of new book buying in previous decades. And access to eBooks, used books, and borrowed books is clearly greater than in the pre-digital age. I will also give credit to Internet Archive and its sister organizations for making older works in the public domain accessible.

So, a mutual love of libraries is where my agreement with this panel ends, especially with regard to the underlying thesis that the disinformation crisis now rampaging through democratic societies like a (well, a pandemic) can be cured with greater access to reading material. No, it cannot. Speaking as a lifelong liberal elitist, that assumption is liberal elitist hogwash that has been soundly rejected by evidence, and which, ironically enough, betrays a failure by this panel and its constituents to allow evidence to influence their own biases.

We must acknowledge that the plague of toxic misinformation in the United States (e.g. QAnon, antivax, stop-the-steal, etc.) almost exclusively infects the privileged. The folks who believe and spread some of the most Republic-shattering nonsense are generally upper middle-class white people with plenty of access and way too much time on their hands. Many even have college degrees, but a lot of them are the people I see in my community—like the contractor, who makes considerably more money than the average book author, but he neither spends that money on reading material nor spends his time seeking “good” information.

We should be careful about implying that there is a correlation between susceptibility to disinformation and economic precarity, or other imbalances of justice. And Senator Wyden should really think twice about whether he endorses that view without data to support it. Because I think the empirical evidence suggests that privilege plus internet are the two main ingredients for producing some dangerously ignorant people. After all, it was not cash-strapped families who had the time and money to travel to D.C. on a Wednesday to engage in a little insurrection tourism.

So, I hope the powerhouse panel does not literally believe that the folks who assaulted police officers with flag poles and bear spray (and more broadly those who endorse that conduct) would feel different if only they had better access to Aristotle and Voltaire. Because, as noted, they do have access. We all have greater access to the entire Western canon than we have had at any time in history. Yet, this access does not appear to be mitigating “the rise of authoritarianism,” as Sen. Wyden noted in his introduction. An adage about horses and water comes to mind.

The implication that one must be wealthy to afford access to books—or that the wealthy necessarily read—is a false generalization. It also happens to distract attention from the more pressing problem that the most economically disadvantaged households do not generally own the electronic devices needed to tap into the bounty of digital material the panel thinks should be more accessible.[4]

Yet, Kahle insists we must fulfill the “original” dream of the internet to foster a “new Library of Alexandria,” and he denounces copyright as the obstacle to achieving that end. It’s a bittersweet reference to say the least. In case he and the panel haven’t noticed, a cold civil war in America has already lit the library (metaphorically) on fire. Competing realities is the new reality. And that ain’t copyright’s fault.

The implication that copyright makes society ill-informed is not only contradicted by a litany of counterfactuals, but pursuing a legislative agenda based on this premise would only make the misinformation problem worse. For one thing, despite the disciplined use of the word information by this panel, and other adherents to their views, the copyright revisions they advocate would affect all works, a vast majority of which are not informative per se.[5]

Informative works, mainly nonfiction books, are written by a number of authors who do not make substantial returns on their enormous investments of labor and skill. For every Chernow who breaks through, there are hundreds of authors writing detailed histories that, despite their significance—some even win Pulitzer Prizes—do not easily compete with thrillers, tell-alls, or even literary fiction. Divest these historians and biographers of their copyrights, and they will not write these books at all. You don’t need to burn a library; you can simply starve the authors of the incentive to publish.

Moreover, the librarians’ agenda to change copyright law is myopic, even to the extent that it betrays their unique role in the publishing/consumer ecosystem. They consistently fail to recognize that changes in the copyright statute apply to all categories of works and would be exploited by commercial interests that would not only harm creators, but could also degrade the relevance of libraries. “Digital first sale,” for example, would have made the business venture ReDigi a lot of money, creating an ersatz secondary market that would damage the primary market for music, but that kind of model would also limit or obviate the need for libraries to loan works.

If “digital first sale” were the law, libraries could spend their resources digitizing all the books they want and not hope to compete with a commercial venture that conducts P2P transactions in “used” eBooks. In that paradigm, publishers are harmed, authors are harmed, and libraries may be starved of revenue and/or constituents who need their services. There are many reasons why “digital first sale” has been consistently shot down over the years.

So, as I have proposed before, where librarians see new difficulties fulfilling their mission in the contemporary market, they should endeavor to be specific, and also account for factors that have nothing to do with copyright law. Library carveouts already exist in the statutes, and if adjustments to these exceptions can be proposed to serve a clear purpose for libraries alone, then let those arguments be made.

But in the meantime, the librarians (love them though I do) should resist the sweeping declarations that the fate of democracy (i.e. information) rests in their hands. In the futile effort to make more books available for people who won’t read them, let us not deprive the market of new books for those who will. And if Senator Wyden and his cohort are genuinely concerned about misinformation tearing apart democratic institutions, they can find much better projects than stripping American authors of their rights.

[1] Scholarly Publishing and Academic Resources Coalition

[2] “On average, Americans aged 20 to 34 spend a mere 0.11 hours reading daily, which amounts to less than seven minutes per day.” https://www.statista.com/topics/3928/reading-habits-in-the-us/

[3] https://ebookfriendly.com/comparing-reading-habits-five-generations-infographic/

[4] “Roughly three-in-ten adults with household incomes below $30,000 a year (29%) don’t own a smartphone. More than four-in-ten don’t have home broadband services (44%) or a traditional computer (46%). And a majority of lower-income Americans are not tablet owners.” Pew Research Center.

[5] The libraries’ own data reveals that circulation of nonfiction works is roughly 46% nationally, and of that 82% are cookbooks. Library Research Service.

EFF, Public Knowledge, et al. Celebrate Defeat of SOPA/PIPA Out of the Blue

Rumors have come to my attention—okay it was splashed all over Twitter—that an event was held yesterday called The Untold Story of SOPA/PIPA. “Defeating SOPA/PIPA didn’t happen overnight,” says the EFF’s promotional page for the event. “Advocacy groups like Public Knowledge fought long and hard for years to raise the alarms about these censorship efforts.”

Where does one begin? By commenting on the offensive or the pathetic? Perhaps the most poignant and direct offense speaks for itself. Because just this morning, I happened to see the following post by a Facebook musician friend:

So our new album, which was just released Monday and cost us tens of thousands of dollars to make and promote (which was borrowed), is already on “file sharing” sites…

Online piracy, including by foreign actors, even almost a decade since the great defeat of SOPA/PIPA, is still a major problem that still costs thousands of independent creators their livelihoods. But don’t let that spoil the party being thrown by a bunch of ivory-tower “activists,” who were in the trenches in 2011 working their index fingers raw, Tweeting and sharing batshit crazy memes and other disinformation about those bills. Or don’t forget to say a prayer for the digital-age powder monkeys of 4Chan who helped spread the word. And as for investments! Well, what about the money (whose money?) spent on SPAM bots to spread the word that SOPA/PIPA would break the internet? Sock puppets have to eat, too, y’know! (Actually, no I guess they don’t.)

The tragedy is that the real “untold story of SOPA/PIPA” is that the public was lied to about how those bills actually worked; lied to with the claim that the bills’ opponents “were all for stopping piracy, but not this way;” and lied to about how organic and grassroots the effort was to defeat the bills. Does anyone today actually believe it was a coincidence that the Internet Association was founded concurrently with the fight against that legislation, or that Google’s lobbying expenditures went from negligible to record-setting during the same period?

Stop SOPA was one of the most successful and well-funded disinformation campaigns in internet history and, as I have said many times, it scared the hell out of me. And not because of the piracy problem. That was just an unfortunate failure for people like my friend quoted above. No, the scary part about the manner in which the legislation was defeated was the lessons the campaign taught to other powerful institutions. It was clear by the mechanisms employed that anyone with enough money could alter the course of history with a few simple lies and mediocre graphic design. I know, right? What was I thinking? That rampant disinformation might threaten democracy itself? Just my hysterical nature, I guess. Because let’s be clear: SOPA/PIPA was not defeated with information or, heaven forbid, debate in Washington. Those bills were defeated by this:

I come from an advertising and marketing background, and that right there is advertising. Very effective advertising. Plenty of my friends shared memes like this one for weeks leading up to the defeat of SOPA/PIPA. But when advertising is designed to frighten the consumer, it should be confronted with skepticism—critical thinking that social media seems especially well designed to weaken among users. How many of my friends read or had the background to understand the legislation? Almost none.

And, yeah, I know. There were articles written about those bills, too. And you could hardly see the puppet strings of collusion despite the uncanny consistency in the language being used—generalized, ominous, and populist, without bothering to mention that the key mechanisms proposed already existed in the law. Like the tweak to injunctive power against foreign piracy sites, which would not have had any effect on the ordinary function of internet activity. And since 2012, SOPA-like enforcement measures (e.g. site blocking) have been implemented in markets around the world, and still no breaking of the internet has occurred.

But I think the most galling aspect about this sad attempt to relive the glory day of January 18, 2012 (you probably forgot, right?) is that nothing about the Republic-shattering events of the last several years has chastened the “free speech” rhetoric of the EFF, Public Knowledge, Sen. Wyden, et al. That they are still eager to call SOPA/PIPA “censorship bills” with straight faces is astounding. Never mind that piracy is not a form of protected speech; but have these organizations learned nothing since 2016? Did they miss the giant sticky note that says the laissez-faire approach to platform governance has been an abysmal failure worldwide? Specifically, do they lack the introspection to recognize the methodological similarity between …

this …

… and this?

If Russian troll farms didn’t read the Stop SOPA Playbook as the ultimate guide to manipulation through social media, they certainly could have. But, again, don’t let events like the U.S. Capitol assault of January 6th ruin all the self-congratulatory fun being had at EFF and Public Knowledge. Though I do have to ask why March 17, 2021? Why the nine-year-and-two-month anniversary of the defeat of SOPA/PIPA? Odd, no? Maybe not. Are EFF and PK trying to send a signal to the IP Subcommittee that if it tries to update the failed notice-and-takedown provisions of the DMCA, they will unleash Godzilla once again? Can’t say for sure. Maybe they just couldn’t get hold of any St. Patrick’s decorations and decided to have a different kind of party.

The Illusion of More

Dissecting the digital utopia.
