Careless People: The Book Meta Doesn’t Want You to Read


Careless People by Sarah Wynn-Williams could almost be one of Christopher Buckley’s Beltway satires. Like Thank You for Smoking or The White House Mess, the first-person protagonist takes the reader on a journey from dream job to absurd nightmare—each chapter an ironic critique of the powerful characters depicted. Except Wynn-Williams is real, and so are the truly awful people and events she describes. “…like watching a bunch of fourteen-year-olds who’ve been given superpowers and an ungodly amount of money, as they jet around the world to figure out what power has bought and brought them,” she writes in the prologue.

The subtitle, A Cautionary Tale of Power, Greed, and Lost Idealism aptly describes this memoir, which begins with Wynn-Williams’s story of surviving a shark attack at the age of 13 in her native New Zealand and ends with her being escorted by security from Facebook’s shark-infested headquarters twenty-five years later. Hired in 2011 as the Manager of Global Public Policy, Wynn-Williams conveys her initial enthusiasm as a true believer in the power of Facebook to be a force for good and, on that basis, how she pitched the idea of a policy role for herself at a time when the leadership did not yet grasp why the company would need to build relationships with state leaders.

Initially, Wynn-Williams’s expertise as a former New Zealand diplomat reads like a satirical counterpoint to the fumbles of tech-nerds who don’t understand statecraft. An early chapter, for instance, describes the visit of German delegates to Facebook’s Washington office and their bewilderment upon seeing the open-plan office with all the facades stripped away to expose the ducts and bare fixtures to “symbolize” the company’s nascent status. “‘You dismantled the furnishings of a proper office to make it look like this? Like it is under construction?’ one of the officials inquired, incredulous,” Wynn-Williams writes.

This image of the deadpan German thinking he is meeting with unserious people would be funny if not for the very real and deadly events that are indeed foreshadowed. As the narrative unfolds like a thriller, the protagonist discovers unbounded arrogance, callousness, hypocrisy—and ultimately—dangerous and criminal conduct among her superiors. The faux feminism of Sheryl Sandberg and lechery of Joel Kaplan become subplots about elite executives whose worst crime against humanity, so far, is arguably Facebook’s role in fostering the rampant hate speech that fueled the Rohingya genocide in Myanmar between October 2016 and January 2017.

As discussed in an earlier post, Senators Hawley et al., motivated in part by Wynn-Williams’s testimony and accounts in the book, have stated an intent to investigate Facebook’s misconduct designed to appease the Chinese Communist Party. But to me, the most compelling part of the memoir is the glimpse into Mark Zuckerberg’s character, especially as a putative oligarch in context to the Trump-led assault on the constitutional order of the United States.

Wynn-Williams’s portrait of Zuckerberg, an avatar of Big Tech leaders, combines the patriarchal vanity of John Galt with the innocent savagery of Jack Merridew—a boy billionaire, who plays board games that his staff let him win, but who ultimately embraces the destructive power he controls. Specifically, the chapters describing Zuckerberg’s psychological process upon learning that Facebook was catalytic to the 2016 election of Donald Trump trace a progression of denial, anger, pride, and corruption.

During a flight on the private jet to Lima for the Asia-Pacific Economic Cooperation (APEC) summit, Elliot Schrage, VP of global communications, marketing, and public policy, explains to Zuckerberg how, “A Trump operative named Brad Parscale ran the operation together with embedded Facebook staff, and he basically invented a new way for a political campaign to shitpost its way to the White House, targeting voters with misinformation, inflammatory posts, and fundraising messages,” Wynn-Williams writes.

Initially, Zuckerberg clings to the belief that his platform is a neutral conduit for free speech and “connecting people,” but he then becomes angry at the irrefutable evidence presented by Schrage. Then, at the APEC summit, Zuckerberg’s incipient sense of his own power, and test of his character, is described by Wynn-Williams as he is buffeted between foreign leaders kissing his ass one minute and President Obama in a side meeting lecturing him about the dangers of misinformation on Facebook.

Rather than introspection, Zuckerberg responds like a petulant comic book villain—so offended by the criticism of the U.S. President that he decides to use the power of his technology for his own run at the office. “After all, not only does Mark now have Trump’s playbook, he owns the tools and sets the rules,” Wynn-Williams writes. “And he has something no one else has, the ability to control the algorithm with zero transparency or oversight.”

Again, the image of the staff reacting to Zuckerberg’s announcement that he wants to hold events in swing states like Iowa, New Hampshire, Pennsylvania et al. would make great satire but for the fact that, as Wynn-Williams puts it, “He could run for president and not ask anyone for a dime.”

Of course, the real point is not the prospect of President Zuckerberg—at least not yet—but rather Wynn-Williams’s courageous exposure of the mindset behind the allegedly “greatest tool for democracy ever invented.” And she does so at tremendous personal risk. Meta has tried to stop publication of the book, tried to stop her from testifying before Congress this month, and threatens to sue her for $50,000 per negative comment about the company.

In many ways, Careless People reveals what many of us already knew about Meta and the other social media giants—at least since 2017: that they are not designed or operated according to principles that ever justified the populist rhetoric of “democratization.” That was a lie more than a decade ago, and the lie only compounds in the battle over development and application of artificial intelligence. Wynn-Williams sums it up well with her thoughts about the travesty in Myanmar:

“I’ve spent a lot of time thinking about what unfolded next in Myanmar, and Facebook’s complicity. It wasn’t because of some grander vision or any malevolence toward Muslims in the country. Nor lack of money. My conclusion: It was just that Joel, Elliot, Sheryl, and Mark didn’t give a fuck.”

Shedding Light: Briefs Filed in Kadrey v. Meta


The purpose of cultivating works of authorship is to shed light on human experience, and the foundational purpose of the fair use doctrine in copyright law is to shed light on works of authorship. From its 18th-century English roots to the U.S. Supreme Court’s 2023 decision in AWF v. Goldsmith, the primary rationale for fair use is to permit the unlicensed use of works in ways that critique or comment upon the works themselves. Harvesting millions of books to train an LLM does not do this.

With the growth of digital technologies and copyright protection for highly utilitarian computer code, the fair use doctrine has expanded somewhat to permit certain “non-expressive” uses of works. But the uses allowed by the courts have still tended to provide information about the works used or have been held to advance purposes like software interoperability. Harvesting millions of books to train an LLM does not do this.

A pair of briefs filed in Kadrey v. Meta—one by Association of American Publishers (AAP), the other filed by a group of IP law professors—present compelling arguments against finding that Meta’s unlicensed copying of millions of books to train its generative AI product Llama is fair use. A common theme in both briefs exposes a core fallacy, and legal hypocrisy, common to AI developers in these cases—namely that they copy protected “expression,” but they don’t copy protected “expression.”

As we see in the shorthand of social media, the developers write their own dichotomy by simultaneously humanizing and dehumanizing their products. In one breath, they compare machine learning (ML) to human learning but then drop the analogy when they seek to claim that the protected “expression” in the works used is not copied or stored by their mysterious and complex “training” models. The AAP brief argues that copying “expression” is central to training an LLM, and the professors’ brief shows why “learning like a human” is precisely why fair use does not exempt Meta from obtaining licenses.

Both AAP and the professors naturally present specific arguments as to why none of the fair use case law supports Meta’s defense, but I was intrigued by the ways in which both briefs argue from different perspectives that training Llama indeed exploits the “expressive content” of the books appropriated. In fact, if it could be shown that no protected expression is copied or stored, this would be an argument that no case for infringement exists. But considering the emphasis on fair use—and all similar cases will almost certainly turn on fair use—we can assume that this statement from AAP is correct:

Meta would have this Court believe that authors’ original expression is not preserved in or exploited by the model. But this is not so. The LLM algorithmically maps and stores authors’ original expression so it can be used to generate output—indeed, that is the very point of the training exercise.

Kadrey and all AI training lawsuits with similar facts presented will turn on fair use factors one and four. Under factor two (nature of the works used), the books in Kadrey, and the works in most other cases, are “expressive” rather than “factual” in nature, and therefore, this factor favors plaintiffs. Under factor three (amount of the work used), it is understood that whole works have been fed into the models, and so, this factor also favors plaintiffs.

Under the first fair use factor (purpose of the use), the court considers 1) whether the use is transformative; and 2) whether the use is commercial. Here, Meta’s commercial purpose is undeniable, and the AAP brief soundly argues that there is nothing transformative about copying the word-for-word expression in textual works for a purpose that sheds no light on the works used. On the contrary, the intent of the LLM is to create a non-human, substitute “author,” a purpose for which there is indeed no judicial precedent.

Factor four considers potential market harm to the copyright owner(s) of the work(s) used, and factor four may be the keystone in the broader creators versus GAI battle. Meta, a trillion-dollar company run by executives whose credibility is in doubt, contends that it is not feasible to license the books it used to train Llama. In response, AAP presents substantial evidence of licensing agreements between copyright owners and several major AI developers, and it states that Meta abandoned negotiations with publishers and chose instead to harvest books from pirate repositories.

Further, AAP argues “from a policy perspective” that Meta’s accessing those pirate “libraries” of DRM-free books militates against finding fair use in contravention of Congress’s intent when it passed the Digital Millennium Copyright Act (DMCA) in 1998. “Congress sought to establish a robust digital marketplace by ensuring appropriate safeguards for works made available online, including copyright owners’ ability to rely on DRM protections in distributing electronic copies of their works.”

In this spirit, inherent to the history of the fair use doctrine is the notion of “fair dealing” or, put differently, general legality in the overall purpose and character of the use. “The compiler of the training data’s knowledge of the unlawful provenance of the source copies might well taint the ‘character’ of the defendant’s use,” writes Professor Jane Ginsburg in a paper examining the question of fair use of works for AI training.[1]

The Professors’ Brief

The brief filed by the IP professors also emphasizes that the protected “expression” in the works is copied and exploited without license, but it also rather deftly uses Meta’s own rhetoric to doom the fair use defense. In general, when the AI cheerleaders say that LLMs “learn the way humans do,” my instinct has been to sneer at this anthropomorphic sentiment. But by giving the “learning like humans” analogy weight, the professors’ brief demonstrates exactly why that claim is fatal to a defense that the developer’s purpose is fair use.

Noting that humans indeed use protected works for “learning” all the time, the professors make plain that this exact relationship between author and reader (the basis for copyright) does not exempt the human from obtaining works legally. Thus, by Meta’s own analogy, the “machines learn like humans” claim is both an affirmation that the “expression” is being exploited and proof that there is nothing transformative about using works for “learning.”

Further, the professors have a bit of fun emphasizing that Meta et al. strain to make the machine learning process sound as technically complex as possible to obscure the fact that only by copying “expression” could the LLM actually “learn” anything. Here, a tip of the hat is deserved for the brief’s description of a human being reading a book thus:

… many billions of photons hit the book’s surface; some of those billions reached a lens, which focused them onto a retina, which converted them into electronic signals, which then resulted in electronic and chemical changes in some portion of over 100 billion neurons with over 100 trillion connections, some of those changes being transitory, and others more permanent.

The technical description of human processing and learning is even more mysterious because not even expert specialists in neuroscience know how the brain works at the neuronal level.

Well done! If that needlessly technical description of human reading requires legal access to the book, then so does the far less complex process of machine learning for AI development. Moreover, even if Meta were the vanguard developer and there were no examples of licensing deals being made, there is no rationale anywhere in commerce that a necessary resource must be free because it is essential. Meta et al. need electricity, engineers, and probably a computer or two to develop Llama, and not one of these resources is free. Yet, somehow the most essential resource—the work of millions of authors—should be free.

On that note, there has never been a more important time to protect the rights and economic value of authors who shed light on the world we inhabit. I remain more than skeptical that it will ever be desirable to create literary works without authors, musical works without composers, etc. And certainly, licensing deals alone do not address all the potential hazards of unethical or questionable uses of generative AI. How products like Llama are used will provoke discussions that are cultural as well as legal. But for the moment, fair training of all AI models is the only rule that is both ethical and consistent with copyright’s purpose.


[1] Prof. Ginsburg is not one of the professors in the brief cited for this post.


Copyright and AI in a World of Whiplash Public Policy


I have not added a copyright post here since March 19, when the DC Circuit Court of Appeals affirmed in Thaler v. Perlmutter that works produced autonomously by generative AI (GAI) are not protected under U.S. copyright law. Although it is good to see the human authorship doctrine in copyright left undisturbed, it is a fleeting moment of sanity within a warped national reality.

As reported earlier, Open AI appealed to the administration’s focus on China as a basis to argue that “beating China” requires ignoring the copyright claims of authors whose works are used to train AI models. Not only is that claim wrong on its face, but the conduct of the current administration vis-à-vis civil rights forces millions of Americans to ask whether China is an adversary or a role model.

One mirror in the funhouse reveals a compelling bipartisan hearing held by the Senate Judiciary Committee, Subcommittee on Crime and Counterterrorism, where Chairman Hawley and colleagues from both parties offered strong endorsements for the courageous testimony of Facebook whistleblower Sarah Wynn-Williams. Focused primarily on Meta’s engagements with the Chinese Communist Party (CCP)—and Zuckerberg’s lying to Congress about that very issue—the committee cited other abuses described in Wynn-Williams’s book, like the company intentionally targeting vulnerable teens. (More about the book Careless People in another post.)

Ordinarily, I compartmentalize copyright matters from other criticisms of Big Tech, but here, the stories overlap, even if Meta is the only target of the committee’s investigation at this time. First, throughout her testimony, Wynn-Williams repeats the theme that Meta used the “but China will win” argument to oppose Congress taking any meaningful regulatory action. This alone should cast doubt upon Open AI et al. making the same argument as a rationale for mass copyright infringement for model training. As Senator Klobuchar noted, there was no basis for prior claims that enforcing various consumer safeguards (e.g., Kids Online Safety Act) would be counter-productive to national security, and in that light, Congress should decline to believe the same story in regard to copyright infringement.

Meta may be unique—or uniquely situated—as a clandestine partner to the CCP, but it is also notable that the committee mentioned the role of Meta’s Llama AI and heard Wynn-Williams’s testimony that the product was used by the CCP for “AI weapons” and for the development of the Chinese LLM DeepSeek. Further, Wynn-Williams offers a theory about the open source versus closed model AI competition in the marketplace. “There’s a lot of money on the line,” she says. “In some ways you could say, if you want open source to prevail, it helps to have a strong threat from a Chinese model so you can say that it’s really important that America wins, and we’re the American open-source option. And I think you can see the way that strategically plays out.”

“But China will win” is pretty much what Open AI told the Office of Science and Technology Policy in its letter arguing that machine training with copyrighted works is per se fair use. But looking at Meta (which is currently being sued in the Kadrey case), consider the perspective: in developing Llama, not only did Meta scrape the literary works of millions of authors and journalists, and not only did it source pirate libraries for that purpose, but it also deployed that same AI power in the interests of a nation that brutally kills freedom of expression. Yes, of course, I’m thinking the same thing because it’s unavoidable. The current U.S. administration has engaged in multiple First Amendment and other constitutional violations, including assaults on the free press, and thus, the policy whiplash.

Couple these optics with the volume of evidence that the real power behind the destruction of the administrative state is a small group of tech billionaires pushing an anti-democracy ideology called the neo-reactionary movement (NRx), and the idea of advocating creators’ rights seems all but futile. After all, is it remotely sane to think that an administration of semi-literate, 1A-infringing, book banners will care about the rights of authors—let alone reject the tech-bros who wrote the destruction manual for the United States?

Setting aside the copyright questions raised by GAI training, Big Tech’s wanton harvest of artistic and intellectual works as lifeless raw material is perhaps the ultimate expression of the cyberlibertarian’s disdain for human beings as mere repositories of data to be exploited and manipulated. The rhetoric of Big Tech ideology—from 4Chan to the halls of academia—is the authoritarian principle that individuals must be sacrificed for the sake of the collective. All rights are a nuisance to the tech oligarch, and authors are the last people any authoritarian wants to empower.

Open AI’s claim that mass copyright infringement is necessary to “beat China” is paradoxical—either willfully or naively blind to the fact that when we treat works of authorship as mere fodder for the machine, we don’t beat the CCP; we emulate it. Further, not only is the claim overstated that GAI development is a matter of national security, but again, what does “national security” even mean at present? Concepts like American interests, values, innovation, global security, etc. are all diminished, if not wholly swallowed, by the reckless destruction of the principles and institutions that distinguish America as a leader among democratic nations. And copyright rights are in those same crosshairs.

In response to copyright’s critics, especially those in academia with Big Tech funding their work, I have argued that the diversity and scope of America’s creative output has been essential to its strength as a democracy. Whether one looks at the economic value of the core copyright industries, the cultural value of diverse creative expression, or both, the rationale for intellectual property is to incentivize useful innovation and legitimate greatness.

American authors—from historians to rockstars—are the legacy of an aspiration expressed by Noah Webster, the father of American English and of American copyright. In 1783, advocating the first state copyright law in Connecticut, Webster argued that “America must be as independent in literature as she is in politics—as famous for arts as for arms.” By contrast, the “greatness” proclaimed by Trump is tautological and brittle, just as Big Tech’s claims to “innovation” are often vague and misleading.

As proposed in my book, the inclusion of copyright in Article I was one of the more egalitarian and democratic choices made by the founders, even if they did not wholly grasp its potential. At the most basic level, copyright incentivizes creative expression by any citizen anywhere, and the American model largely fulfilled that traditional Republican principle that the market, not the government, decides what is successful.

The copyright questions presented in roughly 40 cases are difficult and novel. Moreover, the facts presented vary, and thus, the outcomes will vary, especially on questions of fair use. In the meantime, it is clear that at least some of the major AI developers are engaged in a campaign to appeal to the current administration to treat copyright rights much as it is treating other constitutional rights—as principles to trample in a march toward something very un-American.