Google Books & The Semantic Maze of Fair Use

Photo by author.

This week the Supreme Court declined to consider the Authors Guild v Google case, letting stand the Second Circuit’s ruling that Google’s use of scanned published works for its search tool Google Books constitutes a fair use.  Various pundits and advocates have hailed this as a victory for the fair use principle.  In fact, I saw a headline the other day on Facebook that began with the words “Fair Use Wins …”, and although the decision is unquestionably a win for Google, the fair use principle actually remains mired in a semantic confusion that the high court might at least have helped to clarify.  It’s all about the word transformativeness.

The fair use doctrine was codified in the Copyright Act of 1976, and its original intent was to protect various types of expression—commentary, parody, education, artistic remixes, reportage, etc.—that by necessity make limited and conditional uses of copyrighted works.  I’ve written longer posts about fair use doctrine in general and won’t repeat all that here, but readers will remember that there are four interrelated factors to be considered* in assessing whether a use constitutes a fair use.  In 1994, however, in a landmark Supreme Court case called Campbell v Acuff-Rose Music, the fair use doctrine grew a new appendage called “transformativeness” that, in the age of the internet, has not only become something of a fifth factor that seems to override consideration of the other four but has also never been clearly defined as a term of art in legal practice.

As I continue to learn from my attorney friends, some of the words we use in everyday language become terms of art in the legal world, which generally means that court rulings have shaped, narrowed, or expanded the dictionary definition of key terms.  For instance, based on the current ruling by a federal court, the word articles can only mean “physical objects” with regard to the International Trade Commission’s authority to prohibit the importation of illegal goods.  So, if Congress wants to grant that body the authority to restrict the importation of digital data for illegal purposes, they’re probably going to have to rewrite the law.  (More about that another time, perhaps.)

The concept of “transformativeness” in fair use parlance was introduced by Judge Pierre Leval in his paper “Toward a Fair Use Standard” published in the Harvard Law Review in 1990, and coincidentally it was Leval who wrote the decision in the Second Circuit’s ruling in Authors Guild v Google.  But even though the “father of transformativeness” himself has ruled in this case, there is still much confusion about the term and what it means when considering fair use. As Thomas Sydnor of the Center for Internet, Communications and Technology Policy at the American Enterprise Institute writes about the situation:

“As cases applying this judge-made “transformativeness”-based approach to fair use accumulate, that term becomes increasingly incoherent, inconsistent, and counterintuitive. Collectively, its incoherence(s) now threaten to turn what was once a productively flexible multi-factor balancing test into little more than a perfunctory recitation of factors ending in judicial ipsa dixit – “because I said so.” Under such circumstances, rule of law cannot persist.”

Sydnor further points out that the word transform already exists in the 1976 Copyright Act in reference to the preparation of “derivative works,” which is another term of art to describe works such as spin-offs or adaptations into other media. These rights belong exclusively to the copyright owner of the original work and should not be confused with the more casual way we might use the word derivative to describe, or even criticize, a work that is mimicking some other work.  For instance, the above-mentioned Campbell case involves a work of parody that we might describe in common language as derivative, but not so in the context of copyright law.

Campbell v Acuff-Rose Music involved a new, expressive work, specifically 2 Live Crew’s raunchy parody of the song “Oh, Pretty Woman” co-written and originally performed by Roy Orbison.  The court held in Campbell that “the more transformative the new work, the less will be the significance of other factors.”  In this case, the court is referring to the extent to which 2 Live Crew “transformed” the original song to make a new song.  By contrast, though, Google does not “transform” any of the original works to create new expressions but instead uses the contents of the works to create a new search service called Google Books.

So, with these two rulings, we are looking at two significantly distinct definitions of the word transformativeness.  The first refers to modification of an expressive work in order to make a new expressive work.  The second implicitly refers to transformation of the external world (society) by the introduction of some new capacity (i.e. function) it did not have before.  This is particularly relevant because the language used by SCOTUS, asserting that “transformativeness” should “lessen the significance of the other factors,” can only rationally be applied—if the spirit of fair use doctrine is to be kept intact—to the first definition in which an original work is “transformed” to create a new, expressive work.  In the second usage of the word, in which the external world is assumed to be transformed by some new functional use, then “transformativeness” becomes too heavily weighted against the other factors, thus giving (for instance) a giant, wealthy service provider extraordinary latitude to define just about anything it does as socially “transformative.”

If the courts are going to apply this second definition of “transformativeness,” then it seems the consideration ought not to carry any more weight than the other factors, because the second definition provides a basis for large-scale, corporate-funded uses of millions of works in a way that the first definition does not.  In other words, Google Books may be deemed a fair use in the end, but it is not sensible to apply the Campbell standard of “transformativeness” to reach that conclusion.  As it stands, the courts appear to be giving the same weight to “transformativeness” while using two very different definitions of the word.

Semantically speaking, I would argue that transformative is not exactly the right word to use when one specifically wants to describe some measure of modification to an existing thing like a creative expression.  The term is problematic because it invites exactly the confusion we now have in the courts—transformative more properly describes the effects of an invention or expression on the external world (e.g. electricity was transformative in that it made modern society).  While it would not be wrong in common parlance to describe, for instance, Jeff Buckley’s rendition of Leonard Cohen’s “Hallelujah” as “transformative,” even this usage would generally tend to convey that both song and listener are in some way transformed.  But in law, this is too vague.  This is why attorneys refer to a term of art—a definition that is established within the language of the law and that may or may not conform to everyday usage.  Sydnor points out that Leval himself provides little guidance in this regard when he quotes the judge thus:

“The word “transformative” cannot be taken too literally as a sufficient key to understanding the elements of fair use. It is rather a suggestive symbol for a complex thought….”

 “[T]he word “transformative,” if interpreted too broadly, can also seem to authorize copying that should fall within the scope of an author’s derivative rights. Attempts to find a circumspect shorthand for a complex concept are best understood as suggestive of a general direction, rather than as definitive descriptions.”

Right. I’m no legal scholar, but I think the concept “transformative” is a troublemaker.

Because the precedent SCOTUS ruling in Campbell is based on the use of “transformativeness” to describe the modification of an expressive work, it would make sense to settle upon this definition and to seek another term for considering functional uses akin to Google Books. As Copyright Alliance CEO Keith Kupferschmid writes in a post on the organization’s website:

“The fair use doctrine is an equitable doctrine, but in functional use cases it hasn’t worked that way because the transformative use test is ill equipped to effectively balance the competing interests at stake in these cases.  Fair use analysis should take into account not only the interests of owners and users but also the underlying policy objectives of the copyright law.  To account for these factors in a reasonable and balanced way, it is time for the courts to begin using a functional use test.”

Unfortunately for rights holders, the confusion about “transformativeness” that leaks into general consciousness results in a casual logic that assumes simply changing the context of a work, like placing a photograph on one’s Facebook page, is “transformative” enough to make a use fair.  Google Books is a misstep in that direction, and if this becomes the application of fair use, then that’s the ballgame.  There are no copyrights left. I can take your songs or images, put them on this blog, call it “transformative”, and get away with it.  That may be an attractive proposal to the internet industry, but it is far from the original intent of fair use doctrine in the copyright law, which was to protect expression, and it would have disastrous effects on the professional creative industry as we know it.


*Changed from original publication, which stated that the factors are considered by a three-judge panel.  As an anonymous commenter pointed out, this is only true in an appellate court—a mistake I made in haste, owing to the fact that many famous fair use cases are famous precisely because they’ve gone to higher courts.

Reports of DMCA Abuse Likely Exaggerated

In the last week of March, you might have seen a headline or two announcing that 30% of DMCA takedown requests are questionable.  And since we don’t always read beyond headlines these days, these declarations were conveniently timed for the internet industry, arriving just as the April 1 deadline approached for submitting public comments to the Copyright Office regarding potential revision to Section 512 of the DMCA.  This section of the law contains the provisions for rights holders to request takedowns of infringing uses of their works online; the provisions for restoring material removed due to error on the notice sender’s part; and the conditions by which online service providers (OSPs) may be shielded from liability for infringements committed by their users.

The eye-catching 30% number came from a new study entitled Notice and Takedown in Everyday Practice conducted by researchers at Berkeley and Columbia; and the handful of articles I saw provided little insight into the contents of the 160-page report, which I finally had a chance to review.  The authors, Jennifer M. Urban, Joe Karaganis, and Brianna L. Schofield, cite both qualitative and quantitative data from respondent rights holders and service providers; and the big story that their report produced—the one that will stick in people’s minds—is that rights holders and OSPs have increasingly adopted automated systems (bots) to process and analyze DMCA notices, which naturally leads to a higher error rate.  Thus the narrative that will be repeated is one in which major rights holders are using tools that cannot help but chill expression through error, especially when bots can’t do things like account for fair use.  But this isn’t exactly what the report tells us, and the authors themselves acknowledge that rights holders have only increased their use of automated notice sending in response to unabated growth in large-scale online infringement.

Having reviewed the report, my big-picture observations are as follows: a) it does not justify headlines suggesting that 30% of all DMCA takedown requests are “questionable”; and b) the report especially does not support the larger bias that the types of errors it identifies are tantamount to chilling expression online.  It also should be noted that the authors do acknowledge that the majority of DMCA notices, the supposed 70% which are not flawed, are predominantly filed on behalf of major entertainment industry corporations targeting the “most obvious infringing sites.”  This does not mean errors don’t exist among these notices, but people should not read the 30% number and jump to the typical conclusion that it’s all that damn MPAA’s fault. (In fact, the MPAA provided no data for this study.)  Instead, the report seems broadly to identify some predictable inconsistencies among third-party rights enforcement organizations (REOs), which file automated notices on behalf of rights holders of varying sizes.  While it is of course desirable for all parties that REOs achieve the greatest possible accuracy and maintain best practices, including human oversight, let’s look at some of the “questionable” notices identified by the quantitative section of the report.

The study surveyed just over 108 million takedown requests from the Lumen (formerly Chilling Effects) database, and the authors state that 99.8% of these notices were sent to Google Search, which automatically implies a data set different from the takedown scenario most critics tend to cite (e.g. a user-generated work appearing on a platform like YouTube). The quantitative section states that 15.4% of the request notices err because the Alleged Infringing Material (AIM) does not match the Alleged Infringed Work (AIW). In some cases, keyword searches matched material that shared like terms with the wrong works (e.g. House of Usher confused with the artist Usher), while a few other examples of mismatch are a little harder to fathom.

Regardless, while this type of flawed notice may represent inefficiency and waste for the rights holders, it does not get anywhere near the concerns users might have about stifling expression online.  This is because even the errors are exclusively targeting obvious infringement by criminal websites, and the report seems to bear this out.  Even if a percentage of notices contain these types of errors but are sent to links targeting sites that host 99% infringing material, each notice is still targeting an infringing link.  If an REO sends a takedown for Infringing File A when it ought to have sent one for Infringing File B, this may be an indication that the REO needs to improve its game, but it is not a mistake that affects anyone’s expression in any context whatsoever. It’s also not the kind of mistake that tells us much about DMCA beyond the fact that rights holders have to send out far too many notices against a constant blitz of infringements.  The outnumbered zombie-fighter may be less accurate with a shotgun, but if everything he hits is a zombie, no harm no foul.

So, assuming I’m reading the data correctly, that’s more than half the 30% of “questionable” notices accounted for, since the 30% is actually rounded up from 28.4%.  So, are mistakes being made? Of course. Are all, or even most, of these mistakes affecting anyone other than rather large rights holders and really large OSPs? It doesn’t look like it.  And let me pause in this regard to remind readers that when Congress passed the DMCA in 1998, it was their expectation that OSPs would cooperate with the major rights holders to develop Standard Technical Measures to address online infringement while protecting these platforms from liability.  The OSPs continue to enjoy that protection while rights holders are still waiting for the cooperation on the infringement thing.

As mentioned, one of the tempting bullet points to be highlighted by a few reporters after the Berkeley/Columbia study went public is that, of course, bots cannot adequately analyze fair use.  This is generally true and could theoretically pose a threat to expression online, but it’s hard to tell what we actually learn on this matter from the study.  The authors state that 7.3% of the notices reviewed were flagged as “questionable” due to “characteristics that weigh favorably toward fair use.”  This does not mean, however, that nearly 8 million notices were analyzed as possible fair uses. That would be impossible—and really boring—and the report clearly states that this was not done.  To arrive at a manageable data set, the report states the following:

“Sampling from and coding a pool of 108 million takedown requests required building a custom database and “coding engine” that allowed us to enter and query inputs about any one takedown request. These tools allowed in-depth investigation of the notices and their component parts by combining available structured data from the form-based submissions with manual coding of characteristics of the sender, target, and claim. We also designed a customized randomization function that supports both sampling across the entire dataset and building randomized “tranches” of more targeted subsets while maintaining overall randomness.” 

The percentage of “questionable” notices is based on a random sampling of 1,826 notices that were manually reviewed, and I leave it to experts in copyright law and/or statistical analysis to comment on the methodology. *[see note below]* With regard to fair use, the report states, “Flagged requests predominantly targeted such potential fair uses as mashups, remixes, or covers, and/or a link to a search results page that included mashups, remixes, and/or covers.” It also flagged ringtones and cases in which the “AIM used only a small portion of the AIW” or uses in which the AIM appeared to be made for “educational purposes.”

Because no single factor is dispositive in a fair use analysis—and none of the criteria identified by the report is automatically a fair use—what the study presents is nearly 8 million notices that could be candidates for a proper fair use analysis but which might not provide so much as a single fair use defense that would hold up in court. If that seems unlikely, keep in mind that 8 million is a tiny number when we’re talking about the internet. It’s important to maintain perspective when these kinds of reports generate buzz that we’re seeing a trend toward “censorship” in a universe that comprises trillions of daily expressions, including millions of infringements that for various reasons do not even trigger a DMCA takedown request.  Are there fair uses taken down?  It would be absurd to expect otherwise.  But neither this report, nor any other prior study or testimony of which I am aware demonstrates that this problem is widespread.  And as I pointed out in detail in this post, the user of a work online has the final say (absent litigation) by means of the counter notice procedure in the DMCA.

The Berkeley/Columbia report notes a relatively low rate of counter notice filings, suggesting that users either don’t know they have a right to make fair uses of works or are afraid to assert that right via counter notice because the rights holder might be a big media company with big attorneys wielding big statutory penalties.  This assessment comes entirely from the qualitative section of the report, which comprises interviews with (mostly anonymized) respondent OSPs and rights holders.  The report does not include interviews with users and it does not appear to consider the possibility that the low rate of counter notices might correspond with the high rate of indefensible infringements.

The authors state, “In one OSP’s view, the prospect of sending users up against media company attorneys backed by statutory copyright penalties ‘eviscerated the whole idea of counter notice.’”  But including this statement from an unnamed OSP representative contradicts other anecdotal evidence published in the report, like this observation by the authors: “Several respondents said that the most consistent predictor of a low-quality notice was whether it came from a first-time, one-off, or low-volume sender.” In other words, the most likely senders of “questionable” notices seem to be parties other than the big media companies with their scary attorneys, including entities that have no business using DMCA at all because copyright infringement is not the issue.

Based on conversations I have had with pro-copyright experts, the report is fair in suggesting that the language in the DMCA, which contains words like “under penalty of perjury,” can frighten people away from using counter notices, particularly if a takedown request comes from even a mid-size business and the recipient is an individual. In these cases, it is reasonable to imagine the target of a notice might be apprehensive about asserting his/her right to use a counter notice without consulting legal counsel.  This is a valid point for consideration, and surely, well-intended individuals making creative or expressive uses of works should not be frightened into silence by virtue of their financial status.  But it is important to maintain perspective with regard to which segment of the market we’re looking at and what type of players are involved in a potential conflict.  In many cases cited by critics of DMCA takedown procedures, the purposely abusive notices tend to be anomalies: they often occur in foreign markets with weaker civil liberties than ours, or they are remedied without litigation.

Meanwhile, individual rights holders of limited financial means face their own apprehensions and challenges in asserting their right to protect their works. As rights holders of all sizes have demonstrated repeatedly—and this report even addresses the problem—the ability for multiple, random users to file counter notices and restore clearly infringing material—and for OSPs to monetize those uses with impunity—puts rights holders at a tremendous disadvantage. It should also be recognized that none of these uses (e.g. a whole TV show or unlicensed song uploaded to YouTube) could rationally be defined as UGC (User Generated Content) when the uploaders have not generated anything at all. Hence, even the original intent of DMCA is not being fulfilled when the safe harbor shield continues to sustain these types of infringements.

It would take many more pages to fully delve into the details of the Berkeley/Columbia report, and the authors do fairly cite several challenges faced by rights holders in applying DMCA. Although the study is partly funded by Google, that alone does not disqualify its contents for me.  I cite reports funded by MPAA and other rights holding entities and think a study should stand or fall on its own merits. This one reveals some valuable insight; but it does not seem to adequately support those big headlines about DMCA abuse, which will surely be repeated in comment threads, blogs, and future articles.


*NOTE:  This has been altered from original publication based on comments (see below) from one of the report’s authors, Jennifer Urban. Originally, I stated that the team had used an algorithm to identify notices that may implicate fair use, and this was an error on my part.

Box Office Revenues Say Little About Piracy

Once again the MPAA has announced a profitable year for American motion pictures, and once again some of the usual suspects have seized upon this announcement to declare the studios hypocrites for ever saying that piracy causes real harm to the industry. Certainly, it’s easy enough to keep writing this same, careless article all the time. Cory Doctorow cobbled together a 100-word jab for BoingBoing; TorrentFreak reported essentially the same premise with a little less snark; and Ruth Reader managed to tap out this little sneer on Mic.com, complete with obligatory reference to SOPA, under the unforgivably misleading headline The Movie Industry Just Admitted Piracy Isn’t Curbing Its Massive Profits.

I know this may be hard to imagine, but the question of piracy’s harm to the filmed-entertainment industry overall is considerably more complex than a measurement of how the top-grossing motion pictures are doing at the box office.  But before expanding on this subject (again), let me repeat the following theme as a matter of principle:  Whether piracy siphons $100 or $100 million out of the legitimate market, it’s money that belongs to the people who do the work. Sadly, this is not a sufficient rationale for many, so we have this silly conversation instead, speculating about how innocuous piracy is or isn’t.

The annual report released by the Motion Picture Association reveals worldwide box-office sales of $38.3 billion, up 5% from 2014.  And that’s good news.  But the only thing we can actually conclude from the information in this report is that audiences around the world—and especially in Asia-Pacific—are going to theaters in numbers large enough to make the big movies profitable regardless of piracy. This isn’t all that revelatory, of course—unless you actually thought nobody would go to the theater to see the new Star Wars—but to the above-named pundits and their ilk, these revenues appear to make the studios out to be Chicken Littles.  How can they be so aggressive about piracy when they’re clearly doing just fine?  But if anyone took the time to look at the report and to learn something about the whole industry, they could not justifiably jump to the conclusion that piracy is fundamentally harmless.

Ruth Reader notes that MPAA CEO Chris Dodd, in an address to CinemaCon this week, cited an estimated $1.5 billion annual loss at the box office due to piracy.  This number may seem negligible next to $38 billion, but it’s worth noting that the estimate applies only to the US box office, which makes it considerably more significant relative to the $11.1 billion in sales for the US and Canada.

But assuming the $1.5 billion is accurate and still seems trifling to some readers, let’s look at it from a slightly different perspective that considers all of the 708 films included in the report.  Of these, 561 films were non-MPAA member, independent features.  And let’s imagine that 10% of that $1.5 billion could have been divided among the best 100 of those indies. That would be $1.5 million per movie, which any independent filmmaker will tell you can be life-and-death money.  In fact, Adam Leipzig of CreativeFuture used exactly that expression in this article when he noted the conservatively estimated $1.83 million the film Boyhood lost to piracy last year.  Of course, we cannot definitively say where money not spent might have gone, but by the same logic, it doesn’t make sense to blithely assume that because Jurassic World and Inside Out did great, piracy isn’t an issue across the broader market.
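The allocation imagined above is purely hypothetical, but the arithmetic itself is easy to check:

```python
# Hypothetical allocation of the estimated piracy loss to independent
# films, using the figures quoted above (illustrative only).
us_box_office_loss = 1.5e9   # estimated annual US box-office loss to piracy
indie_share = 0.10           # imagine 10% of that loss had reached indies
top_indie_films = 100        # divided among the best 100 independent features

loss_per_film = us_box_office_loss * indie_share / top_indie_films
print(f"${loss_per_film:,.0f} per film")  # $1,500,000 per film
```

Which is exactly the kind of life-and-death money an independent production lives or dies on.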

The fact is we can’t know exactly how much is lost due to piracy, but we can conservatively project that a relevant portion of the illegal market would be recaptured if piracy did not exist. Out of a universe of hundreds of millions of pirate site visits every month, if just 20 million consumers worldwide were to switch from illegal home-viewing channels to legal ones and spend just $33/month on filmed entertainment, that would add up to about $8 billion per year. And to put that in perspective, the top 25 grossing films of 2015 earned about $6 billion at the box office.  Or spread $8 billion across 500 independent titles, and it would be $16 million in sales per title.  I’m not suggesting revenue spreads evenly like that; of course it does not. But that’s the point. The top-grossing products may consistently earn enough to overwhelm the effects of piracy, but the smaller products—indie features, TV programs, documentaries—which operate on smaller margins are naturally going to be affected more acutely by any loss.  In fact, producer Martha De Laurentiis recently made a pretty good case for saying that piracy may have played a role in cancelling the popular series Hannibal.

Still, I realize that the pundits’ main premise, however unexamined it may be, is that the studios are the big whiners who want to fight piracy, and the studios are the ones who seem to be doing well.  But even if that logic were sound, readers should not be fooled into thinking it’s exclusively the studio execs who have a problem with piracy.  They’re just the ones who make the headlines, the ones who have the resources to try to address piracy, and the ones who are the most frequently vilified in this context. The indie filmmaker who loses money to piracy feels quite strongly about the issue, too; she just doesn’t have the muscle to do much about it.  As such, the indie filmmaker’s best hope for mitigating large-scale piracy is the costly effort being made by the studios. This is one of many reasons why “fans” cannot presume to separate the individual filmmakers from the major companies; they are co-dependent in a variety of ways.

Finally, while the temptation to bash the studios on the piracy issue will remain SOP for the lazy reporter, at least the peanut gallery might consider its own hypocrisy when criticizing these companies for producing exactly the films that consistently top the Most Pirated lists year after year.  Of the few words Cory Doctorow could be bothered to share with us on this subject, he spent some of these accusing the studios of clinging to “high-risk tentpole economics”.  In other words, the studios’ making money with tentpole films is grounds for calling them hypocrites about piracy, but then the studios should also be lambasted for making tentpole films, which is partly a response to piracy.  I know I’ve raised this issue before, but a threat of any loss in value to any commodity will drive investors to safety.  So, if you promote piracy and at the same time blame investors for producing the kind of big-spectacle fare that can earn revenue in spite of piracy, you kinda sound like you don’t know what you’re talking about.