A Strange Anecdote of DMCA Abuse?

I was told by a colleague who attended the Section 512 round tables in San Francisco that a consistent response from representatives of the OSPs was that anecdotes about harm to rights holders from piracy or YouTube-style infringement are not sufficient.  “We need data,” was apparently an oft-repeated imperative.  This is funny because that same crowd loves anecdotes about abuse of DMCA, and well they should because the anecdotes are likely to be more compelling than the data on that matter. But sometimes, the anecdotes are downright bizarre, as with this story reported yesterday in The Guardian by Alex Hern.  It is in fact the story of the DMCA abuse that wasn’t there.

At first reading, one assumes that this is a typical story about some non-copyright holding entity misusing the DMCA in order to attempt to censor criticism of its business.  In a nutshell, a UK citizen named Annabelle Narey had a bad experience with a UK building company called BuildTeam, and she consequently posted a negative review on a parents’ news and comment site called Mumsnet.  Her initial post prompted a thread of other users sharing their own bad experiences with the same company, which apparently prompted BuildTeam to try to have the negative reviews removed, even intimating possible defamation.  But then, Hern writes this:

“Mumsnet received a warning from Google: a takedown request had been made under the American Digital Millennium Copyright Act (DMCA), alleging that copyrighted material was posted without a licence on the thread.

As soon as the DMCA takedown request had been filed, Google de-listed the entire thread. All 126 posts are now not discoverable when a user searches Google for BuildTeam – or any other terms. The search company told Mumsnet it could make a counterclaim, if it was certain no infringement had taken place, but since the site couldn’t verify that its users weren’t actually posting copyrighted material, it would have opened it up to further legal pressure.”*

Initially, this description sounded odd to me for several reasons, not the least of which is that it would take about 30 minutes or less for Mumsnet to review 126 posts of this nature, which are usually quite short.  More than that, though, under the DMCA, a properly filed notice has to identify the Allegedly Infringed Work (AIW) and state under penalty of perjury that the filer is the owner, or agent of the owner, of that work. As such, what work was the filer alleging had been infringed in a thread of comments?  Because if the notice just said something as generic as “contains infringing material,” then the notice should have been rejected by Google.  More confusing still, as Hern goes on to describe, the filer of the take down request wasn’t even BuildTeam.  Who it was is not quite clear.

Hern describes a strange sequence of events in which a guy named Douglas Bush plagiarized Narey’s original post, published it on a “spammy website,” and also pre-dated the post to a day three months prior to the day Narey had originally published it.  Then, it appears that the registered owner of said spammy website, a Mr. Ashraf of Pakistan, may have been the one to send the DMCA takedown notice pertaining to the original thread.  It sounds a bit like a ham-handed attempt at a copyright scam; but suffice to say, there is nothing legit about the take down request, and Google should not have processed it at all. Moreover, under these circumstances, Mumsnet should not have had any fear of restoring the material via counter notice, as Hern suggests they might.  He writes the following:

“Whoever sent the takedown request, Mumsnet was forced to make a choice: either leave the post up, and accept being delisted; fight the delisting and open themselves up to the same legal threats made against Google; or delete the post themselves, and ask the post to be relisted on the search engine.”  

What?? There is no such thing as “whoever sent the request.”  This DMCA filing clearly fails to meet statutory requirements, and the apparent sender is in Pakistan! Mumsnet should have had no concern regarding litigation from anyone as a result of restoring this material. But then, Hern reports this:

Mumsnet deleted the post, and asked Google to reinstate the thread, but a month later, they received final word from the search firm: “‘Google has decided not to take action based on our policies concerning content removal and reinstatement’ which (it turned out) meant that they had delisted the entire thread”. 

Again, what in blazes is going on in this story?  Because it looks an awful lot like Google just plain messed up. Yet, for all its muddy details, Hern is presenting this tale as a prime example of how copyright becomes censorship on the internet, blaming the law itself for his own conclusion that “ … sites like YouTube, Twitter and Google … are forced to develop a hair-trigger over claims of copyright infringement, assuming guilt and asking the accused to prove their innocence.”

That’s a familiar refrain that rings hollow with legitimate rights holders who make proper use of DMCA.  Meanwhile, Google has often fought tooth, nail, and elbow against delisting search results, asserting past refusals to do so as a matter of principle. And that’s in cases involving clearly infringing links.  Why is the search giant, as Hern states, suddenly on a “hair trigger” to delist this little thread of consumer comments about a building service, where a copyright infringement is highly unlikely to exist?  And why should rights holders who have an interest in legitimate take down requests continue to have those interests denigrated by the general characterization that DMCA is so often used as a tool for censorship?

The potentially compelling part of this story is the matter of what Mr. Ashraf was actually intending. If he was the one to publish someone else’s post as his own and then use DMCA to attempt to assert an infringement claim against the original, what did he hope to achieve?  Is this a new kind of scam, general mischief, or a third-party exercise in censorship? It seems to me all the parties involved, including Google, should want an answer to this question, rather than settle on the familiar but misguided conclusion that copyright itself is the villain.


*It should be noted that Mumsnet does not use an internal search tool for its comment threads, but in fact uses Google Search. This would appear to be a factor in this story.

Reports of DMCA Abuse Likely Exaggerated

In the last week of March, you might have seen a headline or two announcing that 30% of DMCA takedown requests are questionable.  And since we don’t always read beyond headlines these days, these declarations happened to be conveniently-timed for the internet industry as the April 1 deadline approached for submitting public comments to the Copyright Office regarding potential revision to Section 512 of the DMCA.  This section of the law contains the provisions for rights holders to request takedowns of infringing uses of their works online; the provisions for restoring material due to error on the notice sender’s part; and the conditions by which online service providers (OSPs) may be shielded from liability for infringements committed by their users.

The eye-catching 30% number came from a new study entitled Notice and Takedown in Everyday Practice conducted by researchers at Berkeley and Columbia; and the handful of articles I saw provided little insight into the contents of the 160-page report, which I finally had a chance to review.  The authors, Jennifer M. Urban, Joe Karaganis, and Brianna L. Schofield, cite both qualitative and quantitative data from respondent rights holders and service providers; and the big story that their report produced—the one that will stick in people’s minds—is that rights holders and OSPs have increasingly adopted automated systems (bots) to process and analyze DMCA notices, which naturally leads to a higher error rate.  Thus the narrative that will be repeated is one in which major rights holders are using tools that cannot help but chill expression through error, especially when bots can’t do things like account for fair use.  But this isn’t exactly what the report tells us, and the authors themselves acknowledge that rights holders have only increased their use of automated notice sending in response to unabated growth in large-scale online infringement.

Having reviewed the report, my big-picture observations are as follows: a) it does not justify headlines suggesting that 30% of all DMCA takedown requests are “questionable”; and b) the report especially does not support the larger bias that the types of errors it identifies are tantamount to chilling expression online.  It also should be noted that the authors do acknowledge that the majority of DMCA notices, the supposed 70% which are not flawed, are predominantly filed on behalf of major entertainment industry corporations targeting the “most obvious infringing sites.”  This does not mean errors don’t exist among these notices, but people should not read the 30% number and jump to the typical conclusion that it’s all that damn MPAA’s fault. (In fact, the MPAA provided no data for this study.)  Instead, the report seems broadly to identify some predictable inconsistencies among third-party rights enforcement organizations (REOs), which file automated notices on behalf of rights holders of varying sizes.  While it is of course desirable for all parties that REOs achieve the greatest possible accuracy and maintain best practices, including human oversight, let’s look at some of the “questionable” notices identified by the quantitative section of the report.

The study surveyed just over 108 million takedown requests filed with the Lumen (formerly Chilling Effects) database, and the authors state that 99.8% of these notices were sent to Google Search, which automatically implies a data set different from the takedown scenario most critics tend to cite (e.g. a user-generated work appearing on a platform like YouTube). The quantitative section states that 15.4% of the request notices err because the Alleged Infringing Material (AIM) does not match the Alleged Infringed Work (AIW). In some cases, keyword searches matched material that shared like terms with the wrong works (e.g. House of Usher confused with the artist Usher), while a few other examples of mismatch are a little harder to fathom.

Regardless, while this type of flawed notice may represent inefficiency and waste for the rights holders, it does not get anywhere near the concerns users might have about stifling expression online.  This is because even the errors are exclusively targeting obvious infringement by criminal websites, and the report seems to bear this out.  Even if a percentage of notices contain these types of errors but are sent to links targeting sites that host 99% infringing material, each notice is still targeting an infringing link.  If an REO sends a takedown for Infringing File A when it ought to have sent one for Infringing File B, this may be an indication that the REO needs to improve its game, but it is not a mistake that affects anyone’s expression in any context whatsoever. It’s also not the kind of mistake that tells us much about DMCA beyond the fact that rights holders have to send out far too many notices against a constant blitz of infringements.  The outnumbered zombie-fighter may be less accurate with a shotgun, but if everything he hits is a zombie, no harm no foul.

So, assuming I’m reading the data correctly, that’s more than half the 30% of “questionable” notices accounted for, since the 30% is actually rounded up from 28.4%.  So, are mistakes being made? Of course. Are all, or even most, of these mistakes affecting anyone other than rather large rights holders and really large OSPs? It doesn’t look like it.  And let me pause in this regard to remind readers that when Congress passed the DMCA in 1998, it was their expectation that OSPs would cooperate with the major rights holders to develop Standard Technical Measures to address online infringement while protecting these platforms from liability.  The OSPs continue to enjoy that protection while rights holders are still waiting for the cooperation on the infringement thing.
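For readers who want to check the arithmetic behind “more than half,” here is a minimal sketch in Python. It assumes, as I do above, that the 15.4% of mismatched notices is counted inside the 28.4% total rather than overlapping with other categories; the variable and function names are mine, not the report’s.

```python
# Figures as quoted in this post from the Berkeley/Columbia report.
QUESTIONABLE_PCT = 28.4  # share of notices flagged "questionable" (headline: 30%)
MISMATCH_PCT = 15.4      # share flagged because the AIM did not match the AIW


def share_of(part_pct: float, whole_pct: float) -> float:
    """Return part as a fraction of whole (both given in percentage points)."""
    return part_pct / whole_pct


# Assumption: the mismatch category is a subset of the questionable total.
mismatch_share = share_of(MISMATCH_PCT, QUESTIONABLE_PCT)
remaining_pct = QUESTIONABLE_PCT - MISMATCH_PCT

print(f"Mismatches as a share of questionable notices: {mismatch_share:.1%}")
print(f"Questionable notices left once mismatches are set aside: {remaining_pct:.1f}%")
```

Under that assumption, the mismatches come to a little over 54% of the “questionable” pool, leaving roughly 13% of all notices in the other categories.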

As mentioned, one of the tempting bullet points to be highlighted by a few reporters after the Berkeley/Columbia study went public is that, of course, bots cannot adequately analyze fair use.  This is generally true and could theoretically pose a threat to expression online, but it’s hard to tell what we actually learn on this matter from the study.  The authors state that 7.3% of the notices reviewed were flagged as “questionable” due to “characteristics that weigh favorably toward fair use.”  This does not mean, however, that nearly 8 million notices were analyzed as possible fair uses. That would be impossible–and really boring–and the report clearly states that this was not done.  To arrive at a manageable data set the report states the following:

“Sampling from and coding a pool of 108 million takedown requests required building a custom database and “coding engine” that allowed us to enter and query inputs about any one takedown request. These tools allowed in-depth investigation of the notices and their component parts by combining available structured data from the form-based submissions with manual coding of characteristics of the sender, target, and claim. We also designed a customized randomization function that supports both sampling across the entire dataset and building randomized “tranches” of more targeted subsets while maintaining overall randomness.” 

The percentage of “questionable” notices is based on a random sampling of 1826 notices that were manually reviewed, and I leave it to experts in copyright law and/or statistical analysis to comment on the methodology. *[see note below]* With regard to fair use, the report states, “Flagged requests predominantly targeted such potential fair uses as mashups, remixes, or covers, and/or a link to a search results page that included mashups, remixes, and/or covers.” It also flagged ringtones and cases in which the “AIM used only a small portion of the AIW” or uses in which the AIM appeared to be made for “educational purposes.”
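To give a rough sense of what a sample of 1826 can and cannot say about 108 million notices, here is a back-of-the-envelope sketch in Python. It is not the report’s method: it assumes a simple random sample and uses the textbook normal approximation for a proportion, whereas the authors describe a customized randomization with “tranches,” so the real uncertainty may differ. The constant and function names are my own.

```python
import math

TOTAL_NOTICES = 108_000_000  # "just over 108 million" requests, per the report
SAMPLE_SIZE = 1826           # notices manually reviewed
FAIR_USE_RATE = 0.073        # share of reviewed notices flagged for possible fair use


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation 95% interval for a proportion.

    Assumes a simple random sample of size n, which is a simplification.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(FAIR_USE_RATE, SAMPLE_SIZE)
flagged_in_sample = round(FAIR_USE_RATE * SAMPLE_SIZE)
projected_total = FAIR_USE_RATE * TOTAL_NOTICES

print(f"Flagged notices in the sample: about {flagged_in_sample}")
print(f"Margin of error: +/- {moe * 100:.1f} percentage points")
print(f"Projected across all notices: about {projected_total / 1e6:.1f} million")
```

On those assumptions, the 7.3% figure rests on roughly 133 hand-reviewed notices, carries a margin of about 1.2 percentage points either way, and scales to something near 7.9 million notices when projected across the full data set.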

Because no single factor is dispositive in a fair use analysis—and none of the criteria identified by the report is automatically a fair use—what the study presents is nearly 8 million notices that could be candidates for a proper fair use analysis but which might not provide so much as a single fair use defense that would hold up in court. If that seems unlikely, keep in mind that 8 million is a tiny number when we’re talking about the internet. It’s important to maintain perspective when these kinds of reports generate buzz that we’re seeing a trend toward “censorship” in a universe that comprises trillions of daily expressions, including millions of infringements that for various reasons do not even trigger a DMCA takedown request.  Are there fair uses taken down?  It would be absurd to expect otherwise.  But neither this report, nor any other prior study or testimony of which I am aware demonstrates that this problem is widespread.  And as I pointed out in detail in this post, the user of a work online has the final say (absent litigation) by means of the counter notice procedure in the DMCA.

The Berkeley/Columbia report notes a relatively low rate of counter notice filings, suggesting that users either don’t know they have a right to make fair uses of works or are afraid to assert that right via counter notice because the rights holder might be a big media company with big attorneys wielding big statutory penalties.  This assessment comes entirely from the qualitative section of the report, which comprises interviews with (mostly anonymized) respondent OSPs and rights holders.  The report does not include interviews with users and it does not appear to consider the possibility that the low rate of counter notices might correspond with the high rate of indefensible infringements.

The authors state, “In one OSP’s view, the prospect of sending users up against media company attorneys backed by statutory copyright penalties ‘eviscerated the whole idea of counter notice.’”  But including this statement from an unnamed OSP representative contradicts other anecdotal evidence published in the report, like this observation by the authors: “Several respondents said that the most consistent predictor of a low-quality notice was whether it came from a first-time, one-off, or low-volume sender.” In other words, the most likely senders of “questionable” notices seem to be parties other than the big media companies with their scary attorneys, including entities that have no business using DMCA at all because copyright infringement is not the issue.

Based on conversations I have had with pro-copyright experts, the report is fair in suggesting that the language in the DMCA, which contains words like “under penalty of perjury,” can frighten people away from using counter notices, particularly if a takedown request comes from even a mid-size business and the recipient is an individual. In these cases, it is reasonable to imagine the target of a notice might be apprehensive about asserting his/her right to use a counter notice without consulting legal counsel.  This is a valid point for consideration, and surely, well-intended individuals making creative or expressive uses of works should not be frightened into silence by virtue of their financial status.  But it is important to maintain perspective with regard to which segment of the market we’re looking at and what type of players are involved in a potential conflict.  In many cases cited by critics of DMCA takedown procedures, the purposely abusive notices tend to be anomalies, often occur in foreign markets with weaker civil liberties than ours, or are remedied without litigation.

Meanwhile, individual rights holders of limited financial means face their own apprehensions and challenges in asserting their right to protect their works. As rights holders of all sizes have demonstrated repeatedly—and this report even addresses the problem—the ability for multiple, random users to file counter notices and restore clearly infringing material—and for OSPs to monetize those uses with impunity—puts rights holders at a tremendous disadvantage. It should also be recognized that none of these uses (e.g. a whole TV show or unlicensed song uploaded to YouTube) could rationally be defined as UGC (User Generated Content) when the uploaders have not generated anything at all. Hence, even the original intent of DMCA is not being fulfilled when the safe harbor shield continues to sustain these types of infringements.

It would take many more pages to fully delve into the details of the Berkeley/Columbia report, and the authors do fairly cite several challenges faced by rights holders in applying DMCA. Although the study is partly funded by Google, that alone does not disqualify its contents for me.  I cite reports funded by MPAA and other rights holding entities and think a study should stand or fall on its own merits. This one reveals some valuable insight; but it does not seem to adequately support those big headlines about DMCA abuse, which will surely be repeated in comment threads, blogs, and future articles.


*NOTE:  This has been altered from original publication based on comments (see below) from one of the report’s authors, Jennifer Urban. Originally, I stated that the team had used an algorithm to identify notices that may implicate fair use, and this was an error on my part.

Democracy Disrupted

A couple of posts ago, I reported that the organization Fight for the Future had facilitated enough comments sent to the Copyright Office regarding Section 512 of the DMCA that they “crashed” the servers.  In a follow-up email brimming with pride, the organization said this to those who contributed:

“Wow! In a matter of days you and nearly 100,000 other people told the U.S. Copyright Office about the urgent need for better Fair Use and free speech protections in the DMCA.”

I didn’t receive one of these emails, but my friend David Lowery did. And not because he said anything to the Copyright Office about the “urgent need for better Fair Use and free speech protections,” but because he and his colleagues tested the FFTF web form email blaster and published their findings on The Trichordist blog.  They found that the automated system did not verify email addresses or confirm that IP addresses were within the US; it also allowed multiple comments from the same source and as stated in the post, “we managed to post rapid-fire comments (less than three seconds between comments).”
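To make those gaps concrete, here is a minimal, hypothetical sketch in Python of the kind of checks a comment form could apply before forwarding anything. It is not FFTF’s code, and the class name, threshold, and messages are all assumptions of mine; it only illustrates rejecting malformed addresses, repeat submissions from one address, and rapid-fire posts from one source. A pattern match is also not the same as verifying that an address belongs to a real person, which would take a confirmation step.

```python
import re
from dataclasses import dataclass, field

# Shape check only; this does not prove the address exists or belongs to the sender.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MIN_SECONDS_BETWEEN_COMMENTS = 60.0  # hypothetical threshold


@dataclass
class CommentGate:
    """Hypothetical gate in front of a comment form (illustration only)."""

    last_accepted: dict[str, float] = field(default_factory=dict)  # source -> time
    seen_emails: set[str] = field(default_factory=set)

    def accept(self, email: str, source_ip: str, now: float) -> tuple[bool, str]:
        """Return (accepted, reason) for one submission at time `now` (seconds)."""
        email = email.strip().lower()
        if not EMAIL_PATTERN.match(email):
            return False, "malformed email"
        if email in self.seen_emails:
            return False, "duplicate email"
        previous = self.last_accepted.get(source_ip)
        if previous is not None and now - previous < MIN_SECONDS_BETWEEN_COMMENTS:
            return False, "too fast"
        # Record state only for submissions that pass every check.
        self.seen_emails.add(email)
        self.last_accepted[source_ip] = now
        return True, "accepted"


gate = CommentGate()
first = gate.accept("a@example.com", "203.0.113.5", now=0.0)
rapid = gate.accept("b@example.com", "203.0.113.5", now=2.5)
repeat = gate.accept("a@example.com", "198.51.100.7", now=100.0)
print(first, rapid, repeat)
```

With a gate like this, a second comment arriving from the same source under three seconds later would be turned away, as would a second comment reusing an address.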

As indicated in my other post, I really do believe you’d have to search long and hard to find 100,000 citizens who could properly explain the DMCA, let alone fair use doctrine; but to compound this nonsense, some astroturf organization floods a government server with automated messages that may represent anything from bots to foreign citizens to minors to the typically Pavlovian American, who just clicks stuff that sounds really serious but that he doesn’t understand.  Democracy in action indeed.

I’ve made this point many times, but it’s one worth making often.  This type of automated “political action,” which in this case is funded by a very large industry, should be among the real digital-age phenomena that scare the hell out of people, regardless of the stated issue du jour.  Forget the DMCA for a moment and imagine it’s the pharmaceutical industry or petroleum or Koch Industries using the same exact tools to rally virtual citizens, sock puppets, non-citizens, and literally anyone capable of believing a lie and clicking a mouse to flood the EPA or HHS on some matter that disfavors the public interest in the service of one industry’s bottom line. That’s not even coming close to the reason the First Amendment affirms the rights of speech and the petition of government. And, yes, there is a history of obfuscation by big business since long before the internet, but automation seems uniquely suited to fostering the illusion that the people are the ones doing the speaking.

In The Trichordist post, Lowery indicates that if FFTF used the type of automation described above to flood government servers, it might have been illegal but was at least a well-funded monopolization of a system meant to allow all parties to comment on an issue. Hence the “crashing” that this organization is so proud of is tantamount to—you got it—chilling free speech.  One could of course say this about any online petition in theory, but isn’t it interesting that the last time we heard about crashing systems like this was over SOPA?  So, does this really happen because there are so many well-informed citizens who care more about “digital rights” than any number of more pressing issues? Or might it have something to do with the fact that the corporate interests in these cases also happen to be the world’s experts in automation and aggregation?  Maybe not.  Maybe there really are more Americans worried about whether or not some YouTube video is a “fair use” than are concerned with the economy, violent crime, security, real civil rights violations, etc.  And if that’s the case, then  there’s truly nothing left of the Republic worth fighting for, is there?

On the positive side, I suspect a lot of this digital reactivism is wasted and that the internet industry may eventually discover that not everything is a numbers game.  For all the megabytes of outrage, what exactly does anyone think the Copyright Office is supposed to do with most of it? Responsibly vetted petitions have an important role to play in public policy.  But in a moment like this, it is the Register of Copyrights’ job to consider the views of various stakeholders; and the comments that should be most influential will come from representatives of all sides who submit fairly long and well-reasoned statements based on actual knowledge of the law.

Ultimately, the Copyright Office recommendations to Congress on Section 512 may be 100 pages worth of analysis based on legal precedent going back to the beginning of the country. So, any petition to this particular office only carries so much weight in the first place; but how much attention does Fight for the Future imagine copyright experts will give to some boilerplate whinging about a doctrine they have grossly misrepresented to the signers of said petition?  And even 100,000 verified signatures would be small potatoes in an age when people will click on just about anything.  It probably wouldn’t be that hard to automate 100,000 “signatures” to lobby the White House to appoint SpongeBob SquarePants as Ambassador to Fiji, but so what?  (Come to think of it, that petition would probably do quite well.)

There are an estimated 5.5 million jobs in the U.S. that directly depend upon the protection of copyrights. Meanwhile, every independent rights holder I have thus far encountered has effectively given up on the DMCA as a tool for protecting creative works online.  That’s a tangible problem, and one that does affect everyone because 5.5 million jobs support a hell of a lot more jobs than that in the overall market.  We could take this reality seriously, or we could keep finding ways to imagine that free speech is under siege and continue to allow the largest companies in the world to manipulate the political process with a little code and a lot of noise.