In the last week of March, you might have seen a headline or two announcing that 30% of DMCA takedown requests are questionable. And since we don’t always read beyond headlines these days, these declarations happened to be conveniently timed for the internet industry as the April 1 deadline approached for submitting public comments to the Copyright Office regarding potential revision of Section 512 of the DMCA. This section of the law contains the provisions for rights holders to request takedowns of infringing uses of their works online; the provisions for restoring material taken down due to error on the notice sender’s part; and the conditions under which online service providers (OSPs) may be shielded from liability for infringements committed by their users.
The eye-catching 30% number came from a new study titled Notice and Takedown in Everyday Practice, conducted by researchers at Berkeley and Columbia, and the handful of articles I saw provided little insight into the contents of the 160-page report, which I finally had a chance to review. The authors, Jennifer M. Urban, Joe Karaganis, and Brianna L. Schofield, cite both qualitative and quantitative data from respondent rights holders and service providers; and the big story their report produced, the one that will stick in people’s minds, is that rights holders and OSPs have increasingly adopted automated systems (bots) to process and analyze DMCA notices, which naturally leads to a higher error rate. Thus the narrative that will be repeated is one in which major rights holders are using tools that cannot help but chill expression through error, especially since bots can’t do things like account for fair use. But this isn’t exactly what the report tells us, and the authors themselves acknowledge that rights holders have only increased their use of automated notice sending in response to unabated growth in large-scale online infringement.
Having reviewed the report, my big-picture observations are as follows: a) it does not justify headlines suggesting that 30% of all DMCA takedown requests are “questionable”; and b) it especially does not support the larger bias that the types of errors it identifies are tantamount to chilling expression online. It should also be noted that the authors acknowledge that the majority of DMCA notices, the supposed 70% that are not flawed, are predominantly filed on behalf of major entertainment industry corporations targeting the “most obvious infringing sites.” This does not mean errors don’t exist among these notices, but people should not read the 30% number and jump to the typical conclusion that it’s all that damn MPAA’s fault. (In fact, the MPAA provided no data for this study.) Instead, the report seems broadly to identify some predictable inconsistencies among third-party rights enforcement organizations (REOs), which file automated notices on behalf of rights holders of varying sizes. While it is of course desirable for all parties that REOs achieve the greatest possible accuracy and maintain best practices, including human oversight, let’s look at some of the “questionable” notices identified by the quantitative section of the report.
The study surveyed just over 108 million takedown requests recorded in the Lumen (formerly Chilling Effects) database, and the authors state that 99.8% of these notices were sent to Google Search, which automatically implies a data set different from the takedown scenario most critics tend to cite (e.g. a user-generated work appearing on a platform like YouTube). The quantitative section states that 15.4% of the request notices err because the Alleged Infringing Material (AIM) does not match the Alleged Infringed Work (AIW). In some cases, keyword searches matched material that merely shared terms with the wrong works (e.g. House of Usher confused with the artist Usher), while a few other examples of mismatch are a little harder to fathom.
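To picture how that type of mismatch happens, here is a minimal hypothetical sketch of bare keyword matching; the catalog and matching logic below are my own illustration, not the actual systems the study examined:

```python
# Hypothetical illustration: naive keyword matching flags any target
# that shares a term with the Alleged Infringed Work (AIW).

catalog = [
    "The Fall of the House of Usher (film)",
    "Usher - Confessions (album)",   # unrelated work sharing the term "usher"
    "Usher syndrome fact sheet",     # not even a creative work
]

def keyword_matches(aiw_title, targets):
    """Return every target sharing at least one keyword with the AIW."""
    terms = set(aiw_title.lower().split())
    return [t for t in targets if terms & set(t.lower().split())]

print(keyword_matches("House of Usher", catalog))
# All three items match; only the first is plausibly the right work.
```

A human reviewer catches this sort of collision instantly; a bot tuned only for term overlap does not.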
Regardless, while this type of flawed notice may represent inefficiency and waste for the rights holders, it does not get anywhere near the concerns users might have about stifling expression online. This is because even the erroneous notices are overwhelmingly targeting obvious infringement by criminal websites, and the report seems to bear this out. If a percentage of notices contain these types of errors but are sent to links on sites that host 99% infringing material, each notice is still targeting an infringing link. If an REO sends a takedown for Infringing File A when it ought to have sent one for Infringing File B, this may be an indication that the REO needs to improve its game, but it is not a mistake that affects anyone’s expression in any context whatsoever. It’s also not the kind of mistake that tells us much about the DMCA beyond the fact that rights holders have to send out far too many notices against a constant blitz of infringements. The outnumbered zombie-fighter may be less accurate with a shotgun, but if everything he hits is a zombie, no harm no foul.
So, assuming I’m reading the data correctly, that’s more than half of the 30% of “questionable” notices accounted for: the 30% is actually rounded up from 28.4%, and the mismatch errors alone make up 15.4 of those 28.4 percentage points, roughly 54% of the total. So, are mistakes being made? Of course. Are all, or even most, of these mistakes affecting anyone other than rather large rights holders and really large OSPs? It doesn’t look like it. And let me pause in this regard to remind readers that when Congress passed the DMCA in 1998, it was their expectation that OSPs like YouTube (fledgling little lambs that they were) would cooperate with the major rights holders to develop Standard Technical Measures to address online infringement while protecting these platforms from liability. The OSPs continue to enjoy that protection while rights holders are still waiting for the cooperation on the infringement thing.
As mentioned, one of the tempting bullet points highlighted by a few reporters after the Berkeley/Columbia study went public is that, of course, bots cannot adequately analyze fair use. This is generally true and could theoretically pose a threat to expression online, but it’s hard to tell what we actually learn on this matter from the study. The authors state that 7.3% of the notices reviewed were flagged as “questionable” due to “characteristics that weigh favorably toward fair use.” This does not mean, however, that nearly 8 million notices were analyzed as possible fair uses. That would be impossible (and really boring), and the report clearly states that this was not done. To arrive at a manageable data set, the authors describe their method as follows:
“Sampling from and coding a pool of 108 million takedown requests required building a custom database and “coding engine” that allowed us to enter and query inputs about any one takedown request. These tools allowed in-depth investigation of the notices and their component parts by combining available structured data from the form-based submissions with manual coding of characteristics of the sender, target, and claim. We also designed a customized randomization function that supports both sampling across the entire dataset and building randomized “tranches” of more targeted subsets while maintaining overall randomness.”
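For readers curious what such a setup might look like in practice, here is a rough sketch in Python; the table schema, field names, and seed are my own invention, not the authors’ actual engine:

```python
import random
import sqlite3

# Hypothetical sketch of a randomization function that samples across an
# entire notice dataset or within a targeted "tranche" (subset), while
# keeping the draw itself random and reproducible.

def sample_requests(conn, n, tranche_where=None, seed=2016):
    """Draw up to n takedown-request IDs uniformly at random."""
    where = f"WHERE {tranche_where}" if tranche_where else ""
    ids = [r[0] for r in conn.execute(f"SELECT id FROM requests {where}")]
    return random.Random(seed).sample(ids, min(n, len(ids)))  # fixed seed: reproducible

# Toy stand-in for the custom database described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER, recipient TEXT)")
conn.executemany("INSERT INTO requests VALUES (?, ?)",
                 [(i, "Google Search" if i % 2 else "Other") for i in range(10000)])

overall = sample_requests(conn, 1826)  # cross-dataset sample
tranche = sample_requests(conn, 500, "recipient = 'Google Search'")  # targeted tranche
```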
The percentage of “questionable” notices is based on a random sample of 1,826 notices that were manually reviewed, and I leave it to experts in copyright law and/or statistical analysis to comment on the methodology. *[see note below]* With regard to fair use, the report states, “Flagged requests predominantly targeted such potential fair uses as mashups, remixes, or covers, and/or a link to a search results page that included mashups, remixes, and/or covers.” It also flagged ringtones and cases in which the “AIM used only a small portion of the AIW” or uses in which the AIM appeared to be made for “educational purposes.”
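On the statistical side, a back-of-the-envelope check suggests the sample size itself is sound, assuming simple random sampling and the usual normal approximation; the critical values below are textbook figures, not taken from the report:

```python
import math

n = 1826  # notices reviewed by hand
p = 0.5   # conservative proportion; maximizes the margin of error

for label, z in [("95%", 1.96), ("99%", 2.58)]:
    moe = z * math.sqrt(p * (1 - p) / n)
    print(f"{label} confidence: +/- {moe * 100:.2f} percentage points")
# Prints roughly +/-2.29 and +/-3.02, consistent with the figures one of
# the report's authors cites in the comments below.
```

Whether the coding categories themselves are sound is a separate question, which brings us back to fair use.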
Because no single factor is dispositive in a fair use analysis (and none of the criteria identified by the report is automatically a fair use), what the study presents is nearly 8 million notices that could be candidates for a proper fair use analysis but which might not yield so much as a single fair use defense that would hold up in court. If that seems unlikely, keep in mind that 8 million is a tiny number when we’re talking about the internet. It’s important to maintain perspective when these kinds of reports generate buzz that we’re seeing a trend toward “censorship” in a universe that comprises trillions of daily expressions, including millions of infringements that for various reasons do not even trigger a DMCA takedown request. Are there fair uses taken down? It would be absurd to expect otherwise. But neither this report nor any prior study or testimony of which I am aware demonstrates that this problem is widespread. And as I pointed out in detail in this post, the user of a work online has the final say (absent litigation) by means of the counter notice procedure in the DMCA.
The Berkeley/Columbia report notes a relatively low rate of counter notice filings, suggesting that users either don’t know they have a right to make fair uses of works or are afraid to assert that right via counter notice because the rights holder might be a big media company with big attorneys wielding big statutory penalties. This assessment comes entirely from the qualitative section of the report, which comprises interviews with (mostly anonymized) respondent OSPs and rights holders. The report does not include interviews with users and it does not appear to consider the possibility that the low rate of counter notices might correspond with the high rate of indefensible infringements.
The authors state, “In one OSP’s view, the prospect of sending users up against media company attorneys backed by statutory copyright penalties ‘eviscerated the whole idea of counter notice.’” But this statement from an unnamed OSP representative is contradicted by other anecdotal evidence published in the report, like this observation by the authors: “Several respondents said that the most consistent predictor of a low-quality notice was whether it came from a first-time, one-off, or low-volume sender.” In other words, the most likely senders of “questionable” notices seem to be parties other than the big media companies with their scary attorneys, including entities that have no business using the DMCA at all because copyright infringement is not the issue.
Based on conversations I have had with pro-copyright experts, the report is fair in suggesting that the language in the DMCA, which contains words like “under penalty of perjury,” can frighten people away from using counter notices, particularly if a takedown request comes from even a mid-size business and the recipient is an individual. In these cases, it is reasonable to imagine the target of a notice might be apprehensive about asserting his/her right to use a counter notice without consulting legal counsel. This is a valid point for consideration, and surely, well-intended individuals making creative or expressive uses of works should not be frightened into silence by virtue of their financial status. But it is important to maintain perspective with regard to which segment of the market we’re looking at and what type of players are involved in a potential conflict. In many cases cited by critics of DMCA takedown procedures, the purposely abusive notices tend to be anomalies; they often occur in foreign markets with weaker civil liberties than ours, or they are remedied without litigation.
Meanwhile, individual rights holders of limited financial means face their own apprehensions and challenges in asserting their right to protect their works. As rights holders of all sizes have demonstrated repeatedly (and this report even addresses the problem), the ability of multiple, random users to file counter notices and restore clearly infringing material, and of OSPs to monetize those uses with impunity, puts rights holders at a tremendous disadvantage. It should also be recognized that none of these uses (e.g. a whole TV show or unlicensed song uploaded to YouTube) could rationally be defined as UGC (User Generated Content) when the uploaders have not generated anything at all. Hence, even the original intent of the DMCA is not being fulfilled when the safe harbor shield continues to sustain these types of infringements.
It would take many more pages to fully delve into the details of the Berkeley/Columbia report, and the authors do fairly cite several challenges faced by rights holders in applying the DMCA. Although the study is partly funded by Google, that alone does not disqualify its contents for me. I cite reports funded by the MPAA and other rights-holding entities, and I think a study should stand or fall on its own merits. This one reveals some valuable insight, but it does not seem to adequately support those big headlines about DMCA abuse, which will surely be repeated in comment threads, blogs, and future articles.
*NOTE: This has been altered from original publication based on comments (see below) from one of the report’s authors, Jennifer Urban. Originally, I stated that the team had used an algorithm to identify notices that may implicate fair use, and this was an error on my part.
I’d hate to be the one to say this, but there is proof of massive amounts of DMCA abuse. That proof is all over the internet.
HBO sent a DMCA takedown notice to Google to remove a download page for the VLC media player. Not only did HBO not own the copyright for VLC, but it claimed that the media player was infringing on the HBO series Game of Thrones.
The company behind Ashley Madison sent out DMCA takedowns to anyone mentioning the breach of its website, including one web developer who made a tool to check whether a person was listed in the exposed data. Not only is that censorship, but data is not protected by copyright. Only the presentation of the data can be copyrighted.
Warner Brothers knowingly abused the DMCA process and decided to cover up that fact by blacking out the information in question in official court documents, the case in question being Disney vs. Hotfile. Warner Bros. used bots to identify content based on filenames and similar attributes. The actual file contents were never seen. The system also had the quirk of constantly identifying files smaller than 200 MB as Warner Bros. movies. They let those files be taken down anyway.
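For illustration only, a toy version of that kind of filename-and-attribute bot might look like the sketch below; the keywords, threshold, and logic are my own guesses at the reported behavior, not Warner’s actual system:

```python
# Toy sketch of a filename-based matching bot: the file contents are
# never inspected, so unrelated files can be flagged.

WB_KEYWORDS = {"dark knight", "hangover", "gravity"}  # hypothetical title terms

def flagged_as_wb(filename, size_mb):
    name = filename.lower()
    keyword_hit = any(k in name for k in WB_KEYWORDS)
    small_file_quirk = size_mb < 200  # the reported misfire on small files
    return keyword_hit or small_file_quirk

print(flagged_as_wb("my_dog_at_the_beach.mp4", 45))  # True: a false positive
```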
If you need more proof, look at the WTFU tag on YouTube. It stands for Where’s The Fair Use, and it features many reviewers, along with other content creators, giving examples of what is wrong with the DMCA system and, in turn, YouTube’s copyright systems.
Most major companies use bots to find their content. There are two problems with this. In Lenz v. Universal Music Corp., the court ruled that copyright holders must do two things before sending a takedown notice.
First, the holders must have a “good faith” belief. A bot cannot do this. A bot is not sentient and thus cannot believe; bots merely act on the instructions they have been given. A “good faith” belief cannot be measured, so you cannot write instructions for it.
Second, holders must consider fair use before sending a takedown request. A bot cannot tell whether something is a fair use. All bots see is that content is being used, not the context in which it is being used.
You spoke of the Berkeley/Columbia study as saying that the researchers used automation to calculate the number of DMCA notices that had potential fair uses. That is not true. They actually looked at the claims and the corresponding content. They took the time to look through the claims, something major companies will not do.
You also say in the article that users have the last say due to the counter notice procedure. There are two problems with this, one of which you point out. Companies always act like they are in the right. They exaggerate the possible punishments. They threaten the accused with the possibility of legal costs, lawyer costs, or perjury.
Perjury is too hard to prove in court. The threat is often used on those whose uses fall under fair use. However, the cost of lawyers is high. Companies will appeal as much as they can, meaning a case could go on for years.
The other problem with your statement is that companies face no consequences for abusing the system. As I said before, perjury is a useless measure. Companies are free to back out of a DMCA claim and simply send another notice. This means that even if a person sends a counter notice, the holders can back out and start the process again.
You also state “And let me pause in this regard to remind readers that when Congress passed the DMCA in 1998, it was their expectation that OSPs like YouTube (fledgling little lambs that they were) would cooperate with the major rights holders to develop Standard Technical Measures to address online infringement while protecting these platforms from liability. The OSPs continue to enjoy that protection while rights holders are still waiting for the cooperation on the infringement thing.”
There are several problems with that statement. The internet of 1998 was not the same as the internet of today. You also specifically bring up YouTube, a website that did not exist at the time. You call the OSPs “fledgling little lambs that they were.” This implies that you look down on them. Not only is that morally wrong, but the U.S. has something called corporate personhood. The concept says companies have the same rights as people in the U.S. There is also the 14th Amendment; I suggest you read it.
Also, if you are going to insult the companies hiding behind the safe harbor, then take a look at all the copyright holders that shoot holes in it. Companies have chipped holes in it to the point where it’s not protecting anyone. If the point of the safe harbor is to protect, why are you trying to drop bombs through a roof that’s already been destroyed?
Also, those within the safe harbors do try to cooperate. They process notices as fast as they can, and many even put systems in place to help catch uses of a company’s copyrighted content for them. Yet rightsholders still say that is not enough. If anyone is non-cooperative in that situation, it’s the rightsholders. It seems rightsholders have unreasonable expectations.
Wow, Scarlett. I mean, thanks for the comment, but as you went all the way from the DMCA to corporate personhood and the 14th Amendment, I’m not quite sure how to respond other than to say that your information on several topics is simply wrong. You’re wrong about the dynamics between rights holders and OSPs, wrong about the mechanisms in the DMCA, wrong about the scope of abuse, wrong in your understanding of the use of bots (even the Berkeley study doesn’t support what you’re saying), and just kinda all over the place. Your comment begs about 2,000 words’ worth of responses, and ain’t nobody got time for that.
Agree to disagree, but I can call you out on the bots.
I work in IT and know firsthand how bots work, as bots are often used for, among other things, ping functions to bring down servers, mass commenting, and search.
Although, reading back over it, I must agree with you that the corporate personhood comment does not really fit in well.
Fair enough, Scarlett. In this context, however, the biggest users of bots for sending DMCA notices are the major rights holders (NBCU et al.), and they maintain a considerably high accuracy level given the volume they send out; they also employ human oversight. The Berkeley study even confirms this, so part of my point in the post is to maintain perspective as to what segment of the market we’re looking at. While it is true that a bot cannot assess fair use, neither could the Berkeley researchers by the method they applied, and they don’t actually claim to have done so. They simply identified 7.8 million notices that could be candidates for a fair use analysis. That’s not really all that high a number given the number of expressions and uses on the internet on a daily basis. But no matter what, the study does not actually state that 30% of the hundreds of millions of notices filed are “questionable.” It identifies nearly 30% of a certain segment of the notices, and half of these are actually targeting infringements but contain errors, which is not terribly compelling.
As stated, the study is not worthless. It provides some interesting insight (albeit anonymized), but it does not appear to support the shorthand reportage that DMCA abuse is rampant.
I’d suggest one of the big issues with it is that any serious look at DMCA abuse needs to be qualitative, not quantitative, research. The issue has less to do with raw volume; it’s about certain specific sectors and abuses.
More research really does need to be done in that area before we can say for sure, but I’m inclined to agree that bots probably aren’t responsible for the really egregious abuses. It’s that lack of research that makes me welcome this report. I mostly agree with your criticisms (I’m afraid I have way too much work on to do a proper fisking myself at the moment), but this is an area where academic research is far too thin on the ground. Hopefully this will lead to more serious work coming out.
Thanks, Sam. As mentioned in the post, the study contains a whole qualitative section. I had drafted comments about it, but dropped them, as the main focus of this post was the big 30% headline, which comes entirely from the quantitative section. While the qualitative data yields some interesting observations from both rights holders and OSPs, it is unfortunate that by necessity most of the respondents are anonymized. This doesn’t disqualify the observations per se, but given the wide range of possible errors in filing DMCA notices or counter notices, that context is essential. There are innocent mistakes and there are abuses, but I suspect that even if we could quantify the volume of abuse, it would not often point to rights holders, let alone major rights holders whose works are the most frequently infringed. And even apparently egregious errors, like Ashley Madison, don’t wholly demonstrate abuse absent further investigation. For all I know, some idiot at that lame-ass organization thought the DMCA was the appropriate tool to use. It wasn’t. My colleagues who support copyrights said so. And that’s that. What do we really learn from such examples about the extent to which the DMCA lives up to its original intent to balance the interests of both OSPs and rights holders?
Meanwhile, rights holders are able to be quite specific about the volume of infringement that is unintentionally protected by the DMCA, as well as about the mechanisms employed by OSPs to dodge responsibility for meeting their obligations (see BMG v. Cox or anything VoxIndie has to say about Google and the DMCA), while I still find the quantitative and qualitative evidence of abuse pretty thin to date.
Hi, Sam and David,
Thanks for your comments. Sam, as David mentioned, there is a long qualitative study, based on in-depth interviews and surveys with OSPs and rightsholders. We were able to get a very good cross-section of OSPs, but were limited to large rightsholders. (We started with OSPs, as they are in the middle of the process, between rightsholders and targets.) We were also able to talk to some REOs.
Anonymizing the interviews was a condition for both the OSP and rightsholder respondents; there is such heightened debate on all sides that it was the only way we could garner a good cross-section of respondents. We did provide as much context as we could without identifying individual OSPs or rightsholders, but David, you are right that the ability to give more detailed context in the paper would have helped “color in” the picture for those reading. As you can imagine, we ended up with a great degree of fascinating and useful detail that we had to fold into more generalized discussions. The themes, however, were quite consistent across interviews.
As for mistakes and abuse: they are qualitatively different in my view, but because the difference relies on knowing the sender’s intent, it’s a bit hard to pull them apart. They arise from several directions. Both bots and people make mistakes. Bots can be used abusively or just carelessly. It’s as yet unclear to me how _often_ the first occurs, though we did hear of clearly abusive tactics; the second (carelessness or poor algorithm control) was quite apparent in the quantitative studies. I think (and we suggest) that these issues can be improved by encouraging or requiring good algorithmic practices by both rightsholders and OSPs. With regard to human decisionmakers or senders, abuse can arise, as David mentioned, because the law is misunderstood by a person. That doesn’t make the issue less serious for the target, of course, but it does suggest that better information about when copyright notice and takedown is appropriate could help. There can also be, unfortunately, intentional abuse, because notice and takedown, whatever else it is, is a much easier route to removing material than the alternative (a lawsuit).
Where you come down on addressing these issues will often, I think, have to do with how you view the overall system. For some people, speech is so crucial that even a much smaller number of mistakes or abuses than we observed would need to be addressed better. For others, it looks more like a numbers game, in which some percentage of mistakes is ok on the way to meaningfully addressing piracy. One of the most striking things for me in our research was how much it varies by where you sit in the ecosystem–notice and takedown is a very different process for different OSPs, and different rightsholders. I actually think there is more potential common ground here than we’ve seen in some of the public discussion.
Legally, we focused on suggestions that we hope could limit the incentives for carelessness or abuse, and make the counter notice process more usable (at least theoretically; a study of targets would really help here) without undermining the benefits of takedown. Many of our suggestions, however, are actually practice-based. We were able to hear lots of different approaches to practice and to put that together with other observations.
Hi, David–thanks so much for your interest in our study, and for your thoughtful comments. As you mentioned, it is a deep dive of a study (three studies, actually) and there is much more to it than the 28% number from Study 2 that has gotten the most attention.
I would be delighted to talk further about the study, our methods, etc., if you would like. I have some thoughts on several things in your post. And as we were not able to interview small rightsholders (only large rightsholders), we are very interested in more thoughts from smaller copyright holders who may not, for example, have the same access to REOs or automated systems that the big folks have.
I do want to offer an important correction on our methodology: we did not use an algorithm to evaluate fair use or other substantive features of the notices. Every one was reviewed by hand.
Regards,
Jennifer Urban
Thank you for writing, Jennifer. Feel free to comment here or to email me directly. I post retractions and/or corrections when necessary, and I’m willing to admit that I may have misunderstood your methodology. I did not get the sense that you manually assessed nearly 8 million notices to determine that they were likely fair uses; the language I quote from the report sounds more generalized than that, and this analysis by people who know the law much better than I do seems to echo the same point, if not the same technicality.
I’m happy to make a note that the algorithm statement is incorrect, but the substantive point for readers comes from the study on p. 96: “We could not do a full fair use analysis, which requires more detailed information and review, and the final merit of any potential fair use claims within this set will vary. Our goal was to observe whether automated systems appear to generate any significant number of notices for which more contextualized human review is needed to check for fair use.” If that statement is accurate, I believe my critique is likewise accurate: you identified notices which could implicate fair use defenses, but which may not. In fact, I did not mention in the post that the study lists “cover songs” among this data set; covers are typically derivative works requiring compulsory licenses, not fair uses.
Regardless, happy to hear your thoughts and thanks for writing.
DN
Thanks, David. Ah, I see where the misunderstanding came in. As with any statistical study of a massive dataset like this, we reviewed a random sample, large enough to give us a very small margin of error (+/-2.29 at a 95% confidence interval, and +/-3.02 at a 99% confidence interval). (See bottom of page 81 and top of page 82.)
We reviewed every takedown request in the sample by hand; it was nearly 2,000 notices (1,826 to be exact).
As to the quote from page 96, that is certainly correct. Let me see if I can better explain the goal. As noted in the paper, we were interested in identifying requests that presented a clear fair use question, the kind for which you’d want human review and judgment of some sort. To be absolutely definite about fair use, in the end, would require adjudicating it in court, which of course we couldn’t do. We looked at all the information we could get, but that doesn’t mean we had the full context. To deal with that, we were conservative in our categories* (if we had identified everything that might be a fair use under any plausible theory, the number probably would have been higher).
(Note that this is the same for the copyright claim in the takedown request: without adjudication in a court, it is alleging infringement only. Of course, in many cases, the claim is very clear. In others, it’s not so clear. I’ve been a copyright lawyer for years, and am still amazed at how factually and legally complex copyright cases can be.)
As noted in the analysis, I think there are ways to combine automation and human review that could help flag potential fair uses (or other issues) and triage those for humans to take a look at.
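A minimal sketch of what that kind of triage might look like (the signal list and notice fields here are hypothetical, offered only to make the idea concrete):

```python
# Hypothetical triage: route notices whose target descriptions suggest a
# possible fair use question to a human reviewer; pass the rest through
# the automated pipeline.

FAIR_USE_SIGNALS = ("mashup", "remix", "cover", "parody", "review", "lecture")

def triage(notices):
    automated, human_review = [], []
    for notice in notices:
        text = notice["target_description"].lower()
        if any(signal in text for signal in FAIR_USE_SIGNALS):
            human_review.append(notice)  # needs contextual judgment
        else:
            automated.append(notice)     # clear-cut; process normally
    return automated, human_review

auto, flagged = triage([
    {"target_description": "Full movie upload, BluRay rip"},
    {"target_description": "Remix of the chorus over original footage"},
])
# 'flagged' holds the remix notice; 'auto' holds the rip.
```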
*Re your point about covers: indeed, the composition is subject to a compulsory license. But there are other aspects of distributing covers that may implicate fair use (i.e., it’s a mixed question), which is why we included them.
I would be glad to correspond over email, but could not find your email address. Please do write to me at the address I gave you.
Regards,
Jennifer
Thank you, Jennifer. I shall write you via email. I altered the post to (I hope) accurately reflect your methodology and added a note of the change to the bottom. I appreciate the clarification and the courteous exchange.
Regarding covers, I realize that there are circumstances that can implicate fair use, which is why I chose not to comment on them in the post even though they caught my attention. In general, my goal was not to bust the study open. For one thing, it’s 160 pages. For another, I’m not fully qualified to do that. My point with this post, which I believe to be fair, is that the shorthand reportage following the study’s publication naturally leads people to think that 30% of all DMCA notices are “questionable,” which is not what your findings indicate.
Thanks,
DN