In a recent post on Techdirt, Parker Higgins plays a somewhat familiar refrain when he blames copyright for causing a general extinction—or inaccessibility—of various works. Describing a kind of dark ages for researchers, historians, and journalists—whether amateur, student, or professional—Higgins presumes to draw a very big conclusion in a very short post and consequently raises more questions than he bothers to answer.
Right off the bat, if we really are seeing an unprecedented dearth of available works in the categories Higgins cites—published books, news archives, historical research, investigative journalism, photographs—then each of these subjects wants its own discussion, since the production, distribution, and preservation of works in each discipline are distinct from one another. But I suppose if one is going to casually declare that “Stuff is disappearing all because of copyright,” then perhaps lumping all stuff together makes a perfectly adequate blog post for people inclined to believe the premise in the first place.
Referring to the National Digital Newspaper Program, an archive that apparently contains no sources more recent than copyright’s 1922 boundary, Higgins makes this slightly overwrought declaration:
“…the dark cloud of copyright’s legal uncertainty is threatening the ability of amateur and even professional historians to explore the last century as they might explore the ones before it.”
In this context, I suppose we are meant to conclude that “uncertainty” (i.e. complexity) in copyright is confounding this newspaper archive program, even though Higgins states that they rather certainly do not digitize works from 1923 onward. So, it’s not clear where the confusion lies for the program’s administrators. But even if the 1923 constraint (i.e. length of terms) is itself worthy of debate, the more audacious part of Higgins’s statement can hardly be meant to be taken literally. Regardless of the copyright status of any particular project, archive, or work, I am reasonably sure that it’s still easier to explore the 20th century in greater depth than, say, the 17th. Heck, some of us still kicking actually remember the 20th century.
But I don’t mean to entirely dismiss the point Higgins is making. Naturally, works that do exist from any period up to the early 20th century may, without copyright restriction, be digitized and organized into a useful archive for the amateur or professional researcher. In fact, while working on my post about Van Gogh, I referenced an incredible archive representing fifteen years’ worth of labor by researchers and historians working with the Van Gogh Museum in Amsterdam. Not only have they made all of Vincent’s correspondence available, but the writings are intricately cross-referenced and searchable, with insights, footnotes, and related drawings.
And while it is true that Van Gogh’s works are in the public domain, it does not follow that this database could never have existed otherwise. In fact, this particular archive is so good, so painstakingly assembled, that it earns a natural exclusivity, which could easily have been compatible with licensing the works or collaborating with an estate, if that had been necessary. The point is that preserving many types of works in a meaningful way takes desire, talent, and resources that can be far bigger hurdles to overcome than copyright protection. Meanwhile, random, free-range copying of works motivated by a wide range of purposes is not necessarily sufficient to effect valuable preservation. And copyright’s complexity or length of terms is unlikely to be the only catalyst—if it is a catalyst at all—among the forces that foster disposability, not the least of which is digital technology itself.
As a broad observation on this matter, it is curious that those who preach the value of “abundance” bestowed upon society by digital technology—and this is certainly true of Techdirt’s editorial gist—fail to consider that with increased volume in the production of anything, disposability will likewise increase. This is particularly true of intangible commodities like creative, scholarly, and even amateur works, whose numbers have increased exponentially with the advancement of digital tools for production and distribution. We read, watch, listen to, and share more stuff on a daily basis than at any time in history, but I suspect we also mentally discard a great deal of it and move on to the next pile of stuff the next day—or the next minute.
And this is in fact how Web 2.0 is designed to function economically—not as an archive of all knowledge, as it is sometimes loftily described, but as a system that financially rewards the sites that can draw attention to whatever is trending in the nano-present. Whether it’s an expert analysis of a global trade agreement or the current disposition of Kim Kardashian’s butt is irrelevant to the economic interests of the site owner. Clicks is clicks. And daily traffic is what puts money in the bank. (This, by the way, is why even the expert analysis of a trade agreement might display a photo of Kim Kardashian’s butt in the sidebar.) The motivation to preserve and to archive valuable works is a wholly separate matter from these economic drivers; and it turns out that even important stuff can disappear from the Web at an astonishing rate for reasons having nothing to do with copyright.
In this regard, I’ll draw your attention to the article cited at the end of Higgins’s post, a very interesting story by Adrienne LaFrance writing for The Atlantic. The centerpiece of her article is journalist Kevin Vaughan, who in 2006, while working for the Rocky Mountain News, began researching the families affected by a terrible incident in 1961 in which a train collided with a bus. His work ultimately led to a multi-part web series called “The Crossing,” which drew tremendous support from the local community, which expressed a deeply personal connection to the tragedy. Then, as LaFrance writes, “In 2008, Vaughan was named a finalist for the Pulitzer Prize in feature writing for the series. The next year, the Rocky folded. And in the months that followed, the website slowly broke apart. One day, without warning, ‘The Crossing’ evaporated from the Internet.” LaFrance goes on to describe how Vaughan was able to resurrect at least part of “The Crossing” from assets saved to a DVD, but the point of the story seems to be the ephemeral reality of the Web contrasted with its illusion of permanence.
It’s worth noting that LaFrance’s article never mentions copyright in any context whatsoever. Instead, I would argue that what we learn from the piece overall is that the motivations, processes, and resources necessary to preserve anything are much the same as they were in pre-digital times, but that there are even greater challenges with digital and web-based assets than with physical ones. Namely, they are inherently easier to lose. And above all, it is folly to believe that online is synonymous with forever.
“Saving something on the web, just as Kevin Vaughan learned from what happened to his work, means not just preserving websites but maintaining the environments in which they first appeared—the same environments that often fail, even when they’re being actively maintained. [Alexander] Rose, looking ahead hundreds of generations from now, suspects ‘next to nothing’ will survive in a useful way. ‘If we have continuity in our technological civilization, I suspect a lot of the bare data will remain findable and searchable,’ he said. ‘But I suspect almost nothing of the format in which it was delivered will be recognizable.’”
So, it is odd that Higgins would even cite this article to support a thesis that copyright is the culprit in the loss of important journalism, when Vaughan’s conflict was in fact one with technology. The only lesson the preservationist can reasonably take from this example, or from the broader points made by LaFrance, is that both the will and the resources to preserve an archive must exist prior to an event (e.g. a business closure) that can shut down a web platform, leaving behind not even a scrap of paper as a primary source. No doubt, most of us with hard drives full of unsorted digital family photos can relate to this challenge, knowing that these assets are stored on devices whose obsolescence is far more immediate than the shortest copyright term ever proposed.
Nevertheless, what Higgins seems to be implying is that a reduction in copyright, which would allow free copying and sharing of assets, might protect a work like “The Crossing” because it would not have resided in only one place on the Web. He writes, “Just last month, flooding threatened a priceless collection of photos in the New York Times archive; had those images been digitized and widely copied, no single flood or fire would pose a risk.”
But even if widespread and random copying could be expected to preserve an older collection like these Times photos (and there are reasons why it would not), it is unclear what copyright amendment Higgins would propose with regard to a comparatively recent work like “The Crossing.” How would simplifying what he calls “the arcane and byzantine rules created by 11 copyright term extensions in the years between 1962 and 1998” help address the fundamental reasons why a work distributed exclusively online in 2006 disappeared two years later? Perhaps Higgins is proposing that the solution would be no copyright at all—in which case he should say so—but then this raises the question of why there was ever a Rocky Mountain News to hire a Kevin Vaughan to create “The Crossing” in the first place.
Ultimately, Higgins’s post is consistent with a general bias that we have the technology to make the world’s works accessible and useful, and it is therefore antiquated to allow copyright to thwart this capability. That may seem rational on the surface, but unless we want to boil that premise down to “Let’s just allow Google to digitize and control it all,” the conversation becomes far more complicated. At the very least, we need to consider the human capital required to make works accessible in a meaningful way, the uncertain technical sustainability of the digital assets themselves, and the effect of wired life on disposability, on general knowledge and awareness, and even on memory itself. Meanwhile, it’s too easy to casually declare that stuff is disappearing all because of copyright without even examining what may or may not be disappearing at all.
On this matter, probably the most compelling citation made by Higgins in his post is a reference to research by Paul J. Heald into the apparent disappearance of American books from the mid-20th century. But Heald’s conclusions deserve a thorough response as Part II of this essay.
I think it was estimated that the average lifespan of a web page is 77 days. If I were to go back through the bookmarks I’ve collected over the years, many of them would be defunct. My website has 5,000 pages, all of which depend on my paying the yearly hosting fees and keeping the CMS software running.
Digital content is entirely ephemeral. I bet people still have drawings that their kids did on scraps of paper 40 years ago. The kids will have boxes of notebooks and essays that they filled at school and college 30 years ago. But how many have the word-processor documents they wrote 20 years ago? People have boxes of old photographs and photo albums in attics and under beds that go back 50 or 60 years, but how many have photos stored anywhere outside of their phones or laptops?
I don’t think Parker is trying to argue that widespread and random copying can preserve old collections. The point being made about term is that archival institutions, libraries, etc., that DO want to digitize works not in the public domain cannot always do so because of various legal uncertainties related to the asymmetric informational aspect of copyright ownership, especially of works created prior to the 1976 Act. This isn’t just about saying “let’s just let Google digitize everything”; this is about questioning whether, for instance, the New York Public Library, the Smithsonian, or some other major library/archive-type institution can digitize, or even display, their collections.

The NYPL and the Smithsonian, to take just two examples, have MASSIVE collections of works, and for a lot of them, they simply do not know who the copyright owner is or whether that owner would give permission. In a lot of cases, the work in question may have been created by someone who never had any intention of monetizing it, yet that work may still have historical/educational significance, e.g. the private photographs of, or letters written by, a famous and historically significant individual. The problem is that without knowing ownership information, or even whether the work was registered, an institution like the NYPL or the Smithsonian has no way to assess its potential exposure should it digitize its collection, or even display the physical collection.

Of course, you can say, well, the NYPL and the Smithsonian are big institutions; they can track down copyright information. But unless a collection is very narrow, there are going to be a hell of a lot of rights owners to track down. And as someone who has done rights clearances for short documentaries (something much less vast than the NYPL’s entire collection), I can tell you that rights clearance takes quite a lot of time and effort: you have to figure out who owns what, contact them, hope they respond, and then hope that they don’t charge an unreasonable fee that requires you to find something else. Clearly, rights clearance is important, and the music space is different because so much of the music is intended to be commercialized. But there’s no doubt that, from a policy standpoint, it makes little sense to force institutions to go through such labor- and time-intensive, and often costly, searches to track down information for works that are not even commercialized in the first place.
Maybe a shorter term isn’t the right solution. Maybe there should be better, narrowly defined exemptions for these types of purposes. But there’s no doubt that libraries and archives are challenged, that they wouldn’t be as challenged if the term were shorter, and that many of the works at the heart of this challenge are works whose authors wouldn’t have a problem allowing an archive to use them; the burden of finding out all this information, of course, is often quite high. So while the EFF may take an alarmist position too often to support what are really tech interests, at the end of the day there are legitimate points to be made about copyright term and how it affects archive-type institutions, and it’s unfair to completely downplay those concerns based on their source.
I think there are legitimate conversations to be had regarding such institutions, and my primary criticism of Higgins is that he’s all over the place in a post way too short to address any of the issues seriously. Hence, he seems to just want to say copyright is screwing us up and be done with it.