In Part I of this essay, I responded to a post written by Parker Higgins for Techdirt, criticizing him for trying to pack a big, unexamined conclusion into a small article. Asserting, as Techdirtians are wont to do, that copyright is the omnipresent saboteur in our otherwise grand, digital machine, Higgins blames copyright’s complexity and length of terms for causing important works of the 20th century to “disappear,” thus harming historical journalism and other endeavors. He cites a number of what I believe to be unrelated and ill-considered examples, several of which I addressed in Part I. But I left out the most compelling of Higgins’s citations, the work of Paul J. Heald, law professor at the University of Illinois, because it demands a best attempt at a more thorough response on its own.
Technically, Higgins cites Rebecca J. Rosen, writing for The Atlantic about the professor’s statistical research. Heald looks at the availability of published books via Amazon and concludes unequivocally that “copyright makes books disappear.” To support this claim, he cites his research data, which reveals peaks in the availability of books in the public domain and in the availability of very recent books, with a sharp decline in the availability of books from roughly the 1930s to the late 1990s. And while it is true that this period roughly corresponds to works still under copyright (1923-present), it’s not entirely clear either that Heald’s research reveals a relevant lack of availability or that copyright is the catalyst to explain his findings. I have read the part of Heald’s paper that deals with books (he also addresses music) and admit that my reading may err, but I think we should be careful, for instance, about how we interpret summaries of Heald’s work like this one by Rebecca Rosen:
Heald says that the WorldCat research showed, for example, that there were eight times as many books published in the 1980s as in the 1880s, but there are roughly as many titles available on Amazon for the two decades.
To an observer who chooses to look solely at the quantity of available works as a percentage of the total works produced in a given period—and who might have a nascent beef with copyright—this statement may seem rather compelling. But how many factors are being left out of the equation? Maybe quite a few. Heald’s data set comprises a little over 2,000 works sampled at random, which in itself seems like a flaw because a random sampling of ISBN numbers querying the Amazon database should naturally produce a higher percentage of public domain books simply because there are vastly more editions of books not under copyright. Heald does account for multiple editions in winnowing his initial sample of 7,000 titles down to the 2,266 books studied, but he does not seem to account for the probability of skewing toward public domain works by percentage in the initial, random data acquisition.
Additionally, although the researchers seem to have done their best to randomly sample comparable commodities (e.g. fiction novels to fiction novels), Heald’s findings do not appear to account for more nuanced factors, like the certainty that a higher volume of short-lifespan works was produced in the 1980s compared to the 1880s. He acknowledges that total volume would naturally be higher in the 20th century than in the 19th, citing changes in printing technology, but he does not appear to look at the nature of the works themselves and then to ask how much of the sloughed-off volume represents natural disposability (i.e. works for which there is no sustainable market demand). Heald does address generalized demand in his paper but in a way that also appears flawed, about which more in a moment.
One detail that leapt out for me in Heald’s data is a marked drop in the relative availability (by percentage) of new books that were originally published in the 1980s compared to the rest of the otherwise fairly flat mid-20th century. Presumably, copyright is a constant from 1923 to the present, so the dip in the 1980s compared to the other decades of the century is likely explained by other factors, and those factors may apply throughout the results across the entire range of study. Hence we should be very wary of a pull quote like this one used at the beginning of Rosen’s article:
A book published during the presidency of Chester A. Arthur has a greater chance of being in print today than one published during the time of Reagan.
Again, that sounds intriguing but may not say quite what we think it does. Based purely on anecdotal knowledge of the 1980s, Heald’s data revealing a noticeable decline in relative book availability would seem to coincide with a decade marked by “conspicuous consumption,” a time when publishers would have been very likely to produce a high volume of relatively disposable works in both fiction and non-fiction. For instance, did the 1980s see a sharp increase in one of the most disposable genres—that guilty-pleasure among women readers known as the romance novel? Certainly, according to Wikipedia, 1980 happens to be the year that Harlequin Romance launched its North American product line. Romance novels as well as books like trade-paperback mysteries, self-help, and diet books tend to have very short lifespans book-by-book; and if publishers did increase their output of these types of products in the 1980s, it could explain part of Heald’s data and have nothing whatsoever to do with copyright terms.
So the statistical expression “greater chance” can be very misleading. If, for example, 60% of the works from 1881 are available compared to 20% of the works from 1981, then the pull quote cited by Rosen is factual but meaningless, particularly as it may or may not inform us about the role of copyright. The conclusion can be accurate but still not tell us whether or not we have a greater number of works available from 1981 than from 1881, to say nothing of the theoretical market value of the unavailable works from the latter year.
Even where Heald limited his data set to works of fiction, his paper does not indicate what kind of fiction is being sampled. And this is a general caveat I would propose when interpreting the entire study: that without correlating Heald’s data with some relevant market-research information (i.e. what people are reading and why), we are not learning the kind of information required to draw sound conclusions about the role of copyright.
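To put rough numbers on that distinction, here is a minimal sketch in Python. The 60% and 20% shares are the hypothetical figures from the paragraph above. The eight-to-one publishing ratio is loosely borrowed from Rosen's decade-level summary, and the base of 1,000 titles for 1881 is purely an assumption for illustration. None of these are Heald's actual counts.

```python
def available_titles(published: int, percent_available: int) -> int:
    """Absolute number of titles still available, given how many were
    published and what percentage of them remains available."""
    return published * percent_available // 100


# Hypothetical inputs, NOT taken from Heald's data set.
published_1881 = 1_000               # assumed base, for illustration only
published_1981 = 8 * published_1881  # assumed eight-fold growth in output

available_1881 = available_titles(published_1881, 60)  # 60% still available
available_1981 = available_titles(published_1981, 20)  # 20% still available

print(f"1881: {available_1881} of {published_1881} titles available")
print(f"1981: {available_1981} of {published_1981} titles available")
```

Under these assumed inputs, the 1881 book has three times the "chance" of being available, yet the reader has 1,600 titles from 1981 to choose from against 600 from 1881. The percentage and the absolute count answer different questions.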
Consider, for instance, that the most ardent readers among Baby Boomers and Gen-Xers have read many of the books of the 20th century and may even still own copies of their favorites. Hence these most voracious readers are apt to seek out the most contemporary literature and perhaps older works they never encountered, but they’ve already done many of the books of the mid-20th century. So, what are millennials reading today, either by choice or by requirement in schools and universities? Because there is no question that this generation, for better or worse, has a very different relationship to culture, literature, and media in general than their parents and grandparents. I know my own kids’ school experience has (much to my chagrin) been lacking in required reading of books we would call the 20th century canon. Does this hold true in public schools across the country? If so, how can this, or any other ethnographic study, not be considered in concert with research like Heald’s?
Meanwhile, some rudimentary searching on my own reveals that works from both best-seller and best-books lists of the 1980s are certainly available via Amazon. Though I admittedly did not search every title, it seems that we can find Amy Tan, Toni Morrison, Umberto Eco, and even Danielle Steel, if we are so inclined. Hence, it appears that Heald’s research may tell us nothing about the rate of availability, decade by decade, of what we might generally agree to call “significant writings.” If that’s a fair assessment, it does not entirely dismiss all of Heald’s findings, but it does suggest that reporters and pundits should be leery of interpreting his data to support the “disappearing 20th century” claim.
Still, if it is true that the availability of “significant writings” from 1980 is actually not that different from the availability of “significant writings” from 1880, this may support Heald’s stated objective in his paper, which did not apparently set out to prove that “copyright makes works disappear.” Instead, Heald’s stated proposal is to refute the assertion that present copyright terms are necessary to keep works in the market. This may sound like the same hypothesis, but it isn’t. Setting out to prove that a copyright term of 95 years (for publishers) is unnecessary to keep works meaningfully available is not synonymous with setting out to prove that this length of term “makes works disappear.” It seems Heald began with the former thesis and then shifted to the latter based on what he perceived as “startling” evidence in his data.
One could argue that mechanisms for publishing works in the public domain are as effective as mechanisms for publishing works under copyright and that the public is at least equally served by either regime. As long as desired works are available, then they’re available. But, the argument Heald is making is that the current terms are underserving the public because publishers hold copyrights on works still in demand, but also refuse to publish these works. If this is true, then Heald is presumably correct that the terms of copyright on these unavailable works provide no benefit to anyone.
But in order to make the assertion that publishers are choosing to sequester a relevant volume of books in demand, he needs to prove at least two things: 1) demand for the actual works in question; and 2) that the copyrights on these works are still held by the publishers and not by the original authors. And if those facts can be demonstrated, one must then make an argument for reducing the length of terms without running afoul of copyright’s incentive to create and publish the most “significant writings” in the first place. To put that another way, we’d want to ensure we do not fail to incentivize the next Joyce Carol Oates just so that some e-publisher can make a few dollars off books that had earned a natural disposability in the market. Perhaps that is a term length shorter than 95 years, but it seems to me that Heald’s research provides no guidance as to what that revision ought to be.
The Demand for Missing Works
Heald’s research makes no effort to answer the second question I posed above, which is to ascertain the actual copyright status of the books presently unavailable. This is particularly relevant because with many of the aforementioned short-lifespan books (e.g. trade paperbacks), the exclusive copyrights revert back from publishers to authors rather quickly. And since Heald’s data makes no mention of the types of books selected at random and does not factor for current copyright status of any of these books, it seems unreasonable to draw his conclusions about the motivation of publishers to keep works unavailable based solely on his findings.
On the other hand, Heald does make an effort to ascertain whether or not there is a demand for the unavailable books, and he states clearly that if this demand does not exist, then concerns about unavailability are irrelevant. But, again, in attempting to determine demand for these works, it looks as though Heald is using information that does not point to a demand for missing works since there are no missing works in the data set and, again, he forgoes market research altogether.
Heald compares the used books available on abebooks.com by decade to the number of new books available on Amazon by decade. The assumption is that the inventory of a used book dealer is an indicator of consumer demand, which is reasonable, but the data only demonstrates that, for instance, there is an availability of used books from the 1970s that is greater than the availability of new books from the 1970s. Of course, neither line graph tells us anything about sales of either used or new books from the 1970s (to say nothing of which books we’re talking about), but Heald asserts that the gap between the available used books and available new books represents an unmet demand for titles that could be, but are not, sold as new books.
So, without seeking more detailed market information, it seems very hard to leap to the conclusion that an unmet demand for newly published mid-20th century books exists, let alone that copyright is the cause of the problem. After all, his conclusion suggests that a publisher might see profitable demand for one of its titles yet decide not to republish that book for inexplicably self-defeating reasons. I’m not the most savvy businessman in history, but if I had to decide whether or not to spend money to publish some of my titles from the 1970s, this data would not be sufficient to make that call, especially not without knowing which 1970s titles from that inventory at abebooks.com are actually selling. Meanwhile, once again, I find titles from both best-seller and best-books lists available at both Amazon and abebooks.com.
But, it is at this point that Heald seems to depart from the question of general availability, relative either to demand or to production volume by decade, and instead shifts his focus to e-book availability as a measure unto itself. He writes:
“In 2014, 94% of 165 PD best sellers (1913-1922) were available as ebooks compared to only 27% of 167 best sellers (1923-1932) were made available as ebooks by publishers.”
Again, this seems remarkable at first, but we should notice, as I say, that Heald has shifted focus from general availability to availability via a specific platform. After all, lack of availability to date in eBook format is not equivalent to lack of availability period. And Heald proves this point himself in citing three particular titles thus:
“In the absence of copyright, surely one could find a publisher providing eBook versions of popular classics like The Gulag Archipelago, Gentlemen Prefer Blondes, and The Magnificent Obsession.”
Surely one could find publishers pleased as punch to freely create eBooks from these works, and for good reason: all of these books have deservedly retained their market value. And this is precisely why consumers can still buy print copies via Amazon or in a bookstore, find used copies via multiple sources, borrow them from public libraries, and buy all three as audio books from Audible.com. The fact that the publishers have yet to make these titles available in eBook format (and there are likely a variety of practical reasons for this) is no excuse for describing these works as “unavailable,” let alone for blaming copyright on the strength of that false claim, and then allowing the assertion to be exaggerated by pundits and reporters as the “disappearing 20th century.”
Additionally, if one takes a step back, Heald would appear to be making a case for an opportunistic e-publisher (who never contributed anything to the creation of the work) to now reap financial reward from a book by Alexander Solzhenitsyn, of all people, and disenfranchise his sons from any controlling interest in a work published about the time they were born.* And we would do this for a book that is quite clearly available to the market via multiple sources.
While it is certainly true that simply having all works enter the public domain much sooner would lead to a spike in general availability in mid-20th century books, I think it would take a far more nuanced examination to determine whether that untapped “abundance” would justify diminishing the copyright terms for authors of works whose maintained availability may have a great deal to do with their widely accepted value to society. At the same time, niche audience works can be restored to public availability by means other than copyright term revision.
One of my dear friends is the son of the author Michael Avallone, who wrote the Ed Noon detective series between 1953 and 1988. This is the kind of book series that lives in its time and place and then typically goes out of print. But as the co-owner (with his sister) and steward of his father’s copyrights, David Avallone has been able to resurrect Ed Noon, republishing the works as eBooks, and growing a contemporary fan base for the character using social media. This is more a personal project for David than a business venture. In particular, after Michael passed away in 199, the ability to bring back Ed Noon thanks to digital technology has been a very meaningful way for David to give his father’s voice new life, not only for older fans who remember the series, but for a new generation of readers who never heard of Ed Noon.
In theory, if Avallone’s copyrights had expired, it’s true that Amazon or some other on-demand publisher would be free to make these books available—if they could even lay hands on the source material—but the whole venture would be of lesser value, I think, than it is under the management of a loving heir who tweets out Noon-isms twice a day to entice readers. Conversely, if none of this were possible because the copyrights were still in a publisher’s hands who simply chose to let the works be dormant, this would be a shame for both David and for presumptive readers, but it would still not justify claims that a whole century’s worth of literature remains inaccessible. If anything, perhaps it suggests a kind of “use it or lose it” reform to corporate-owned copyrights, but no doubt real copyright authorities would have various opinions about that.
I don’t mean to suggest that Professor Heald’s work is to be dismissed outright, only that the data seems incomplete relative to the conclusions being drawn. There are certainly more qualified statisticians, copyright scholars, and publishing professionals than I who may criticize or support his findings and determine to what extent they tell us anything about the role of copyright as a barrier to access. But speaking as a generalist to the general reader, I’ll maintain that we should not simply buy the “stuff is disappearing because of copyright” story presented so casually in posts like the one by Parker Higgins. Stuff is appearing, disappearing, and being resurrected at an extraordinary rate thanks entirely to digital technology. The extent to which copyright and its limitations foster or hamper the most beneficial results of all this churning media is not a simple question to answer.
______________________________
* I do not know the current copyright status of The Gulag Archipelago; I mention this as an example in principle.
At the SERCI conference in 2014 I pointed out to Prof Heald that the data he needed for an assessment of the welfare consequences of this “disappearance” were sales figures, not numbers of works. He agreed that those data would be very desirable, but said they were not to be had. There are too many of us lawyers playing at economist, with the result that the blogs have ample half-baked analysis to editorialize about.
Why does this smack me as similar to one of the People’s “Representatives” walking into the People’s House with a snowball in tow “proving” climate change is a hoax [*facepalm*]
I lived through the 1980s and the bookshops were indeed full of dieting, self help, romance and associated stuff. Shelves were full of advice on being a yuppie not to mention all the crappy error riddled computer books that had a shelf life of about 6 months. There must have been 20 different versions of “How to use Wordstar” and 100 versions on VisiCalc, let alone all the stuff on Lotus, dBase II (III, IV), Wordperfect, and Z80 programming.
Indeed if he’s looking like for like he’d need to be counting all the dime novels and penny dreadfuls.
http://web.stanford.edu/dept/SUL/library/prod/depts/dp/pennies/home.html
There are many reasons to dismiss Heald’s paper. Two examples regarding music:
1. He doesn’t get the copyright status of the works in his samples right:
“A director choosing a recording of the Sex Pistols singing “God Save the Queen” must pay a fee to the owner of the sound recording even though the musical composition is in the public domain.”
The composition is not the song from the 18th century (as Heald believes) but an original composition by Jones/Cook/Rotten/Matlock from 1977.
2. He makes big mistakes in his analyses:
“Consistent with the evidence that both legal status and age are relevant to the availability of a work, a testable hypothesis emerged. Because of changes in the duration of copyright, directors of 23 movies released before 1977 did not have to look backward so far to access free public domain material … The top grossing movies contained equal numbers of films from before and after 1977, a convenient date, given the timing of 1976 term extension.”
The Copyright Act of 1976 went into effect in 1978, not in 1977.
Thank you. I purposely did not address his analyses of music given the context in which Heald was cited by Higgins, but I wouldn’t be surprised at all to find these types of errors.