Probable Causes

In his book At Home, Bill Bryson describes how the English clergy system, through the 18th and 19th centuries, produced a local renaissance in the sciences and arts.  By that time period, the English were not an especially pious bunch, and as such the clergy system fostered a generation of well-educated and financially comfortable young men who ended up with a great deal of time on their hands. According to Bryson, most of these sons of the gentry studied classics rather than divinity, and many of them were not expected to do much more for their rural parishioners than recite an unoriginal sermon on Sunday mornings.  As a result, many of these otherwise idle hands produced a flowering of discovery, ideas, inventions, and creative works.  Or as Bryson puts it, “Never in history have a group of people engaged in a broader range of creditable activities for which they were not in any sense actually employed.”  This period yielded, among other things, The Life and Opinions of Tristram Shandy; the power loom; the Jack Russell terrier; numerous first works on botany, paleontology, and other natural sciences; the economic principles of Thomas Malthus; the first aerial photography; the invention of the submarine; and the theorem of Mr. Thomas Bayes.  All the result of time, financial security, and curious minds.

There is a lot of discussion lately, including comments on this blog, about open access, which was of course central to the activism of Aaron Swartz; and the subject got me thinking about this particular revelation in Bryson’s book.  In a sense, we could think of the English clergy system as an incubator much in the same way we’re meant to think of digital technology today as a catalyst for innovation.  There is even a parallel in the democratic aura in which these rectors and vicars became the amateur, DIY scientists, authors, and inventors of their time.  In simple, idealistic terms, recreating this phenomenon on a global scale appears to be a foundation upon which the principle of open access is based — that the next life-altering idea might come from anywhere and, therefore, keeping a running spigot of data is of paramount importance.  To quote the start of Swartz’s manifesto, Information is power…

But is it?

What, for example, would the aforementioned Bayes’ Theorem tell us about the probability of achieving some of the more utopian aims of open access?  (Let’s be clear, I’m personally on the side of allowing especially publicly funded data to flow to the public; but this is a different question.)  Bayes provides a means to predict probabilities based on limited data, and as Bryson points out, the theorem was intriguingly of little use at its conception given that there were no computers to perform the calculations.  Today, Bayes is applied to work like climate change models and financial markets, but could it predict the probability that is the underlying question of this entire blog — i.e. will more access to more data produce more social benefit?
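To make the idea of predicting from limited data concrete, here is a minimal sketch of a Bayesian update in Python. The function name and all of the numbers are hypothetical, chosen only for illustration; nothing here comes from Bryson or from any real model of open access.

```python
def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(H | E) from P(H), P(E | H), and P(E | not H) via Bayes' theorem."""
    # Total probability of seeing the evidence at all.
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1.0 - prior)
    return p_evidence_if_true * prior / p_evidence


# Assumed, made-up inputs: start undecided (0.5) about a hypothesis, then see
# three observations that are each twice as likely if the hypothesis is true.
belief = 0.5
for _ in range(3):
    belief = bayes_posterior(belief, 0.8, 0.4)

print(round(belief, 3))
```

Each observation nudges the belief rather than settling the question, and the result is only as good as the assumed likelihoods, which is exactly where a question as broad as social benefit gets slippery.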

Naturally, we’d have to agree on what social benefit looks like, but assuming we’re using western notions of freedom, social justice, well-being, and enlightenment, does it stand to reason that adding more content into the pipeline must inevitably serve as a catalyst to improve or increase these humanistic goals?  It seems clear that there are far too many variables to accurately make such a prediction.  Even in a broad sense, consider how polarized the U.S. is, then spend about five minutes on the Web searching any number of topics. It becomes self-evident that data aren’t even data — that one man’s fact is another’s government conspiracy and vice-versa.  Or as Big Think posts here, even one man’s exercise can be another’s road to perdition.

Aside from the fact that data interpretation on a macro scale is a total crap shoot — we still have school boards fighting evolution for crying out loud — we might keep in mind the three conditions that were necessary to produce the innovations described by Bill Bryson:  they were education, financial stability, and time to indulge. There are ways in which digital-age tools provide more time, as in the Kurzweilian sense of adding additional brain power; but I’m sure I’m not the only one to feel that sometimes the constant flow of disparate information and social media ephemera can also become an obstacle to focused contemplation.  Additionally, there are aspects of the open access idea that are disruptive to existing economic models, particularly affecting the financial well being of some of the leading producers of quality information and cultural content.

I think the principles of open access are fundamentally good, and often principle alone is reason enough to demand support for a social agenda.  But the principle should not necessarily be confused with the reality that application in this case does not guarantee a renaissance. (The new era could look like 4Chan, too, which is the Web equivalent of the Dark Ages.) History is full of unintended consequences; and while the next big idea can indeed come from anywhere, this includes the possibility that it will originate in the mind of an individual as removed from our digital wellspring as an 18th century English clergyman.

© 2013, David Newhoff. All rights reserved.


  • does it stand to reason that adding more content into the pipeline must inevitably serve as a catalyst to improve or increase these humanistic goals

    I would say inevitably that more content, and more open access to said content, will further those goals; it’s just a question of to what extent. Can having more access to more information ever decrease your freedom, for example?

    You say:

    I’m not the only one to feel that sometimes the constant flow of disparate information and social media ephemera can also become an obstacle to focused contemplation.

    It’s a distraction, but only if you allow it to be. If you want to engage in focused contemplation, Facebook isn’t stopping you. And if you want to read the work of a particular intellectual or locate particular data for your own purposes, the internet is an immense help. I suppose social media ephemera can be an obstacle to deep thought in the way that ubiquitous fast food restaurants can be an obstacle to a healthy diet, though one difference is that there’s room for an infinite amount of content on the web, so you won’t have people living in neighborhoods where it’s difficult or prohibitively expensive for them to access “healthy” content. (And granting open access to stuff like JSTOR articles certainly helps here, too). Yet to belabor the food analogy, in an age when obesity has spiked and crappy food is more accessible than ever, it also seems that an increasing number of people have deliberately decided to live a healthy lifestyle and McDonald’s isn’t stopping them. And if open access adversely affects revenue streams for producers of quality content, then that must mean more people are consuming quality content — because for every pirate who would otherwise have bought the original, there are several pirates who wouldn’t. Whether this ultimately drives down the supply of quality content remains to be seen.

    But if you want to ask whether open access is good for enlightenment and social justice, maybe it’s helpful to ponder the inverse: would reducing access to information advance enlightenment or social justice?

    • There’s no question that distractions, etc. are a choice. And as with many of my pieces, they’re half-thought exercise, half-thesis. I don’t presume to have all the answers to every question I ask. While I have no problem with making publicly funded research available to the public, I do snigger just a little at the more grandiose predictions as to the significance of doing so. To be a snob about it for a moment, I think interest in and capacity to understand high-level research is already limited by a process of natural selection in a sense. I’m a pretty bright guy, but I don’t think I can hold my own in a discussion with two physicists speaking in their own lingo and shorthand. My point is we can make it all available to six billion people, but the access will still be of value to a relatively small percentage of the population for reasons that have nothing to do with any kind of legal restraints.

      This is why I bring up evolution in America frequently in this context. There’s nothing about Darwin that is not freely available to everyone, and the human folly that causes members of school boards to promote Intelligent Design has nothing to do with copyrights. You could say, “keep opening the flow, and enlightenment will follow,” but you have to deal with the truth that the backlash against just evolutionary biology alone has increased over the period since the Web became publicly available. These are coincident realities that have no causal relationship. I simply maintain skepticism about all utopian ideas that fail to account for ordinary human nature, which history proves does not automatically seek the light.

      As for asking the inverse question, inherent in my premise and examples is the idea that access is always reduced no matter what and that “unlimited information” probably has a point of diminishing returns. In fact the English clergy referred to in the post had a very low volume of input by our standards. We are all bound by physical capacity, time, interest, psychology, and our innate skills. So, unless we’re talking merging with the machines to evolve into new beings, I suspect the human condition itself will continue to assert boundaries that supersede legal frameworks.

      • More data is always better when you are able to prune to reduce overfitting. The key is selecting features from a larger set of data, or pruning the resultant data, and doing it intelligently and effectively.

        On a less abstract level, that’s exactly why companies like Google/Facebook/Twitter/etc. are quite valuable. Because they provide algorithms that ultimately boil down to selecting data from a much larger set of data and presenting it to the user, and doing it in a way that the user finds pleasant or useful. That’s a kind of service that’s orders of magnitude more valuable in the information age (or “information overload” age) than simply producing even more new patterns of bits to be indexed.

      • I don’t mean this obnoxiously at all, but spoken like a programmer. 🙂 Setting aside questions of auto-fill and search results that may or may not serve the greater good, it sounds as though you’re looking at data through the eyes of a computer and not looking at human behavior, namely what people do or don’t do with the data they already have. Either that or I’m completely misunderstanding you.

      • To M’s comments about decision-tree pruning and other refinements that can be made possible by technology, one of my pet peeves is that the government (which has reams of regulations requiring companies to disclose crucial info, like drug interactions) hasn’t required companies to disclose that info in an easily navigable form (I had always envisioned a decision tree) on the web. With a drug, for example, the site could ask you questions and determine which disclosures are most important for you to read (and still offer you the others at the end, obviously). The reason this is important is that disclosure regulations are one of the best types (fairly unintrusive, give people the power to decide for themselves), except “info overload” can make the disclosures almost useless.

        If we could rely on the public (or a larger percentage of the public) to actually read disclosures, we could scale back some regulations and allow for more consumer choice without disastrous results.

      • The drug info is a cool idea, although I suspect a site hosting the information would have to be a government site because it’s hard to imagine a private entity wanting the liability.

      • No, I mean it for humans as well. If you are in a situation where useful information is in a sea of irrelevant information, this “information overload” could cause a human OR a machine learning algorithm to struggle to come to reasonable conclusions. Just piling more data on top, creating new content, is not helpful without anything pruning it. So the pruning is key.

        So when you have such an excessive amount of random/irrelevant information, you start to overfit and you can’t draw useful conclusions. This problem is key in any sort of learning, machine or human.

      • Except that there is plenty of evidence to suggest that humans don’t make decisions based on information, and they certainly don’t route around irrelevant information the way a machine might. In fact the average chat board seems to veer off into the irrelevant pretty quickly. Likewise, two humans like you and me can have the same data (e.g. about copyright) and come to very different conclusions. You believe your views on copyright are progressive, while I believe they are regressive. We have access to the same information and are both capable of reasoning without getting overly emotional, but we diverge considerably. So, I can only conclude that there are multiple factors (human factors) other than information that lead us to our opposing conclusions, factors that are not like a computer algorithm.

      • @ David,

        To pick up the elitism baton and run with it, I agree that probably only a modest percentage of people will ever be deeply interested in the JSTOR research or similar “quality content.” However, I wouldn’t say the benefits of “openness” are limited to that elite subset, because easy access to information could allow those people to do more scholarship, activism, teaching, etc. So there is a trickle-down effect. Also, when we relax legal constraints, we make information available to people who couldn’t access it before for lack of wealth or sophistication, even if they technically would have been legally permitted to. When academic articles are made public, someone who’s vaguely interested in a topic can google it and find those articles. MIT’s open courseware certainly makes accessible to a larger crowd what was, previously, some very pricey content. I think relaxing restrictions for this sort of material will most benefit people who are already fairly educated but are looking to cite or substantiate a point, deepen or broaden existing knowledge or branch out somewhat. I had access to JSTOR when I was in school but I don’t anymore, and there have been a decent number of occasions when I’ve googled a topic or an author, the query has returned a JSTOR link and I only get this frustrating preview of the article. (Or, if not JSTOR, some medical or scientific database). More occasions than I can count offhand, actually. For a while I would “hack” into JSTOR (that’s cheesy but I don’t know how to phrase it) by accessing via a university wireless network. I don’t know if many people are dogged or curious enough to do that, but they might be curious enough to click on a Google result and read the whole thing.

        Anyways, I think we can agree (maybe?) that if you keep opening the flow of info, some amount of increased enlightenment will follow even if the world is not transformed into a utopian salon of informed people. The question is whether there’s also a coarsening effect, or a filter bubble / echo chamber effect, that offsets this particular benefit of technology. My strong gut intuition is to say no, but I think that’s because I’m one of those ideologues who likes information and openness and, to be totally frank, I live an insular lifestyle surrounded by other “elite” people and if I want to be exposed to this particular coarseness or ugliness I need to proactively google it. Though I guess the creationists affect me when they vote.

        And I guess what I meant re: the inverse was — I know information is already reduced/finite/incomplete. I just mean, do you think doing things to deliberately decrease the availability of information (the opposite of these “openness” initiatives…like maybe the government makes a law that no academic work can be available on the internet) would promote important social goals? I feel like the strong intuition in 99.9% of instances is “NO!” But again, maybe that’s just my own intuition talking.

      • Let me just say that I really appreciate the way you approach this conversation, including your frankness about your lifestyle and how that affects your point of view. I understand why people use handles and anonymity on the Web, but as I choose to be a relatively open book, I appreciate when people choose to reveal themselves as well as their opinions.

        In a nutshell, I agree with all of that in principle just as I agree that free speech must be universal no matter how badly I’d like to deny the right to Fred Phelps and the like. I said as much in the post: I fundamentally believe publicly funded research ought to be publicly available, even if one fraction of one percent of the public will find it remotely useful. And yes, of course, there is a trickle-down effect when the next great scientist emerges and cures a disease or whatever. My point is to separate that reality from some of the hype about open access, which 1) spills into things I do not support like file sharing, 2) fosters a general belief that more data necessarily makes us freer (although it can), and 3) assumes that all constraints on information, including copyrights, exist solely to keep the common man down.

        I’m sorry not to provide a more complete response here. I’m fading.

      • I was thinking the drug companies themselves could host the info — they already take on the liability of verifying the disclosures and the expense of publishing them. It would be nice if the info were aggregated on one website, but if we all take strong positions on safe harbor for content platforms/aggregators (lol), that shouldn’t be a problem.

        In all seriousness, there are sites that currently host SEC filings and aren’t liable for any false disclosures companies make about finances, so I imagine things could work similarly.

      • Could be. I don’t have enough hard information to really judge. Having worked for a few pharmaceutical companies, I’m guessing they may have to be dragged kicking and screaming into such an alliance, but maybe not. If it’s in their interest, that would be ideal. If it sounds like regulation, get ready for new memes saying “Don’t let the government tell you what medications you can take!” Sorry. Couldn’t resist.

      • Yeah, people use anon handles for different reasons. I do it because 10 years ago, nobody anticipated the wayback machine, and while I may mock some of the conspiracy theories about Google I don’t want the pressure of every word I write being, effectively, annexed to every cover letter I ever send a future employer. Pseudonymity can work for this, too, and if I keep commenting here I might register a pseudonym and login to make things easier. But (I know this is a huge tangent) 99% of people using pseudonyms underestimate how easily they can be doxxed, even if they employ countermeasures. So PSA to anyone who cares, if you are relying on a pseudonym then change your identity periodically.

        Anyways, don’t apologize for brevity. I’m actually shocked by how much you manage to write given that you’re fielding comments from multiple people and need to deal with a mod queue and perhaps even a real life.

        I agree with everything you’ve said. I would quibble that I like file-sharing in principle, even if I understand why people whose livelihoods are affected object to file-sharing of copyrighted works specifically. But I understand why “file-sharing” can be shorthand for “piracy.”

      • Thanks.

      • The resulting meme would probably be “sigh, we live in such a superficial ADHD media-saturated culture that the government is now forcing companies to repackage life-or-death info into, effectively, quizzlets or buzzfeed listicles with cute little graphics, just in the hope that lazy dumb americans will go to the trouble to actually read them?” It would be voiced primarily either by outlets like Slate or outlets like Fox News. There might be a TED talk calling it a good or a bad idea.

        You get the pharma companies on board by pairing it with loosened regulations generally. With more disclosures and more people reading them, maybe it’s ok for the FDA to back off a bit — as long as people know the risks, let them do what they want. The main (serious, non-meme) criticism would be that it’s elitist policy making life marginally better for the well-informed, internet-savvy, educated public who can exercise careful judgment and have plenty of access to healthcare, but the loosened regulations just make life more dangerous for poor people who can’t be trusted to read websites. This criticism would come entirely from other rich, educated people, obviously.

  • I think you are trying to describe the concept of overfitting, which is related to Occam’s razor.

    Techniques to deal with overfitting are numerous (like decision tree pruning algorithms, which are essentially algorithms that implement Occam’s razor), and it is part of a still extremely active research area (you could read more about this in open access machine learning/AI journals :)), but I think the adage “having more data is better” is usually true when it comes to making informed decisions. Especially when that data is high quality peer reviewed scientific research.

    • Agreed. Although trouble starts when high-quality, peer-reviewed research enters the realm of remix culture and we can imagine some of Mr. Lanier’s more dire predictions coming true. That said, thanks for the suggestions.

      • You say the darnedest things sometimes. Scientific research is a “remix culture”. In fact, the popularity of a scientific paper is usually measured by how many citations it gets in other researchers’ papers. It also tends to have a direct effect on how much future funding that research will get from grant-making authorities.

      • Yes, remix by other scientists, not by everyone with an axe to grind.

      • What is a scientist?

      • You brought up peer review, and I was using the word in that context.

  • a generation of well-educated and financially comfortable young men who ended up with a great deal of time on their hands.

    I think the key phrase there is “financially comfortable”. That is, the young men responsible for the flowering of discovery, ideas, inventions and creative works were in a position where they could devote their time to such activities without worrying about how they were going to pay the bills, put food on the table and keep a roof over their and their families’ heads.

    I’m not really sure how open access is supposed to address that. If anything, the type of open access championed by people like Aaron Swartz seems designed to achieve the exact opposite, producing a level of financial precariousness and insecurity for the type of people that tend to be good at things like discovery, ideas, inventions and creative works.

    That is, under the present system, people who are good at discovery, ideas, inventions and creative works can earn a living from it, thus enabling them to pay their bills and support themselves and their families while leaving them free to pursue discoveries, ideas, inventions and creative works. Under something like Swartz’s idea of open access, they couldn’t and would instead have to devote most of their time to securing a living, leaving only snatches of spare time to devote to activities such as discovery, ideas, inventions and creative works — if, indeed, they could muster the energy to devote to such pursuits after a full day’s work.

    If anything, I would suggest the lack of a flowering among the contemporaries of these clergymen, those who went into the professions and whose financial status depended on their continuing labour, demonstrates that it isn’t access that’s the key. These professionals had the exact same level of access to knowledge and information as the clergymen you describe, they just didn’t have the time or energy to devote to things like discovery, ideas, inventions and creative works.

    The whole open access thing comes across as a very “Let them eat cake” sentiment. It’s advanced by people who are financially comfortable and who, unconsciously perhaps, assume that everyone is as well and so incorrectly conclude that the only reason they’re asked to pay for access is because others are greedy. The notion that the fees are used to pay the bills and support the individuals who create, preserve and maintain that knowledge and information seems to completely escape them. They mistake their level of privilege and opportunity for the norm.

    Something like a universal living wage would go further towards encouraging a flowering of discovery, ideas, inventions and creative works. It would create something like the financial comfort the clergymen you describe enjoyed. And it would advance the cause of open access, since if people didn’t need a separate income to support themselves, the portion of those fees used for that purpose couldn’t be justified any more.

    • I agree that there is an undercurrent of elitism, despite all the populist window-dressing. It is funny that the voices proclaiming to know the path to New New Jerusalem are already standing on the proverbial shining city on the hill (or in the valley in this case). In fact, it would be interesting to compare and contrast someone like Ray Kurzweil with John Winthrop of the Massachusetts Bay settlement and its role in seeding American exceptionalism, but I digress.

      Yes, economic stability that translates into time is central to these pursuits, and I doubt there’s a successful technology company that doesn’t understand this. Google fosters many incubators inside and outside its organization, and their ability to do so is partly supported by the protection of intellectual property. To pursue an agenda that would undermine these same protections for an individual or small organization is anything but populist and could theoretically lead to a consolidation of knowledge just as too much laissez faire economic policy has led to a consolidation of wealth.

  • This Article is of great importance:

    ‘Big Data’ controls what you see on the internet, and gives you an illusion of ‘choice’.

    • David Newhoff

      James_J – Thank you. Excellent article. Right to the heart of the financial mechanism of Web 2.0 and why its design fosters an illusion of choice. Just taking a macro view on this, consider how much lip service that industry’s spokespeople pay to the word innovation in contrast to how few Web companies dominate in each category. I know I mentioned in a previous post the fact that since the 1990s the Web has been a history of short-lived monopolies. We can talk about competition and level playing fields, but then look at the experience for most users: one search engine, one online encyclopedia, one online retailer, one personal social media site (Facebook), one public social media interface (Twitter), and so on. It would be a good idea for the guy who called me a racist to read this article given the history of predatory marketing practices based on racial profiling.

