Perhaps I’m just being reactionary, but the Anderson article makes me cringe.

One fundamental flaw seems to actually be hinted at in the article itself. Anderson is talking about how models fundamental to the physical and biological sciences ultimately proved imperfect, simplistic, etc. as we gained greater understanding of them, as scientists continued to test and refine them (or refute them), and seems to suggest that that is a problem with models in general. But we’re supposed to have faith in a “Big Data” that just takes the data and identifies patterns? Isn’t that a similar simplification of nuanced and diverse behaviors and realities? I get that it’s useful, and opens up tons of avenues of inquiry, and certainly provides new capacities for answering questions we could have only theorized about before, but I don’t see it as completely addressing all those avenues of inquiry, like why there might be exceptions to its big patterns. To take the bacteriologist: he’s identified all these new bacteria, and might be able to make some guesses about other forms of life to which they’re related, or some of their characteristics. But so what? What good does that do? It gives you lots of information, lots more data, but to get anything particularly useful you have to ask the right questions of the data, which presumably relies upon models and theory; to actually do something with whatever you get out of the data requires the same.

I appreciate Caulfield’s intervention here. He notes that scientists, etc., have premised their work on the assumption that correlation does not mean causation–which seems reasonable enough. I appreciate his point that correlation is enough for some tasks (and I like the term “radical pragmatism” he tosses in there), but it also seems woefully inadequate for others. Take Google translations, which Caulfield mentions as being sets of statistical probabilities, the products of which are generally good enough to move a web page between languages or give a user a general sense of the content of some entered text. That is obviously a useful tool, but it’s also incomplete, likely struggling to convey connotations, rhythms of language, emphasis, the careful construction of sentences and the order of ideas, cognates and puns, and a thousand other nuances of language that do convey meaning. Here I come back to my own scholarly interest in cultural mediators/brokers involved in 17th- and 18th-century European-Indian relations, who were not simply translators, but rather spent inordinate amounts of time learning the structures of speeches and negotiations, the proper moments to employ mnemonic devices and what those objects should be, the histories and cultural logics of a given symbol or title–and when to fudge what was being said so as not to piss off the other participants in the conversation. All of that gets lost in something relying only on sets of statistical probabilities–not always a concern, but often.

Basically it seems that everybody’s favorite models for explaining why causation doesn’t matter and correlation is enough come down to Amazon and Google–in short, the ability to sell crap. I want to see how it’s useful to the sciences, the social sciences, and I especially want to see how it works with the humanities–I assume there are instances out there I’m just not aware of as yet, and I’d be curious to see them. I’m sure Big Data and its ability to identify correlations can be useful in these other contexts, and will be/is, but I’m not convinced it is so independent of theory and models as Anderson implies.

4 Thoughts on “Big Data

  1. Dave Toth on February 23, 2014 at 1:37 pm said:

    I think we have to approach Big Data like we do everything else in life – with moderation. It’s not the be-all and end-all, but it can be really useful. The ability to process huge amounts of data and find the patterns can be used to “sell crap” as you point out (which is not entirely bad, as it likely helps the economy and has opened up new high-tech white-collar jobs, which can replace, for future generations, the manufacturing jobs that we have lost), but it can also be used to great benefit in science. Imagine being able to use those Big Data techniques and the Big Data compute infrastructure to identify a drug that might cure AIDS or some other awful disease. Right now, it’s a buzzword, and many businesses will look to exploit the techniques to make money, but some scientists will try to use this to make real scientific breakthroughs, and that could be fantastic!

  2. Agreed, Dave, Big Data offers some awesome potential, and I would have to think that’s especially true in terms of its applications in the sciences (I have a harder time imagining it in the humanities, but I have no doubt somebody will come up with something brilliant on that front as well). I think in part my negative reaction can be attributed to my impression–which may be incorrect–that so much today seems driven by business models, and the idea that they change everything and replace everything and we don’t need anything else and nothing else is economically productive, an impulse that ignores if not outright dismisses other considerations (and especially moderation and the potential to combine models/theories derived from different realms of thinking). Thus my concession that I’m possibly just being reactionary. Thanks for your thoughts, and teasing out an upside that I didn’t do much more than acknowledge in my original post.

    • Dave Toth on February 23, 2014 at 3:31 pm said:

      I think you are right to be a little cynical about this. You are absolutely correct that so much is driven by business today. We saw this happen when I was in grad school (’02-’08) with another buzzword. It was “grid computing.” There was a lot of interest in that from businesses, and what was thought up by academics and scientists at some US national laboratories was twisted into a business model. Although there are a few people and institutions (particularly one in Europe) using it to advance science as I believe it was originally envisioned to do, the prime use today is Amazon’s cloud computing service, where they sell you compute power on demand, and now it is not really viable for science the way it was envisioned, because you have to pay for it. 🙁

  3. Will Mackintosh on February 23, 2014 at 8:31 pm said:

    I’m with you on your skepticism, Jason. It seems to me that Big Data in the humanities leads to counting words in Shakespeare and this kind of thing:
