Perhaps I’m just being reactionary, but the Anderson article makes me cringe.
One fundamental flaw seems to be hinted at in the article itself. Anderson describes how models fundamental to the physical and biological sciences ultimately proved imperfect or simplistic as we gained greater understanding of the phenomena they describe, as scientists continued to test and refine them (or refute them), and he seems to suggest that this is a problem with models in general. But we’re supposed to have faith in a “Big Data” that just takes the data and identifies patterns? Isn’t that a similar simplification of nuanced and diverse behaviors and realities? I get that it’s useful, that it opens up tons of avenues of inquiry, and that it certainly provides new capacities for answering questions we could only have theorized about before, but I don’t see it addressing all those avenues of inquiry, like why there might be exceptions to its big patterns.

Take the bacteriologist: he’s identified all these new bacteria, and might be able to make some guesses about other forms of life to which they’re related, or about some of their characteristics. But so what? What good does that do? It gives you lots of information, lots more data, but to get anything particularly useful you have to ask the right questions of the data, which presumably relies upon models and theory; to actually do something with whatever you get out of the data requires the same.
I appreciate Caulfield’s intervention here. He notes that scientists, among others, have premised their work on the assumption that correlation does not imply causation–which seems reasonable enough. I appreciate his point that correlation is enough for some tasks (and I like the term “radical pragmatism” he tosses in there), but it also seems woefully inadequate for others. Take Google’s translations, which Caulfield describes as sets of statistical probabilities, the products of which are generally good enough to move a web page between languages or to give a user a general sense of some entered text. That is obviously a useful tool, but it’s also incomplete, likely struggling to convey connotations, rhythms of language, emphasis, the careful construction of sentences and the ordering of ideas, cognates and puns, and a thousand other nuances of language that do convey meaning.

Here I come back to my own scholarly interest in the cultural mediators/brokers involved in 17th- and 18th-century European-Indian relations, who were not simply translators: they spent inordinate amounts of time learning the structures of speeches and negotiations, the proper moments to employ mnemonic devices and what those objects should be, the histories and cultural logics of a given symbol or title–and when to fudge what was being said so as not to piss off the other participants in the conversation. All of that gets lost in something relying only on sets of statistical probabilities–not always a concern, but often.
Basically it seems that everybody’s favorite examples for why causation doesn’t matter and correlation is enough come down to Amazon and Google–in short, the ability to sell crap. I want to see how this approach is useful to the sciences and the social sciences, and I especially want to see how it works in the humanities–I assume there are instances out there I’m just not aware of yet, and I’d be curious to see them. I’m sure Big Data and its ability to identify correlations can be and will be useful in these other contexts, but I’m not convinced it is as independent of theory and models as Anderson implies.