SA Forum is an invited essay from experts on topical issues in science and technology.
It has become fashionable to bad-mouth big data. In recent weeks the New York Times, Financial Times, Wired and other outlets have all run pieces bashing this new technological movement. To be fair, many of the critiques have a point: There has been a lot of hype about big data and it is important not to inflate our expectations about what it can do.
But little of this hype has come from the actual people working with large data sets. Instead, it has come from people who see “big data” as a buzzword and a marketing opportunity—consultants, event organizers and opportunistic academics looking for their 15 minutes of fame.
Most of the recent criticism, however, has been weak and misguided. Naysayers have been attacking straw men, focusing on worst practices, post hoc failures and secondary sources. The common theme has been to a great extent obvious: “Correlation does not imply causation,” and “data has biases.”
Critics of big data have been making three important mistakes:
First, they have misunderstood big data, framing it narrowly as a failed revolution in social science hypothesis testing. In doing so they ignore areas where big data has made substantial progress, such as data-rich Web sites, information visualization and machine learning. If there is one group of big-data practitioners that the critics should worship, they are the big-data engineers building the social media sites where their platitudes spread. Engineering a site rich in data, like Facebook, YouTube, Vimeo or Twitter, is extremely challenging. These sites are possible because of advances made quietly over the past five years, including improvements in database technologies and Web development frameworks.