Essays on Statistics Denial: Where do these myths come from?

Analytics | Published June 17, 2015

UFO sightings, where do they all come from? Astronomers spend an inordinate amount of time scouring the skies. However, as far as we know, no credentialed astronomer has ever reported a UFO. None, zilch.
Statistics denial myths, where do they all come from? We have always had them, owing to an environment deeply rooted in deterministic thinking (no room for uncertainty) and low statistics literacy. Most of us are uncomfortable with uncertainty and lack the training to deal with it. Also, statistics denial comes in waves as areas of application discover and rediscover the potential of data insights.
The current wave of denying and mischaracterizing the role, breadth, and depth of applied statistics is associated with promotional hype around today’s Big Data ‘information rush.’ During the gold rush of 1849, it was the merchants who made most of the money … and none of them knew anything about mining for gold. Among the most successful was Samuel Brannan, a tireless self-promoter, shopkeeper, and newspaper publisher, who purchased all the prospecting supplies available in San Francisco and resold them at a substantial profit.
Today’s information rush is exemplified by the great promise of overflowing observational data, hyper communications, and the approaching Internet of Things. The promotional hype initially comes from journals, self-glorifying books, and vendors, all with a certain perspective that is not informed by practical experience; publishers are unable to discern qualifications. This creates misinformation stampedes, with energized statistics deniers writing amplifying blogs, presentation decks, et al., which further mischaracterize and even adulterate statistics. The downstream echoes talk everyone into believing their own hyped fabrications. Two of the problems are that (1) selling good statistics practice can be less lucrative than cutting some serious corners; and (2) promoting services, workshops, data-analysis results, etc. is easier when not encumbered by competently wielding and accurately depicting statistics.
To harness the data, we need to follow best practices (Best Statistical Practice) for extracting and leveraging information, as characterized by Deming, et al. (see ‘Out Of The Crisis’). We provide a complementary and more riveting, problem-based definition of statistics in the May/June 2015 issue of Analytics Magazine. Even if many of us adhere to best practice, we need to be prepared for a coming flood of statistical malfeasance. We should expect more follies (Google Flu Trends, fiber and colon cancer, Potti Gate, et al.) and more financial debacles (AIG, Fannie Mae, Moody’s, Fitch Ratings, S&P Ratings, et al.).
It is our ambition to disrupt the ‘message repetition,’ which aims to establish these myths in the grand tradition of mass delusions and hysterias: Y2K, the dot-com new world order of 2000, Black-Scholes (won a Nobel), the housing bubble, witch burning, and UFO sightings. We want you, gentle reader, to be wary of experts chosen by the loudness of their voices and of stories about complex technology expressed in two sentences. We trust that people are smart; if they can find the information, they will not let stories about little green men get in the way of making a little green.
This series of blogs will outline the minefield of statistics denial obstructing fact-based decision makers and disrupting those who manage and analyze the data:
Blog 2: Statistics Denial, Statistical Debacles & The Coming Flood Of Statistical Malfeasance
Blog 3: Statistics Denial, Applied Statistics Is A Way Of Thinking, Not Just A Toolbox
Blog 4: Statistics Denial, Best Statistical Practice
Blog 5: Statistics Denial Myth #1, Traditional Techniques Straw Man
#1: Statistics consists of only certain ‘traditional’ techniques or those from STAT 201
Blog 6: Statistics Denial Myth #2, Traditional Practice Straw Man
#2: Statisticians (and other quants) work within ‘traditional statistics’
Blog 7: Statistics Denial Myth #3, Repackaging Statistics With Straddling Terms
#3: Data mining, machine learning, Big Data analysis, business analytics, and data science are distinct from statistics
False Implication: We can solve statistics problems without using statistics thinking, tools, and assumptions because we have these other disciplines
Blog 8: Statistics Denial Myth #4, Rebranding Predictive Modeling
#4: Prediction is not part of statistics
Blog 9: Statistics Denial Myths #5-6, Mischaracterizing Statistical Significance
#5: For a large number of observations (Big Data: Volume), all the variables are significant so statistics does not work
#6: Statistics does not accommodate ‘consequential’ statistical significance
Blog 10: Statistics Denial Myths #7-9, About Big Data
#7: Big Data Volume (Or Large N) contains complete information
#8: Big Data Volume (Or Large N) speaks for itself
#9: Big Data Volume (Or Large N) replaces sampling and other statistics—so much information
Blog 11: Statistics Denial Myths #10-11, Analyzing Big Data
#10: (Statistical) sampling does not work for Big Data
#11: Current statistics techniques will not work for Big Data
Blog 12: Statistics Denial Myth #12, Publications Straw Man
#12: Statistics is defined by the recent publishing activities of statistics professors
False Implication: Statistics is limited to some toolbox defined by academics
Blog 13: Statistics Denial Myths #13-14, About Statisticians
#13: Statisticians are homogeneous
#14: Academic statisticians are typical of and can speak for the whole profession
The ‘information rush’ is producing a sense of urgency, a great deal of opportunity, and spectacular breakthroughs coming from everywhere. Meanwhile, the combination of low statistics literacy and overzealous promotional hype is facilitating dysfunctional data analysis, which is more detrimental than UFO sightings. Mischaracterizations of statistics are a problem when they adulterate statistics or obstruct best practice (Best Statistical Practice). In the coming blogs, we will expose the harmful aspects of these mischaracterizations.
In our next blog, we will discuss the harm, which is far greater than that of crop circles. The third and fourth blogs will articulate the value proposition of applied statistics and best practice. In the remaining blogs, we will debunk these myths (UFO sightings) as just weather balloons, crop circles, swamp gas, et al. We will use the terms quant and applied statistician interchangeably to denote those qualified to perform advanced data analysis.
We sure could use Deming right now. Many of us who consume or produce data analysis hang out in the new LinkedIn group: About Data Analysis.