This is the eighth instalment in the series of essays on Statistics Denial by Randy Bartlett, Ph.D.
“The only useful function for a statistician is to make predictions, and thus provide a basis for action.”
— W. Edwards Deming
Myth #4: Prediction is not part of statistics
This post discusses the push to rebrand predictive modeling as predictive analytics, and why this will enlarge the coming flood of statistical malfeasance. The prediction problem involves uncertainty, and statistics provides us with the tools, language, and thinking for addressing numbers with uncertainty. We have provided a problem-based clarification of statistics in Analytics Magazine (http://goo.gl/Wod3gk). This should help people better identify statistics problems.
Rebranding & Mischaracterizing Predictive Modeling:
The rebranding of predictive modeling as predictive analytics is on its surface no more harmful than selling ‘pre-owned cars’ instead of ‘used cars.’ The harm comes when rebranding mischaracterizes statistics and circumvents best practice. As mentioned before, the concern with rebranding is that the next step is to strip away everything not understood by those merely following recipes.
Here is a quote that captures the problematic mischaracterization: ‘PREDICTIVE ANALYTICS does NOT require an understanding of “STATISTICS / TRADITIONAL p-value STATISTICS” …. Period !!!!!’ This objection spreads rudimentary misunderstandings about statistics. Here is another quote: ‘It [predictive modeling] is not a [sub]field of statistics.’
First, prediction has always been a subfield of statistics. We need look no further than the fact that prediction involves uncertainty. Let us recap the four common objectives for statistical models: coefficient estimation, prediction (there it is!), grouping, and ranking. Those new to the subject want to rename and reclassify everything as part of their rediscovery.
Second, note the qualification from the first quote, ‘statistics/traditional p-value statistics.’ This is like claiming that division does not require an understanding of ‘mathematics/traditional addition.’
Here is a third quote: “PA [Predictive Analytics] and DS [Data Science] both contrast with statistics in their emphasis on prediction over causality and their general use of observational in contrast to experimental methods.” PA and DS are rebrandings of predictive modeling and statistics, respectively, with no change in content. All of the assumptions, thinking, and tools for dealing with uncertainty are statistical.
First, the claim that ‘predictive analytics emphasizes prediction over statistics’ is just babble. By the same logic, we could claim that predictive modeling emphasizes prediction over statistics, and that sampling emphasizes sampling over statistics. We could claim that ‘Division Scientists’ perform more division than Mathematicians. None of this expresses a new value proposition for predictive analytics. In general, it boils down to a comparison between topical areas like commercial statistics and clinical statistics. This distinction is lost when an applied statistician moves from clinical to commercial work.
Second, the same confused type of claim is repeated with the idea that predictive analytics is more about observational data than statistics is. Again, this is like claiming that predictive modeling is more about observational data than statistics is. There is nothing new in this rebranding that is not already in predictive modeling, which is a subfield of statistics.
Third, statistics places a heavy emphasis on analyzing observational data, and it does this in a number of subfields: predictive modeling, DoS (Design of Samples), QC/PC (Quality Control/Process Control), Time Series, EDA (Exploratory Data Analysis), and others. Hence, statistics is more about observational data than predictive analytics is. Observational data contains uncertainty.
Predictive analytics is a rebranding of predictive modeling for promotional purposes. We can be certain that prediction is a statistics problem because it involves numbers with uncertainty. Claiming prediction does not require an understanding of statistics is like claiming that division does not require an understanding of mathematics.
Rebranding can have some benefits if performed thoughtfully. However, we think that there is nothing thoughtful or measured in denying the value proposition of statistics. The downside of rebranding is that important parts can be omitted just because they are not understood by recipe followers. We have noticed that self-professed experts in predictive analytics seldom discuss prediction intervals! Corrupting predictive modeling will facilitate a flood of statistical malfeasance.
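For readers who have not met a prediction interval, here is a minimal sketch of what gets omitted when a prediction is reported as a bare number. It fits a simple least-squares line and attaches a 95% interval for a new observation. The function name `prediction_interval`, the toy data, and the choice of simple linear regression are our assumptions for illustration only; the critical value 2.306 is the standard 97.5th percentile of Student's t with 8 degrees of freedom, hard-coded so the sketch needs nothing beyond the Python standard library.

```python
import math


def prediction_interval(x, y, x0, t_crit):
    """Return (point prediction, lower, upper) for a NEW observation at x0,
    based on a simple least-squares line fitted to the pairs (x, y).

    t_crit is the Student's t critical value for n - 2 degrees of freedom,
    supplied by the caller (an assumption of this sketch).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard error, using n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    y_hat = intercept + slope * x0

    # Standard error for a new observation: the leading 1 is the noise in
    # the new point itself; the other terms are uncertainty in the fitted line.
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    half_width = t_crit * se_pred
    return y_hat, y_hat - half_width, y_hat + half_width


# Made-up toy data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

T_CRIT_95_DF8 = 2.306  # t quantile at 0.975 with n - 2 = 8 degrees of freedom

y_hat, lower, upper = prediction_interval(x, y, 11, T_CRIT_95_DF8)
print(f"point prediction: {y_hat:.2f}")
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")
```

The interval widens as `x0` moves away from the mean of the observed `x` values, which is exactly the kind of information a point prediction alone hides.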
We sure could use Deming right now. Many of us who consume or produce data analysis hang out in the new LinkedIn group: About Data Analysis. Come see us.