Top 10 data mining mistakes to avoid

Published February 23, 2015 | John Elder, PhD

Mining data to extract useful and enduring patterns is arguably more art than science. Pressure makes early, apparent results more appealing, and it is all too easy to fool yourself. How can you resist the siren songs of the data and maintain an analysis discipline that leads to robust results?
Here, we list the “Top 10” mistakes of data mining, chosen for their frequency and seriousness. After compiling the list, we realized that an even more basic problem, mining without (proper) data, must be addressed as well. So, numbering like a computer scientist (with an overflow problem), here are mistakes 0 to 10.
0. Lack of proper data
1. Focus on training
2. Rely on one technique
3. Ask the wrong question
4. Listen (only) to the data
5. Accept leaks from the future
6. Discount pesky cases
7. Extrapolate
8. Answer every inquiry
9. Sample casually
10. Believe the best model
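To make mistake 1 (focus on training) concrete, here is a minimal sketch in Python. It is not taken from the chapter or the talk: the synthetic data, the 30% label noise, the nearest-neighbor "model," and helper names such as make_data and predict_1nn are all assumptions chosen for illustration. It shows how a model that simply memorizes its training cases looks perfect on those cases and noticeably worse on data it has not seen.

```python
import random


def make_data(n, rng, noise=0.3):
    """Return n (x, label) pairs. The true rule is label = (x > 0.5);
    each label is then flipped with probability `noise` (an assumed value)."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Predict the label of the training point closest to x (pure memorization)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of cases in `data` that the memorizing model labels correctly."""
    hits = sum(predict_1nn(train, x) == label for x, label in data)
    return hits / len(data)


rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)     # cases the model is allowed to see
holdout = make_data(200, rng)   # cases kept aside for honest evaluation

train_acc = accuracy(train, train)
holdout_acc = accuracy(train, holdout)
print(f"training accuracy: {train_acc:.2f}")
print(f"holdout accuracy:  {holdout_acc:.2f}")
```

The training score is perfect only because every training case is its own nearest neighbor, noise included. The holdout score is the one that says something about how the model would behave on new data.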
John Elder, PhD covers the top 10 data mining mistakes in more detail in chapter 20 of the Handbook of Statistical Analysis & Data Mining Applications, and in his well-known talk on the same topic.