Our world of Big Data is defined by the quantification of everything. In order to make sense of all the data around us, we need to understand statistics and statistical thinking. Statistics, after all, is the language of data. As I've discussed before, knowledge of statistics is not only needed by data workers (i.e., the rare data scientists) whose job is to dive deep into the data to uncover new insights, but also by data users (i.e., the 99%) who are given the statistical/analytical results from the data scientists. The value of our highly quantified world is limited by the number of people who understand statistics.
Below is a TED talk by mathematics professor Arthur Benjamin, a big proponent of statistics. In the talk, he says that, while calculus is important, "all of our students, every high school graduate should know -- should be statistics: probability and statistics."
You don’t need a PhD in statistics or research methodology to appreciate the value of data or to interpret the data and results you are given. There are many ways to learn about statistics, including online courses from several providers. Udacity offers three free courses: Intro to Statistics, Intro to Descriptive Statistics and Intro to Inferential Statistics. edX offers free statistics courses based on classes taught at UC Berkeley. Udemy offers several inexpensive courses on statistics. In my upcoming posts, I will cover a variety of topics in basic statistics and statistical thinking. These posts are designed to give the data novice (i.e., the 99%) a basic foundation in statistics and related topics to help them make better use of data and reports. Topics will include measurement scales, decision-making and probabilistic thinking (long run vs. short run), sampling error, biases, spurious correlations, false positives, hypothesis testing and significance testing, to name a few.
By understanding these basic statistical concepts, you will be better equipped to ask important questions about the analytics-derived insights you are given by others. Asking these questions helps you think more critically about the conclusions of data-driven reports. They include:
- Who sponsored the analytic study? Why was the analysis pursued?
- What were the goals of the analysis? Did the analysts have hypotheses about what they might find before they did the analysis? Were those hypotheses supported?
- What is the source of the data? Was the data collected for the purposes of this study or was existing data used?
- What is being measured? What are the definitions of each metric?
- What statistical tests were used?
It's necessary to be skeptical of any results you are given. Knowledge of statistics and statistical thinking will help you think critically when evaluating the merits of projects and reports that involve analytics of any sort.
I will be providing links to the articles below.
- Measurement Scales
- Frequencies, Percentages, Histograms and Distributions
- Measures of Central Tendency and Variability
- Sampling Distributions
- Decision Making and Hypothesis Testing
- Significance Testing
- Analysis of Variance
- Regression Analysis
- Factor Analysis: Dimension Reduction
- Spurious Correlations
- False Positives