IBM is hosting its IBM Insight event next week in Las Vegas. The event will, no doubt, help companies understand how to use analytics to get insight from their data. I will be attending the event as IBM's guest and will be sharing what I learn via social media. As a first step in sharing my knowledge about how to get insight from data, IBM asked me what I thought about the role of the data scientist in the Insight Economy. Generally speaking, I think that the role of data scientists is to extract value from data. Data scientists' work helps improve how humans make decisions and how algorithms optimize outcomes. Through the collection, analysis and interpretation of data, data scientists extract empirically-based insights that augment and enhance how humans and algorithms work. Let me take a closer look at how businesses can get insight from their data.
Generating Insights: The Scientific Method
Scientists have been getting insight from data for centuries using the scientific method. Formally defined, the scientific method is a body of techniques for objectively investigating phenomena, acquiring new knowledge, or correcting and integrating previous knowledge. The scientific method includes the collection of empirical evidence, subject to specific principles of reasoning. The scientific method follows these general steps:
- Formulate a question or problem statement
- Generate a hypothesis that is testable
- Gather/Generate data to understand the phenomenon in question. Data can be generated through experimentation; when we can't conduct true experiments, data are obtained through observations and measurements.
- Analyze data to test the hypothesis and draw conclusions
- Communicate results to interested parties or take action (e.g., change processes) based on the conclusions. Additionally, the outcome of the scientific method can help us refine our hypotheses for further testing.
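To make the steps above concrete, here is a minimal sketch of the "analyze data to test the hypothesis" step. It is only an illustration, not a method taken from any particular study: the question (does a new process reduce handling time?), the two made-up samples, and the function name permutation_test are all assumptions. It uses a two-sided permutation test on the difference in means, one of many ways to test a hypothesis.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # pretend group labels are arbitrary
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical handling times in minutes (illustrative values only).
old_process = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
new_process = [11.2, 11.5, 11.0, 11.4, 11.6, 11.1, 11.3, 11.5]

p_value = permutation_test(old_process, new_process)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value says only that the observed difference would be unlikely if the two processes were really the same. Deciding whether the difference matters, and whether the data were gathered in a way that supports a causal claim, still takes human judgment.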
Despite the idea that Big Data will kill the need for theory and the scientific method, the human element is necessarily involved in the generation, collection and interpretation of data. Consider how Google Flu Trends overestimated flu rates in 2012. More data will not magically give you better answers. Applying the scientific method helps us be honest with ourselves and minimizes the chances of arriving at the wrong conclusion. The scientific method plays a critical role in understanding any data, irrespective of their size, speed or variety.
As Carl Sagan said, "Science is a way of thinking much more than it is a body of knowledge." The scientific method is a way to help us understand how the world really works. To be of real, long-term value to business, analytics needs to be about understanding the causal links among the variables. Through trial and error, the scientific method helps us identify why variables are related to each other and which underlying processes drive the observed relationships.
The Structure of Data Science Skills
We recently conducted a survey of more than 500 data professionals regarding data science skills. We asked these data professionals to rate their proficiency across 25 skills conventionally thought to be important in the field of data science (see Figure 2). While these 25 skills were logically divided into five distinct categories (Business, Technology, Programming, Math & Modeling and Statistics), a factor analysis of the 25 skills resulted in a clear three-factor solution, suggesting that the 25 skills can be grouped into three broad skill areas: 1) Business, 2) Technology / Programming and 3) Statistics / Math.
When I map the three data science skill areas against the five steps of the scientific method, it's clear why data science skills are so important in extracting insight from data. As you can see in Figure 3, proficiency in each of the three skill areas is required to successfully implement the scientific method as a way to get insights from data. Business knowledge is necessary to help formulate the right questions, generate hypotheses, gather data and communicate results. Technology/Programming skills are needed to gather/generate data and analyze data/test hypotheses. Finally, Statistics/Math skills are necessary to gather data, analyze data/test hypotheses and communicate results.
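The mapping described above can also be written down as a small lookup table. The sketch below is just one way to encode it: the step labels and the helper names skills_needed and uncovered_steps are my own assumptions, and the table reflects only what the paragraph above says, not any extra detail from the figure. It shows how a team could check which steps of the scientific method its combined skill areas leave uncovered.

```python
# The five steps of the scientific method, in order (labels are shorthand).
STEPS = [
    "formulate question",
    "generate hypothesis",
    "gather data",
    "analyze data",
    "communicate results",
]

# Skill area -> steps it supports, as described in the text above.
SKILL_AREAS = {
    "Business": {
        "formulate question",
        "generate hypothesis",
        "gather data",
        "communicate results",
    },
    "Technology/Programming": {"gather data", "analyze data"},
    "Statistics/Math": {"gather data", "analyze data", "communicate results"},
}


def skills_needed(step):
    """Return the skill areas that support a given step, sorted by name."""
    return sorted(area for area, steps in SKILL_AREAS.items() if step in steps)


def uncovered_steps(team_skill_areas):
    """Return the steps, in order, that no skill area on the team supports."""
    covered = set()
    for area in team_skill_areas:
        covered |= SKILL_AREAS[area]
    return [step for step in STEPS if step not in covered]


for step in STEPS:
    print(f"{step}: {', '.join(skills_needed(step))}")

# A team with only Technology/Programming skills leaves gaps.
print(uncovered_steps(["Technology/Programming"]))
```

The gap check makes the point of the mapping explicit: no single skill area touches all five steps, so a team needs all three areas represented to carry a question through to communicated results.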
No single data professional is an expert in all skill areas. Because data professionals tend to be very competent in one skill area, it's imperative that data science teams have different types of data scientists, each addressing specific steps of the scientific method. We looked at four different types of data scientists (1) business management, 2) developer, 3) creative and 4) researcher) and found that each type has strengths in different skill areas (see Figure 4).
In our Big Data world, businesses have access to a lot of data. They can extract useful insight from their data by following the scientific method. To do so, they need to employ different types of data professionals who have complementary skills that address each of the five steps of the scientific method: 1) formulate the problem, 2) generate hypotheses, 3) gather data, 4) analyze data and 5) communicate results. Organizations that adopt the scientific method will be better able to make sense of their data to help them understand how their business really works.