Last week, I identified the top skills across different data science professionals. The results of our survey of 620+ data professionals showed that, while data scientists possess many different skills (we looked at 25 skills), some skills are more popular/common to data scientists compared to other skills. For example, the top two skills were communication and managing structured data while the bottom two skills were big and distributed data and cloud management. Just because a skill is popular (unpopular) among data professionals, however, does not necessarily mean it's important (unimportant) for project success. In this week's post, I will examine the degree to which data science skills are related to satisfaction with the outcomes of analytics projects. Are some data science skills more closely associated with successful project outcomes?
Frequency vs. Importance of Data Science Skills
A common method of ranking data science skills is based on the frequency with which professionals possess the skills. Skills that are held by most data scientists are deemed the "top" data science skills. I did this last week. Another way to rank data science skills is based on their importance to the success of projects. To rank data science skills on their importance, we need to clarify what we mean by "importance." Determining the importance of a data science skill is an exercise in quantifying the statistical relationship between skills and the quality of the outcome of analytics projects. In the study, data professionals rated their proficiency in each of the 25 data science skills on the following scale:
- Don't know (0)
- Fundamental Knowledge (20)
- Novice (40)
- Intermediate (60)
- Advanced (80)
- Expert (100)
Additionally, in the study, we asked data professionals to indicate their satisfaction with the outcomes of analytics projects on which they work. This rating is on a scale from 0 (Extremely Dissatisfied) to 10 (Extremely Satisfied). I am using this score as a measure of project success; higher satisfaction ratings indicate better outcomes of projects.
To determine the "importance" of each data science skill, I correlated the proficiency ratings for each of the 25 skills with the satisfaction rating. Each correlation shows us the degree of relationship between a specific skill and the satisfaction with the outcome of analytics projects. Skills that show a high correlation with satisfaction with outcomes are more important to project success compared to skills with lower correlations.
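As a rough sketch of the correlation step described above (not the code used for the study), the snippet below maps the proficiency labels to their scores and computes a Pearson correlation between each skill and the satisfaction rating. The function names `pearson` and `skill_importance` and the handful of responses are made up for illustration, and the post does not state which correlation coefficient was used, so Pearson is an assumption.

```python
from math import sqrt

# Proficiency scale from the survey: label -> numeric score.
PROFICIENCY = {
    "Don't know": 0,
    "Fundamental Knowledge": 20,
    "Novice": 40,
    "Intermediate": 60,
    "Advanced": 80,
    "Expert": 100,
}


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def skill_importance(responses, skills):
    """Return (skill, r) pairs sorted from highest to lowest correlation."""
    satisfaction = [r["satisfaction"] for r in responses]
    correlations = {
        skill: pearson([PROFICIENCY[r[skill]] for r in responses], satisfaction)
        for skill in skills
    }
    return sorted(correlations.items(), key=lambda kv: kv[1], reverse=True)


# Made-up responses for illustration only; these are NOT the survey data.
responses = [
    {"Machine Learning": "Expert", "Communication": "Advanced", "satisfaction": 9},
    {"Machine Learning": "Advanced", "Communication": "Novice", "satisfaction": 8},
    {"Machine Learning": "Intermediate", "Communication": "Expert", "satisfaction": 6},
    {"Machine Learning": "Novice", "Communication": "Intermediate", "satisfaction": 5},
    {"Machine Learning": "Fundamental Knowledge", "Communication": "Advanced", "satisfaction": 3},
]

ranked = skill_importance(responses, ["Communication", "Machine Learning"])
for skill, r in ranked:
    print(f"{skill}: r = {r:.2f}")
```

With real data, the same ranking step would run over all 25 skills and every respondent who gave both a proficiency rating and a satisfaction rating.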
The 10 Most Important Data Science Skills
I ranked the 25 data science skills according to the magnitude of their correlation with satisfaction with project outcome. This list appears in Figure 2. The first 10 skills listed in the figure (from left to right) were the skills most closely associated with good project outcomes. In the lists below, the letter before each skill indicates its skill area (B = Business, T = Technology, P = Programming, M = Math, S = Statistics). The 10 most important data science skills to project success were:
- S - Data Mining and Viz Tools (corr with satisfaction = .44)
- S - Statistics and statistical modeling (.39)
- T - Machine Learning (.38)
- S - Science/Scientific Method (.38)
- M - Algorithms and Simulations (.37)
- M - Bayesian Statistics (.37)
- M - Optimization (.33)
- S - Data Management (.33)
- T - NLP and text mining (.32)
- M - Math (.31)
Many of the important data science skills are highly quantitative in nature; in fact, 8 of the 10 skills are Math or Statistics skills, including Data Mining and Viz Tools, Statistics and statistical modeling, Science/Scientific Method and Algorithms and Simulations.
Differences across Data Science Job Roles
The correlations between data skills proficiency and satisfaction with project outcome appear in Table 1. On average, we see that proficiency in data skills is more closely linked to satisfaction with work outcomes for Business Managers (average r = .29) and Researchers (average r = .30) compared to Developers (average r = .18) and Creatives (average r = .18). That is, higher proficiency in data science skills is more strongly associated with better project outcomes for Business Managers and Researchers than for Developers and Creatives.
Next, I looked at the importance of data science skills by job role. This depiction also appears in Figure 2 (and Table 1 in detail). For each of the job roles, I graphically indicated the importance of data science skills in driving satisfaction with project outcomes. As you can see in Table 1 and Figure 2, of the first 10 data science skills, only one data science skill is important to project success across all four job roles: Data Mining and Viz Tools. Irrespective of your job role as a data scientist, a greater proficiency in using data mining and visualization tools will likely lead to higher satisfaction with project outcomes.
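To make the per-role breakdown concrete, here is a minimal sketch of how the skill-by-role correlations and each role's average r could be computed. It assumes Pearson correlations and uses hypothetical names (`correlations_by_role`, `average_r`) and made-up rows; it is not the study's code or data.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def correlations_by_role(rows, skills):
    """Return {role: {skill: r}}, computing r separately within each role."""
    grouped = defaultdict(list)
    for role, scores, satisfaction in rows:
        grouped[role].append((scores, satisfaction))
    result = {}
    for role, members in grouped.items():
        satisfaction = [sat for _, sat in members]
        result[role] = {
            skill: pearson([scores[skill] for scores, _ in members], satisfaction)
            for skill in skills
        }
    return result


def average_r(by_role):
    """Average the skill correlations within each role."""
    return {role: sum(rs.values()) / len(rs) for role, rs in by_role.items()}


# Made-up rows: (job role, proficiency scores on the 0-100 scale, satisfaction 0-10).
rows = [
    ("Researcher", {"Statistics": 100, "Product design": 20}, 9),
    ("Researcher", {"Statistics": 60, "Product design": 40}, 6),
    ("Researcher", {"Statistics": 20, "Product design": 60}, 4),
    ("Developer", {"Statistics": 80, "Product design": 20}, 5),
    ("Developer", {"Statistics": 20, "Product design": 60}, 7),
    ("Developer", {"Statistics": 60, "Product design": 100}, 9),
]

by_role = correlations_by_role(rows, ["Statistics", "Product design"])
for role, avg in average_r(by_role).items():
    print(role, {k: round(v, 2) for k, v in by_role[role].items()}, round(avg, 2))
```

Computing the correlations within each role, rather than on the pooled sample, is what allows the same skill to matter for one role and not for another.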
The importance of data science skills varies significantly by job role. The rankings of the importance of data science skills across the four job roles are widely different. The average correlation between the rankings of data skills across the four job roles is r = -.05. This finding suggests that the data skills that are essential to project outcomes for one type of data scientist are very different from the skills that are essential for other types of data scientists. For example, while skill in statistics and statistical modeling is the top driver of success for Researchers (r = .45), that same skill is not that important for Developers (r = .15). Additionally, while skill in product design is the top driver of project success for Developers (r = .36), it is not that important for Creatives (r = .07). Let's take a look at the 10 most important data science skills for each job role.
Business Manager: The 10 data science skills that best predict project success for Business Managers (i.e., leader, business person, entrepreneur) are:
- S - Statistics and statistical modeling (corr with satisfaction = .44)
- S - Data Mining and Viz Tools (.43)
- T - Machine Learning (.39)
- T - Big and distributed data (.39)
- S - Science/Scientific Method (.38)
- M - Bayesian Statistics (.35)
- M - Optimization (.33)
- T - Managing Structured data (.32)
- P - Systems Administration (.31)
- T - Managing Unstructured data (.30)
It's interesting to note that skills that drive Business Managers' satisfaction with project outcomes are, surprisingly, not business-related. The top drivers of satisfaction, instead, reflect skills in Technology, Statistics and Math. Four of the five business skills had the weakest correlation with satisfaction.
Developers: The 10 data science skills that best predict project success for Developers (i.e., developers or engineers) are:
- B - Product design and development (corr with satisfaction = .36)
- P - Systems Administration (.34)
- P - Back-end Programming (.34)
- S - Data Mining and Viz Tools (.31)
- P - Front-end Programming (.28)
- B - Governance and Compliance (.25)
- S - Data Management (.24)
- S - Communication (.23)
- P - Database Administration (.20)
- T - NLP and text mining (.20)
Data science skills that are unique to Developers that drive their satisfaction include Back-end programming, Front-end programming, Governance and Compliance and Communication.
Creatives: The 10 data science skills that best predict project success for Creatives (i.e., Jack of all trades, artist, hacker) are:
- M - Math (corr with satisfaction = .51)
- S - Data Mining and Viz Tools (.39)
- B - Business development (.34)
- M - Graphical Models (.32)
- M - Optimization (.31)
- T - Managing Structured data (.31)
- P - Database Administration (.28)
- M - Algorithms and Simulations (.23)
- T - Machine Learning (.22)
- M - Bayesian Statistics (.21)
There are two data science skills that are unique to Creatives that drive their satisfaction. These include Math and Graphical Models.
Researcher: The 10 data science skills that best predict project success for Researchers (i.e., researcher, scientist, statistician) are:
- S - Statistics and statistical modeling (corr with satisfaction = .45)
- S - Data Mining and Viz Tools (.45)
- M - Algorithms and Simulations (.41)
- B - Product design and development (.41)
- T - Big and distributed data (.40)
- S - Data Management (.40)
- T - Machine Learning (.39)
- M - Bayesian Statistics (.35)
- B - Business development (.30)
- T - NLP and text mining (.30)
Unlike Business Managers, Researchers need skills in a couple of business-related areas to be satisfied with the outcome of their work: product design and development, and business development. These sorts of business skills can help Researchers put their analysis in the context of what is important to the business, ensuring their measures and analyses are aligned with business goals.
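The average correlation between role rankings reported earlier (r = -.05) is a measure of how much the roles agree on which skills matter. One way to compute such a figure, sketched below, is to rank each role's skill correlations, correlate the ranks for every pair of roles (a Spearman-style rank correlation), and average the results. The post does not say which coefficient was used, so this choice, the function names, and the numbers are all assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (largest) upward; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_rank_agreement(role_correlations):
    """Average rank correlation over every pair of roles."""
    pairs = list(combinations(role_correlations.values(), 2))
    return sum(spearman(a, b) for a, b in pairs) / len(pairs)


# Hypothetical skill correlations per role (same skill order in every list).
role_correlations = {
    "Role A": [0.40, 0.30, 0.20],
    "Role B": [0.50, 0.20, 0.10],
    "Role C": [0.10, 0.20, 0.30],
}

agreement = mean_rank_agreement(role_correlations)
print(f"average rank correlation across roles: {agreement:.2f}")
```

A value near 1 would mean the roles rank the skills in nearly the same order; a value near 0, like the one reported in this analysis, means the rankings are essentially unrelated.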
Difference Between Top Skills and Most Important Skills
We can rank data science skills along two dimensions: 1) number of data professionals who possess the skill and 2) how important a skill is in driving project success. These two different approaches tell you different things about the data science skills. The former tells you which data skills are most popular among data professionals. The latter tells you which skills are most important to driving satisfaction with project outcomes.
When we compare the two approaches, we see very different things. A skill could be common among data professionals but weakly related to project success. We found that the most popular (top) data science skills include: Communication, Managing structured data, Math, Project management, Data Mining and Viz Tools, Science/Scientific Method, Data Management, Product design & development, Statistics and statistical modeling and Business development. The most important data science skills include: Data Mining and Viz Tools, Statistics and statistical modeling, Machine Learning, Science/Scientific Method, Algorithms and Simulations, Bayesian Statistics, Optimization, Data Management, NLP and text mining and Math. Five skills appear only on the most popular list (Communication, Managing structured data, Project management, Product design & development and Business development), and five appear only on the most important list (Machine Learning, Algorithms and Simulations, Bayesian Statistics, Optimization and NLP and text mining).
So, while Communication was the top skill (most common) among data scientists, that skill was not among the top 10 drivers of project success. The point here is that, in addition to evaluating data science skills based on their popularity, we need to be evaluating data science skills based on their impact on project performance.
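The comparison of the two top-10 lists can be checked with a few lines of code. The sketch below uses the two lists exactly as given in this post; the variable names are my own.

```python
# The 10 most popular skills, in the order listed in this post.
most_popular = [
    "Communication",
    "Managing structured data",
    "Math",
    "Project management",
    "Data Mining and Viz Tools",
    "Science/Scientific Method",
    "Data Management",
    "Product design & development",
    "Statistics and statistical modeling",
    "Business development",
]

# The 10 skills most strongly correlated with satisfaction, as listed in this post.
most_important = [
    "Data Mining and Viz Tools",
    "Statistics and statistical modeling",
    "Machine Learning",
    "Science/Scientific Method",
    "Algorithms and Simulations",
    "Bayesian Statistics",
    "Optimization",
    "Data Management",
    "NLP and text mining",
    "Math",
]

# List comprehensions (rather than set operations) keep the original ordering.
on_both = [s for s in most_popular if s in most_important]
only_popular = [s for s in most_popular if s not in most_important]
only_important = [s for s in most_important if s not in most_popular]

print("On both lists:      ", on_both)
print("Popular only:       ", only_popular)
print("Important only:     ", only_important)
```

Half of the skills on each list do not appear on the other, which is why ranking by popularity alone gives an incomplete picture.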
Summary and Conclusions
Data science skills are necessary for successful analytics projects. The current analysis identified the 10 data science skills that are the most important to the quality of the outcome of analytics projects. Skills in statistics and math dominated the top 10 drivers of project outcomes. In fact, the most important data science skill to project outcome was Data Mining and Visualization Tools; if you are a data scientist, whether you are a Developer, a Researcher or any other kind, it would benefit you to learn tools that help you mine and visualize data. The more proficient in these tools you become, the better you will feel about the outcome of your analytics projects.
Different job roles require different data science skills for success. Business Managers need to be savvy in statistics, machine learning and big and distributed data. Developers need skills in product design and development, systems administration and back-end programming. Creatives need skills in math, business development and graphical models. Researchers need skills in statistics, algorithms and simulations and product design and development.
An interesting finding was that Communication (e.g., sharing results, writing/publishing, presentations, blogging) was not as important as many other data skills in driving project success. Across the four job roles studied here, Communication showed a weak relationship with satisfaction with project outcomes (average r = .20; see Table 1 and Figure 2).
The results of the current analysis should not be read as minimizing the importance of some data science skills relative to others. The importance of data science skills likely varies by the requirements of the project. Skills cannot be important unless they are required by the job parameters. Still, many of these data science skills showed a statistically significant relationship with project outcome, and some skills are more highly related to project outcomes than others. Proficiency in these top data science skills can help you improve the likelihood that your analytics projects succeed.