A recent survey of over 16,000 data professionals showed that the most common challenges to data science included dirty data (36%), lack of data science talent (30%) and lack of management support (27%). Also, data professionals reported experiencing around three challenges in the previous year. A principal component analysis of the 20 challenges studied showed that challenges can be grouped into five categories.
Data science is about finding useful insights and putting them to use. Data science, however, doesn’t occur in a vacuum. When pursuing their analytics goals, data professionals can be confronted by different types of challenges that hinder their progress. This post examines the types of challenges data professionals experience. To study this problem, I used data from the Kaggle 2017 State of Data Science and Machine Learning survey of over 16,000 data professionals (survey data collected in August 2017).
Barriers and Challenges at Work
The survey asked respondents, “At work, which barriers or challenges have you faced this past year? (Select all that apply).” Results appear in Figure 1 and show that the top 10 challenges were:
- Dirty data (36% reported)
- Lack of data science talent (30%)
- Company politics (27%)
- Lack of clear question (22%)
- Data inaccessible (22%)
- Results not used by decision makers (18%)
- Explaining data science to others (16%)
- Privacy issues (14%)
- Lack of domain expertise (14%)
- Organization small and cannot afford data science team (13%)
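As a rough illustration of how percentages like these can be derived from a "select all that apply" question, here is a minimal sketch. The responses are made up, the function name `challenge_rates` is hypothetical, and the choice of denominator (all respondents, including those who selected nothing) is an assumption rather than a documented detail of the survey analysis.

```python
from collections import Counter

# Hypothetical responses: one set of selected challenges per respondent.
responses = [
    {"Dirty data", "Company politics"},
    {"Dirty data"},
    {"Lack of data science talent", "Dirty data"},
    set(),  # a respondent who selected nothing
]

def challenge_rates(responses):
    """Return {challenge: percent of respondents selecting it}, highest first."""
    counts = Counter(c for selected in responses for c in selected)
    n = len(responses)  # assumption: every respondent counts in the denominator
    rates = {c: 100.0 * k / n for c, k in counts.items()}
    # Sort by descending rate, breaking ties alphabetically.
    return dict(sorted(rates.items(), key=lambda kv: (-kv[1], kv[0])))

rates = challenge_rates(responses)
for challenge, pct in rates.items():
    print(f"{challenge}: {pct:.0f}%")
```

Because respondents can select several challenges, percentages computed this way need not sum to 100%.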
Results revealed that data professionals reported experiencing a median of three challenges in the previous year. The number of challenges experienced varied significantly across job titles. Data professionals who self-identified as a Data Scientist or Predictive Modeler reported four challenges. Data pros who self-identified as a Programmer reported only one challenge.
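A comparison of challenge counts by job title can be sketched as follows. The records are invented for illustration and `median_by_title` is a hypothetical helper, not the code used for the survey analysis.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (job title, number of challenges selected) pairs.
records = [
    ("Data Scientist", 4), ("Data Scientist", 5), ("Data Scientist", 3),
    ("Programmer", 1), ("Programmer", 0), ("Programmer", 2),
]

def median_by_title(records):
    """Group challenge counts by job title and take the median of each group."""
    groups = defaultdict(list)
    for title, n_challenges in records:
        groups[title].append(n_challenges)
    return {title: median(counts) for title, counts in groups.items()}

medians = median_by_title(records)
for title, value in medians.items():
    print(title, value)
```

The median is used here rather than the mean because counts of this kind tend to be skewed by a few respondents who select many options.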
Groupings of Challenges
I conducted a principal component analysis of the 20 challenges (0 = not experienced; 1 = experienced) to identify naturally occurring challenge groupings. I found a fairly clear 5-component solution, showing that specific challenges tend to occur with other challenges.
The five components (challenge groupings) are (see Figure 2):
- Insights not Used in Decision Making: These challenges include company politics, an inability to integrate study findings into decision-making processes and lack of management support.
- Data Privacy, Veracity, Unavailability: These challenges revolved around the data itself, including how “dirty” it is, its availability as well as privacy issues.
- Limitations of tools to scale / deploy: Challenges in this category relate to the tools used to extract insights, deploy models and scale solutions up to the full database.
- Lack of Funds: Challenges around lack of funding impact what the organization can purchase with respect to external data sources, data science talent and, perhaps, domain expertise.
- Wrong Questions Asked: These challenges concern the difficulty of managing expectations about the impact of data science projects and not having a clear question to answer or a clear direction to go in with the available data.
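To make the idea of challenge groupings concrete, here is a minimal sketch of a principal component analysis on 0/1 indicators. It uses only the Python standard library: it builds a correlation matrix and extracts components by power iteration with deflation. The toy data, the function names and the extraction method are all assumptions for illustration; the post does not say which software or rotation was used, and in practice a library routine such as scikit-learn's `PCA` would be the more usual choice.

```python
import math
import random

def correlation_matrix(rows):
    """Pearson correlations between the columns of a 0/1 response matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    # Standardize each column (assumes no column is constant).
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def principal_components(corr, k, iters=500, seed=0):
    """Top-k (eigenvalue, loadings) pairs via power iteration with deflation."""
    p = len(corr)
    m = [row[:] for row in corr]
    rng = random.Random(seed)
    result = []
    for _ in range(k):
        v = [rng.uniform(-1, 1) for _ in range(p)]
        for _ in range(iters):
            # Multiply by the matrix and renormalize to unit length.
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * m[i][j] * v[j]
                         for i in range(p) for j in range(p))
        result.append((eigenvalue, v))
        # Deflate: remove the component just found before extracting the next.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
             for i in range(p)]
    return result

# Toy data: items 0 and 1 always co-occur; items 2 and 3 co-occur often.
pairs = [(1, 1), (1, 1), (0, 0), (0, 0), (1, 0), (0, 1)]
rows = [[a, a, c, d] for a in (0, 1) for c, d in pairs]

components = principal_components(correlation_matrix(rows), k=2)
for eigenvalue, loadings in components:
    print(round(eigenvalue, 2), [round(abs(x), 2) for x in loadings])
```

In this toy example the first component loads on the first two items and the second component on the last two, which is the sense in which challenges that tend to be reported together fall into the same grouping.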
Summary
Data professionals face a range of challenges in their data science and machine learning pursuits, reporting a median of three in the previous year. The most common data science and machine learning challenges included dirty data, lack of data science talent, lack of management support and lack of clear direction/question.
Who are those magical 64% of data workers who have not experienced “dirty data”?!?