Data science projects require data professionals to divide their time among several different activities. Results of a recent study of over 23,000 data professionals found that data scientists spend about 40% of their time gathering and cleaning data, 20% of their time building and selecting models, and 11% of their time finding insights and communicating them to stakeholders.
A typical data science project involves several distinct activities. Kaggle recently conducted a machine learning (ML) and data science (DS) survey that asked data professionals how they spend their time during data science projects, across the following activities:
- Gathering data
- Cleaning data
- Visualizing data
- Model building/model selection
- Putting model into production
- Finding insights and communicating them to stakeholders
Results (see Figure 1) of the survey showed that, of the different data science activities listed, data professionals spend most of their time cleaning data (23%) and the least amount of time putting models into production (9%).
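The figures quoted in this article can be cross-checked with a little arithmetic. The sketch below uses only the rounded percentages reported here and assumes that the shares across all activities sum to 100%, which the article does not state explicitly. The variable names are illustrative, and the derived values are approximations rather than survey results.

```python
# Shares of project time (percent) quoted in this article; all values are rounded.
reported = {
    "gathering + cleaning data": 40,
    "model building/model selection": 20,
    "finding and communicating insights": 11,
    "putting models into production": 9,
}
cleaning = 23  # cleaning data, reported separately

# Gathering alone, implied by the combined figure minus cleaning.
gathering = reported["gathering + cleaning data"] - cleaning

# Assumption: all shares sum to 100%. The remainder covers activities
# not itemized above (visualizing data and anything else).
not_itemized = 100 - sum(reported.values())

print(f"gathering data: ~{gathering}%")
print(f"not itemized above: ~{not_itemized}%")
```

Under these assumptions, gathering data accounts for roughly 17% of project time, and about 20% is left for visualizing data and any other activities.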
Time spent on each activity differed slightly across data roles. For example, data analysts spend about 27% of their time on cleaning data, while research scientists spend about 20% of their time on this activity. Additionally, data analysts spend about 16% of their time on model building/model selection, while software engineers spend about 22% of their time on it.
Interestingly, even though data science is commonly defined as a way of extracting insights from data, the survey results showed that data professionals spend only about 11% of their time on this activity. Prior research found that data science projects require different types of skills and abilities, including programming expertise, statistics knowledge and subject matter expertise. The current results show that these diverse skill sets reflect the different activities that underlie data science projects.