What are the biggest concerns of data scientists? The State of Data Science 2022 report has the answers

Data science is a rapidly growing field as organizations of all sizes embrace AI and ML, and with this growth come concerns.

The State of Data Science 2022 report, released today by data science platform vendor Anaconda, identifies key trends and concerns for data scientists and the organizations that employ them. Among the trends identified by Anaconda is the fact that the open source Python programming language continues to dominate the data science landscape.

Among the key concerns identified in the report are barriers to data science adoption overall.

“One area that surprised me was that two-thirds of respondents felt that the biggest barrier to the successful adoption of data science by the business is insufficient investment in data engineering and tools to enable the production of good models,” Peter Wang, Anaconda CEO and co-founder, told VentureBeat. “We’ve always known that data science and machine learning can suffer from poor models and inputs, but it was interesting to see our respondents rank it even higher than the talent/headcount gap.”


AI bias is far from a solved issue

The issue of AI bias is a familiar one in data science. What is less well known is what exactly organizations are doing to combat it.

Last year, Anaconda’s State of Data Science 2021 found that 40% of organizations were planning or doing something to help address the issue of bias. Anaconda didn’t ask the same question this year, choosing to take a different approach.

“Rather than asking whether organizations planned to address bias, we wanted to look at the specific steps organizations are now taking to ensure fairness and mitigate bias,” Wang said. “We realized from our findings last year that organizations had plans to address this, so for 2022, we wanted to look at what actions they took, if any, and where their priorities lie.”

As part of efforts to prevent AI bias, 31% of respondents noted that they evaluate data collection methods against internally defined standards for fairness. Conversely, 24% noted that they have no standards for fairness and mitigating bias in datasets and models.

Explainability is a fundamental element in helping to identify and prevent bias. When asked what tools are used to explain AI, 35% of respondents noted that their organizations perform a series of controlled tests to assess the interpretability of the model, while 24% have no measures or tools to ensure the explainability of the model.

“While each response measure has less than 50% of these efforts, the results here tell us that organizations are taking a varied approach to mitigating bias,” Wang said. “Ultimately, organizations are taking action, it’s only early in their journey to address bias.”

How data scientists spend their time

Data scientists handle a number of different tasks as part of their job.

While developing models is the desired end goal, that is not where data scientists spend most of their time. In fact, the study found that data scientists spend only 9% of their time developing models. Similarly, respondents reported spending only 9% of their time on model selection.

The largest share of time goes to data preparation and cleaning, which together account for 38% of the time.

The love-fear relationship with open source

The report also asked data scientists how they use and view open source software.

Eighty-seven percent responded that their organizations allow open source software. Despite this widespread use, 54% of respondents noted that they are concerned about open source security.

“Today, open source is embedded in almost every piece of software and technology, and it’s not just because it’s cheaper in the long run,” Wang said. “Innovation happening around artificial intelligence, machine learning and data science is happening in the open source ecosystem at a speed that cannot be compared to a closed system.”

That said, Wang noted that it makes sense for organizations to be aware of the risks of open source and to develop a plan to mitigate any potential vulnerabilities.

“One of the advantages of open source is that patches and solutions are created openly instead of behind closed doors,” he said.

The Anaconda report was based on a survey of 3,493 respondents from 133 countries.
