A new survey of data science tools shows that Python usage is quickly gaining steam among advanced analytics professionals, at the expense of both R and SAS. According to the results of the 2016 survey, conducted by Burtch Works, R is the preferred tool for 42% of analytics professionals, followed by SAS at 39% and Python at 20%. While Python’s placing may at first appear to relegate the language to bronze medal status, it’s the delta here that really matters.
Here’s the interesting bit: While the first two years of Burtch Works’ survey were focused on the SAS vs. R war, so many analytics professionals chose to write in Python that the company was forced to include the language as a third choice.
There were a few other interesting tidbits from the survey. For instance, among those professionals who identify as “data scientists,” Python is the tool of choice for 53% of survey respondents, while SAS garnered a tiny 3% share. Among those who identify themselves as “predictive analytics” professionals, SAS and R were in a virtual tie (43% vs. 41%, respectively), while only 16% prefer Python.
There’s a correlation between the education level of the survey respondents and their tool preference, with R usage increasing with the amount of postgraduate education. Interestingly, SAS usage increases with the number of years in the analytics saddle. This makes sense when you consider that analysts who forwent a more advanced degree were more heavily exposed to SAS, which has been so dominant in the private sector for so long, while those who stayed in school were more heavily exposed to open source alternatives, like R.
SAS is used more heavily in industries like financial services, healthcare, and retail, while R is favored in the high tech, telecom, and consulting sectors.
While open source tools as a whole continue to gain steam against proprietary setups among data science pros, the big story here is the emergence of Python as a major force on the analytics stage.
Python was first released in the early 1990s, with a reference implementation written in C. It’s widely considered to be easier to learn than R, and its status as a general purpose language makes it a relatively simple matter to integrate statistical functions into existing applications.
Python isn’t new, but it’s only now appearing to gain steam in the analytics community, at the expense of R and proprietary packages like SAS, IBM’s SPSS, and MathWorks’ MATLAB.
On the other hand, R is also widely used in the data science community. It’s widely considered to be superior for pure statistical analysis, partly because of the availability of packages through the Comprehensive R Archive Network (CRAN), as well as notebooks like R Markdown and the R tools that Microsoft acquired from Revolution Analytics. However, the steep learning curve and lack of applicability outside the statistical community are seen as limiting factors for R.
Continue reading Alex Woodie’s article published on datanami.com