Note: This article was originally published on Mar 27th, 2014 and updated on Sept 12th, 2017
We love comparisons!
From Samsung vs. Apple vs. HTC in smartphones and iOS vs. Android vs. Windows in mobile operating systems, to comparing candidates for upcoming elections or selecting a captain for the World Cup team, comparisons and discussions enrich our lives. If you love discussions, all you need to do is drop a relevant question into the middle of a passionate community and then watch it explode! The beauty of the process is that everyone in the room walks away more knowledgeable.
I am sparking something similar here. SAS vs. R has probably been the biggest debate the data science industry has witnessed. Python is now one of the fastest-growing languages and has come a long way since its inception. My reason for starting this discussion is not to watch it explode (though that would be fun as well). I know that we will all benefit from the discussion.
This has also been one of the questions I am asked most often on this blog. So, I thought I would discuss it with all my readers and visitors!
Probably yes! But I still feel the need for discussion, for the following reasons:
So, without any further delay, let the combat begin!
Here is a brief description of the 3 ecosystems:
I’ll compare these languages on the following attributes:
I am comparing these from the point of view of an analyst. So, if you are looking to purchase a tool for your company, you may not get a complete answer here, though the information below will still be useful. For each attribute, I give a score to each of these 3 languages (1 – Low; 5 – High).
The weightage for these parameters will vary depending on what point in your career you are at and on your ambitions.
SAS is commercial software. It is expensive and still beyond the reach of most professionals (in an individual capacity). However, it holds the highest market share in private organizations. So, unless you are in an organization that has invested in SAS, it might be difficult to get access to it. SAS has brought in a University Edition that is free to access, but it has some limitations. You can also use Jupyter Notebooks in there!
R and Python, on the other hand, are completely free. Here are my scores on this parameter:
SAS – 3
R – 5
Python – 5
SAS is easy to learn and provides an easy option (PROC SQL) for people who already know SQL. Even otherwise, it has a good, stable GUI in its repertoire. In terms of resources, there are tutorials available on the websites of various universities, and SAS has comprehensive documentation. There are certifications from SAS training institutes, but they again come at a cost.
R has the steepest learning curve among the 3 languages listed here. It requires you to learn and understand coding. R works at a lower level than SAS’s ready-made procedures, so even simple procedures can take longer code.
Python is known for its simplicity in the programming world. This remains true for data analysis as well. While there are no widespread GUIs as of now, I am hoping Python notebooks will become more and more mainstream. They provide awesome features for documentation and sharing.
SAS – 4.5
R – 2.5
Python – 3.5
This used to be an advantage for SAS until some time back. R computes everything in memory (RAM), and hence computations were limited by the amount of RAM on 32-bit machines. This is no longer the case. All three languages have good data handling capabilities and options for parallel computation, so I feel this is no longer a big differentiator. All three have also added Hadoop and Spark integrations, along with support for Cloudera and Apache Pig.
SAS – 4
R – 4
Python – 4
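To make the in-memory point above a little more concrete, here is a minimal sketch in Python of processing a file one row at a time instead of loading it all into RAM. It only uses the standard library; the function name `streaming_group_mean`, the column names and the tiny sample data are my own assumptions for illustration, not something taken from any of the three tools.

```python
import csv
import io
from collections import defaultdict


def streaming_group_mean(rows, key, value):
    """Return the mean of `value` per `key`, reading one row at a time."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:  # `rows` can be any iterator of dicts, e.g. csv.DictReader
        totals[row[key]] += float(row[value])
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical sample data standing in for a large CSV file on disk.
sample = io.StringIO("region,sales\nnorth,10\nsouth,20\nnorth,30\n")
means = streaming_group_mean(csv.DictReader(sample), key="region", value="sales")
print(means)
```

Only the running totals per group are kept in memory, so the size of the file matters much less than the number of distinct groups. For really large data you would more likely reach for the Hadoop or Spark integrations mentioned above.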
SAS has decent functional graphical capabilities. However, they are just functional. Any customization of plots is difficult and requires you to understand the intricacies of the SAS/GRAPH package.
Both R and Python have highly advanced graphical capabilities, with numerous packages to draw on.
With Plotly now available in both languages, and with Python having Seaborn, making custom plots has never been easier.
SAS – 3
R – 4.5
Python – 4.5
All 3 ecosystems have all the basic and most needed functions available. This feature only matters if you are working on the latest technologies and algorithms.
Due to their open nature, R and Python get the latest features quickly. SAS, on the other hand, updates its capabilities in new version roll-outs. Since R has been widely used in academia in the past, development of new techniques is fast.
Having said this, SAS releases updates in a controlled environment, hence they are well tested. R and Python, on the other hand, have open contribution, so there are chances of errors in the latest developments.
SAS – 4
R – 4.5
Python – 4.5
Globally, SAS is still the market leader in available corporate jobs. Most big organizations still work on SAS. R and Python, on the other hand, are better options for start-ups and companies looking for cost efficiency. Also, the number of jobs for R and Python has been reported to increase over the last few years. Here is a chart widely published on the internet, which shows the trend for R and SAS jobs. Python jobs for data analysis will have a similar or higher trend than R jobs:
The graph below shows R in blue and SAS in orange.
Overall, the market based on languages can be pictured as such:
SAS – 4
R – 4.5
Python – 4.5
R and Python have the biggest online communities but no customer service support. So if you have trouble, you are on your own, although you will get a lot of help from the community.
SAS, on the other hand, has dedicated customer service along with the community. So, if you have problems with installation or any other technical challenges, you can reach out to them.
SAS – 4
R – 3.5
Python – 3.5
Deep learning in SAS is still in its beginning phase, and there is a lot of work left to do.
On the other hand, Python has had great advancements in the field and has numerous packages like TensorFlow and Keras.
R has recently added support for those packages, along with some basic ones too. The kerasR and keras packages in R act as an interface to the original Python package, Keras.
SAS – 2
Python – 4.5
R – 3
The following are some more points worth noting:
We see the market bending slightly towards Python in today’s scenario. It would be premature to place bets on what will prevail, given the dynamic nature of the industry. Depending on your circumstances (career stage, financials, etc.), you can add your own weights and come up with what might be suitable for you. Here are a few specific scenarios:
Strategically, corporate setups that require more hands-on assistance and training choose SAS as an option.
Researchers and statisticians choose R as an alternative because it helps in heavy calculations. As they say, R was meant to get the job done and not to ease your computer.
Python has been the obvious choice for startups today due to its lightweight nature and growing community. It is the best choice for deep learning as well.
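If you want to try out your own weights on the scores I gave in the sections above, a rough sketch in Python could look like this. The scores are copied from this article; the attribute labels (`cost`, `ease_of_learning` and so on), the function name `weighted_scorecard` and the example weights are my own assumptions for illustration, so treat the output as a starting point rather than a verdict.

```python
# Scores copied from the sections above (1 = low, 5 = high).
# The attribute labels are hypothetical names chosen for this sketch.
SCORES = {
    "cost": {"SAS": 3, "R": 5, "Python": 5},
    "ease_of_learning": {"SAS": 4.5, "R": 2.5, "Python": 3.5},
    "data_handling": {"SAS": 4, "R": 4, "Python": 4},
    "graphics": {"SAS": 3, "R": 4.5, "Python": 4.5},
    "advancements": {"SAS": 4, "R": 4.5, "Python": 4.5},
    "jobs": {"SAS": 4, "R": 4.5, "Python": 4.5},
    "support": {"SAS": 4, "R": 3.5, "Python": 3.5},
    "deep_learning": {"SAS": 2, "R": 3, "Python": 4.5},
}


def weighted_scorecard(scores, weights=None):
    """Return the weighted average score per tool (equal weights by default)."""
    if weights is None:
        weights = {attribute: 1.0 for attribute in scores}
    total_weight = sum(weights[attribute] for attribute in scores)
    tools = next(iter(scores.values())).keys()
    return {
        tool: sum(weights[a] * scores[a][tool] for a in scores) / total_weight
        for tool in tools
    }


# Equal weights for every attribute.
equal = weighted_scorecard(SCORES)

# Example: someone who cares three times as much about jobs as anything else.
job_weights = {attribute: 1.0 for attribute in SCORES}
job_weights["jobs"] = 3.0
job_focused = weighted_scorecard(SCORES, job_weights)

for tool in equal:
    print(tool, round(equal[tool], 2), round(job_focused[tool], 2))
```

Changing a single entry in `job_weights` is enough to see how sensitive the ranking is to what matters to you at your career stage.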
Here is the final scorecard:
These are my views on this comparison. Now, it’s your turn to share your views through the comments below.
I can say by experience that R is a lot more fun than SAS. Exploring all the packages and conversations on Stackoverflow . . .
I felt Python’s scikit-learn has better documentation compared to R. Python also seems better for handling unstructured data. If you have any other views, please share.
Thanks for this.
I agree
Absolutely True! R is simpler to learn than SAS in my view.
Good one Kunal! Liked it.
I liked each and every article of yours..................... thanks a lot......