SAS is still among the most commonly used tools in the data science industry. While people might have different opinions about its sustainability and features compared to other tools like R and Python, two things are for sure:
Those two reasons are good enough to consider SAS if you are just starting out in this industry. You can find more details on how SAS stacks up against other tools here.
A short video to prepare you for what lies ahead:
https://www.youtube.com/watch?v=ksp8CzIgb-E
Download SAS University Edition by creating a SAS profile. You will also need to download VMware or Oracle VirtualBox. Here are the links:
Installation notes:
Go through the Base SAS training on sas.com. This training is free and teaches you the basics of the SAS language in 24 hours.
Assignment / Quiz: Solve the quiz at the end of each section in the course.
Now that you know Base SAS to some extent, you should look at another way of accessing data in SAS: PROC SQL. Read this article to understand how PROC SQL helps: Comparison between Proc SQL and Data Step
If you already know SQL, you will be thanking SAS for creating PROC SQL. Even if you don't know SQL, you might find it an easier way to perform day-to-day data management jobs in SAS. You can look at this SUGI paper: Introduction to PROC SQL. If you need a more detailed tutorial, you can look at this one: Introduction to PROC SQL
Let's start our statistical learning now. This is the right time to take the course on statistics from Analytics Vidhya. The course uses Python to teach you all the basics of descriptive statistics. If you already know them, you can skip this step.
Assignment: The assignments after each chapter in the course should be done in SAS. Your knowledge from the Base SAS course should be sufficient to complete them. If you need specific help, use the SAS documentation.
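As a rough illustration of the kind of descriptive statistics this step covers, here is a minimal Python sketch that uses only the standard library. The `scores` values are made up for illustration, and the helper name `describe` is hypothetical rather than something taken from the course.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }


# Made-up sample data, for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(scores)

for name, value in summary.items():
    print(f"{name}: {value}")
```

When you redo the exercise in SAS, a procedure such as PROC MEANS should give you comparable summary figures to check against.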
The above-mentioned course also covers inferential statistics in Python, including topics like hypothesis testing, the t-test and many others. If you already know them, you can skip this step.
Assignment: The assignments after each chapter in the above course should be done in Python or Excel for now. We will revisit these once we have done the next steps with the course from SAS.
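To make the t-test mentioned in this step a little more concrete, here is a small sketch of a one-sample t statistic in plain Python. The sample values and the hypothesised mean `mu0` are assumptions chosen for illustration, and `one_sample_t` is a hypothetical helper, not part of the course material.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(values)
    sample_mean = statistics.mean(values)
    sample_sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    standard_error = sample_sd / math.sqrt(n)   # estimated standard error of the mean
    t_stat = (sample_mean - mu0) / standard_error
    return t_stat, n - 1


# Made-up sample and hypothesised mean, for illustration only
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat, df = one_sample_t(sample, mu0=4)
print(f"t = {t_stat:.4f}, degrees of freedom = {df}")
```

The sketch stops at the test statistic. Turning it into a p-value needs the t distribution, which a statistics library (for example `scipy.stats` in Python, or PROC TTEST once you move to SAS) would normally handle for you.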
Training from sas.com – Introduction to ANOVA, Regression, and Logistic Regression.
Assignment: Available in the course and from the Udacity course.
If you are working on SAS University Edition, you will need to skip steps 7, 9 and 10. SAS University Edition has its own limitations and cannot run decision trees or time series modeling.
Now that you know a few algorithms, let us look at decision trees. Here is an excellent article explaining how decision trees work:
Here is a guide to running decision trees in Enterprise Miner, and here is a paper that implements them in Base SAS.
First, watch the first 4 videos in this playlist for an introduction to k-means clustering. Next, read this guide on clustering from SAS. In addition to this guide, you can also use this chapter as a good reference.
Here is a good introduction to time series forecasting. After that, use this guide to forecasting with time series in SAS.
Here is a series of articles that can help you get up to speed with IML:
Below is a series of articles that can help you understand SAS macros: