Efficiency in coding differentiates a good coder from a bad coder. While you don’t need to be an awesome coder necessarily to be a good analyst, being a good coder always gives you that extra edge. In this article, I am explaining a few simple tricks with FIRST. & LAST. variables in SAS to help you become better in processing data.
The SET and BY statements in a data step tell SAS to process the data by grouping observations together. Whenever we use BY statement with a SET statement, SAS automatically creates two temporary variables for each variable name that appears in the BY statement. One of the temporary variables is called FIRST.variable, where variable is the variable name appearing in the BY statement. The other temporary variable is called LAST.variable. The two variables always equals either 1 or 0 when:
SAS uses the value of the FIRST.variable and LAST.variable to identify the first and last observations in a group. SAS places FIRST.variable and LAST.variable in the Program Data Vector (PDV). Then, they become available for DATA step processing but SAS does not add them to the output data set as they are temporary in nature.
Let’s look at some of the many uses of FIRST.variable and LAST.variable in SAS using the business problem below:
We have got a data set as shown below from a mobile network provider. It contains the following information:
We need to perform the following exercises:
Note: These problems form a very important part of the exploratory phase of the project and can provide key insights about your customers.
Can you think of ways to solve this problem? How easy or difficult does it look? Let’s solve them and see!
Answer: Since this dataset is small, we can easily make out ‘Henry’ and ‘Steve’ have only one record each in this dataset. Let’s try to find this answer using the concept of FIRST. and LAST. We’ll sort the data by Cust_ID then FIRST.Cust_ID and LAST.Cust_ID and make sure that these two records must be equal to one.
Lets verify this, by running the code below and observing the output:
Code:
proc sort data = MinutesUsage; by Cust_ID; run;
data single; set MinutesUsage; by Cust_ID; /* If the condition is satisfied we store the output in a dataset called single*/ if first.Cust_ID= 1 and last.Cust_ID = 1 then output single; run;
Output:
Answer: To solve this problem, I have first sorted the data by Cust_ID and then by Circle. Let’s see how the variables first. and last. are created for each BY variable:
Code:
proc sort data = MinutesUsage; by Cust_ID Circle; run; data details; set MinutesUsage; by Cust_ID Circle; /* For each customer,the first record will have first.Cust_ID = 1*/ if first.Cust_ID = 1 then first_id = 1;else first_id = 0; /* For each Customer,the first occurrence of a particular circle will have first.Circle = 1*/ if first.Circle = 1 then first_circle = 1;else first_circle = 0; /* For each customer the last occurrence will have last.Cust_ID = 1*/ if last.Cust_ID = 1 then last_id = 1;else last_id = 0; /* For each customer,the last occurrence of a particular circle will have last.Circle = 1*/ if last.Circle = 1 then last_circle = 1;else last_circle = 0; run;
Intermediate data set:
Let’s understand the intermediate data set shown above by focussing at customer “Alan”:
Let’s look at the below codes to evaluate the total usage by customer and circle:
Code:
proc sort data = MinutesUsage; by Cust_ID Circle; run; data tot_usage (keep = Cust_ID name Circle tot_usage); set MinutesUsage; by Cust_ID Circle; /* Step 1*/ if first.Circle = 1 then tot_usage = 0; /* Step 2*/ tot_usage + minutes; /*Step 3*/ if last.circle = 1 then output; run;
Let’s understand the code by looking at the intermediate data set:
Output:
In this article, we discussed the concept of FIRST. and LAST. in SAS using an example. If you have used it in any other way, or if there is a better way of implementing this concept, please do make suggestions.
Did you find the article useful? Do let us know your thoughts about this article in the box below.
This article has been contributed by Shuvayan Das. Shuvayan has 4 years of experience with TCS, has undergone training with Jigsaw Academy and is proficient with SAS, SQL & Excel. He was also featured as Newbie of the month in our recent newsletter and happens to be one of the most active users on Analytics Vidhya Discuss.
very nice post about first. and .last. This is best example to understand Thank you
Here, while calculating the tot_usage, we have to "retain" this variable right? As sas statements are computed for each row in the dataset
clear and perfect example to understand First. and Last.