Tricky Base SAS interview questions : Part-II

Tavish Srivastava Last Updated : 24 Jun, 2022
5 min read

tough questions

SAS is the largest market-share holder for advanced analytics.

If you are going to work in analytics industry, it is impossible to escape from the language SAS. Different softwares of SAS are used in the industry for data handling. Most common of them used in data handling are SAS Enterprise guide and Base SAS. Both the tools have a similar format and usage.

The only difference being SAS Enterprise guide is the graphical user interface of Base SAS. SAS Enterprise guide is much easier to use and is modular in nature. Because of the modularity, SAS  Enterprise guide is being widely used in the industry. With time Base SAS is losing its importance in the industry, and SAS Enterprise guide is filling in for Base SAS. However, having used both, I now appreciate and prefer using Base SAS to handle data of billions of customers and trillions of transactions.

One of our previous articles (https://www.analyticsvidhya.com/blog/2013/11/4-sas-tricky-analytics-interview/) covers four tricky questions asked in SAS interviews. In this article, I will cover some tricky scenarios in which using base SAS will become far easier than using the SAS Enterprise guide. These questions are tougher and lengthier than those covered in the first part of this article series. These questions are asked widely in companies who have a broad base of analytics and deal with big data (Millions of customers, Billions of transactions, Trillions of dollar value transactions).

[stextbox id=”section”]Base SAS vs. SAS Enterprise guide[/stextbox]

Let’s first look into the pros of using both Base SAS and SAS Enterprise guide. This will make us appreciate those tricky scenarios, even more.

SAS Enterprise guide

1. EG is more common in firms with a smaller team of analytics. This is because programs on EG are far more understandable by a person who is new to the firm. Given the high attrition rate in analytics industry, it becomes very essential for such firm to hedge their risk by using the SAS Enterprise guide instead of Base SAS.

2. You develop a certain traditional routine much faster on EG than on Base SAS.

3. It is much easier to comprehend the flow using EG.

 Base SAS

1. It is much easier to code on base SAS in case the logic of code is very complex.

2. Base SAS is much faster as compare to EG.

3. It is much easier to modify a code on Base SAS than on EG.

4. Using Macros makes coding in Base SAS much easier than SAS EG

[stextbox id=”section”]Background to case 1[/stextbox]

You work in a retail industry. You have recently started a loyalty program for your customers. A study conducted on retail bank, says that the customers with a total purchase of $1,000 in 3rd month (T+ 2th months) are the customers who will finally purchase more than $30,000. You want to focus your loyalty campaign on these customers.

You have 2 data-sets. First has the entire list of customer IDs with their date of first purchase. Second data has the customer ID with their monthly purchases for each year-month. First purchase can possibly be a non-financial transaction which might not be a part of table 2.

You need to identify customers who make more than $1000 purchase in the 3rd month from the first purchase.

Table 1 :

tab1

Table 2 :

tab2

[stextbox id=”section”]Solution to Case 1[/stextbox]

This question is a classic case when Base SAS clearly beats SAS EG. In this section you will see a simple solution for this case study.

[stextbox id=”grey”]

* Creating a macro for each month

%macro fetch_data (next_mon = , third_mon = );

data create_list;

set table_1;

if first_pur < next_mon;

run;

 

proc sort data =  create_list out=list; by customer_id; run;

 

proc sort data =  table_2 out=purchase; by customer_id; run;

 

data fetch_purchase;

merge list(in=a) purchase(in=b);

if yearmonth = third_mon;

by customer_id;

if a;

run;

 

proc datasets;

append base=final_dataset data =  fetch_purchase foce;

run;

%mend fetch_data;

 

%fetch_data (next_mon = ’01Feb2012’d  , third_mon = 201204);

%fetch_data (next_mon = ’01Feb2013’d  , third_mon = 201304);

%fetch_data (next_mon = ’01Mar2012’d  , third_mon = 201205);

%fetch_data (next_mon = ’01Mar2013’d  , third_mon = 201305);

 

*Identifying the customers with purchase above $1000 in 3rd month

data shortlisted;

set   final_dataset;

if sales ge 1000;

run;

[/stextbox]

[stextbox id=”section”]Background to case 2 [/stextbox]

You work for a banking industry. You want to analyze the transaction dataset and want to find the median transaction amount for each customer. This is the amount over which we will want to pay to the customer for stretching. More the dollar value of transactions, the cheaper is the total cost of transactions.You need to make a list of all customer with their floored median transaction amount (if there are 5 transactions, we want the 2nd lowest transaction and not 3rd and if transactions are only 1 then remove the customer from the list).

The only dataset you have is unique on transaction ID. It also has the customer ID and amount of the transaction.

 Table 1

tab3

[stextbox id=”section”]Solution to Case 2 [/stextbox]

The solution to this problem is tiresome on SAS EG because there is no median function on SQL routines after grouping data. SQL routines are the foundation of data handling in SAS EG.But this becomes quite easy on Base SAS. Let’s see how this can be done easily on Base SAS.

[stextbox id=”grey”]

proc sql;

create table work.summarize as

select count(*) as trans_nos, customer_id

from work.table1

group by customer_id;

quit;

 

proc sort data = tables1; by customer_id;run;

 

proc sort data = summarize; by customer_id;run;

 

data add_total_trans;

merge table1 (in=a) summarize (in=b);

median_no = floor(trans_nos/2);

by customer_id;

drop trans_nos;

run;

 

proc sort data = add_total_trans; by customer_id amount;run;

 

data final_list;

set add_total_trans;

by customer_id amount;

if first.customer_id then n =1;

if n = median_no;

n + 1;

run;

[/stextbox]

The solution in base SAS for this question is not only effective but also time efficient.

[stextbox id=”section”]End Notes[/stextbox]

Both Base SAS and EG have their own pros and cons. The best recommended strategy is to use both. If you want to make a traditional query, use SAS EG to generate automated code. Now copy this code to make it macronized and generalized using Base SAS. The macro adds a new dimension to the codes which helps you generalize the code and avoid hard entered data.

Have you faced any other SAS problem in analytics interview? Are you facing any specific problem with SAS codes?  Do you think this provides a solution to any problem you face? Do you think there are other methods to solve the problems discussed in a more optimized way? Do let us know your thoughts in the comments below.

If you like what you just read & want to continue your analytics learning, subscribe to our emailsfollow us on twitter or like our facebook page.

Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industry including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.

Responses From Readers

Clear

Şarkı Sözü
Şarkı Sözü

Thank you very nice blog liked

Swati
Swati

This is great stuff. Thanks Tavish for putting this together. Waiting for few more articles from the same series.

Wasim
Wasim

Very Impressive!!!

We use cookies essential for this site to function well. Please click to help us improve its usefulness with additional cookies. Learn about our use of cookies in our Privacy Policy & Cookies Policy.

Show details