I have spent a significant part of my career as a data visualization guy. I am very particular about the formatting and presentation of reports. So, when I started using SAS, I faced a few challenges in changing formats of numbers and characters, especially when dates were involved. Not so surprisingly, both Kunal and I receive a fair number of queries on this topic.
In SAS, there are various options to enhance the reporting layouts. In this article, we will particularly discuss about methods to play with format of data values. It should be noted that these changes are only applied while displaying the results. Changing format of output does not change the way, the data gets stored at the back end.
By default, SAS provides various built in formats to deal with various formats, but they are not sufficient to meet custom requirements your data might have. For example, we may have coded Male and Female as M and F (or 0 and 1), but while printing we would want to display the field as MALE and FEMALE only. Another common example is to display area codes in 10 digit telephone numbers (e.g. 123-3456-789).
P.S. This is a long article compared to what I usually write, so feel free to digest this in bits and pieces.
The article is divided in 2 broad parts: First we understand the pre-defined formats & user defined formats in SAS. Next, we look at various applications and examples of these concepts.
Below is a sample data containing agent performance details. We will use this dataset in examples and discussion in this article. Here DOJ is in numeric format, as SAS defines by default, starting from 01-Jan-1960.
Let’s first look at the syntac of the FORMAT statement:-
Let’s do some exercises with format statement:
Code:
Output:
Here is the list of predefined date formats available to change the output format of variables:
Code:
Output:
List of predefined date formats available to change the output format of variables:
Code:
Output:
Till now we have seen, how to change the format of numbers and Dates with in built SAS formats. But there can be many occasions when SAS built-in formats do not suffice our needs. Like in current dataset, we want to:
These all things can be done with Data step (or PROC SQL) with If-Else statement (or switch case), but it will require a new variable that will store these formatted values. Whereas, if we only want to only change the display (and not the values in the data set), then creating user defined format using PROC FORMAT is a more efficient way to make these changes.
PROC FORMAT is a procedure that creates a mapping of data values into data labels. The user defined FORMAT mapping is independent of a SAS DATASET and variables and must be explicitly assigned in a subsequent DATASTEP and/or PROC.
Code:
Output:
In a similar way, we can solve it for the problem 2. Define the category A, B, C, D and E as Ultra, Super, Average, Low and Poor.
Code:
Output:
Above, I have used ranges to define the format. They can be used for both – character and numeric values.
a) Ranges can be multiple values separated by commas.
i) ‘A’, ’B’, ’C’
ii) 22, 44, 67
b) Ranges can include or exclude the bounding values, depending on the use of various keywords like:
i) 24 – 45: It includes values from 24 to 45 including 24 and 45.
ii) 24 < – 45: It includes values between 24 to 45 including 45 and excluding 24.
iii) 24 – < 45: It includes values between 24 to 45 including 24 and excluding 45.
iv) 24 < – < 45: It includes values between 24 to 45 excluding 24 and 45.
c) LOW refers to least available number, HIGH refers to the highest available number and OTHER includes all numbers not specified. Others also include missing values if it is not specified.
Code:
Output:
In PROC FORMAT, we have an option CNTLIN, that allows us to create a format from a data set rather than a VALUE statement. Before using this option we first look at the guidelines below:-
a) Input dataset must contain three variables required by PROC FORMAT – START, LABEL and FMTNAME.
b) Here START is the key field between these two tables and it must be unique in the input dataset (Dataset, we are using in CNTLIN option).
c) LABEL variable is the value we want to map it to another dataset.
d) The FMTNAME variable is the name of the format and it must be passed in single quotes.
e) After defining format, we can use put function to create variable in dataset based on key field and the format we have defined.
Example:
Let us say we want to map agent DOB from dataset Agent_DOB (Sample Data) to above dataset
Here we want to create a format based on Agent_DOB and apply it to Sales. In Agent_DOB, we would consider AGT_ID (key field, between both datasets) as START, DOB as Label and FMTNAME =’ $AGENT’. Now look at the statements (below).
Above code has generated format using dataset Agent_DOB_Fmt and after that to merge with SALES dataset have written data steps (below).
Limitation of PROC FORMAT as Merging
This is the best method when we want to merge one variable from another data set, but if we want to add five or multiple variables, then we have to repeat the PROC FORMAT statement that many times along with the multiple PUT function in data step. So, I prefer to use MERGE or PROC SQL in such cases.
All SAS formats are stored in a catalog (collection of formats). When we create a Format, it gets stored in the catalog. If we don’t specify the catalog, then SAS stores formats in the WORK library in a catalog called FORMATS. Like other datasets of WORK library, they also get deleted at the end of the session.
Now to save User defined formats, we need to specify where to store the catalog and what to call it. This can be achieved by storing formats in a library other than WORK.
Step -1 First of all we have to define a library (Here I am using the SAS University edition)
Step-2 Use library option in PROC format and provide a library name with the format file name. The file name must be a valid SAS dataset name.
Above program has created a file called Gender_Fmt.sas7bcat in your directory location.
Now, whenever we want to use stored format, we have to tell SAS to look for formats in that catalog file. This is done with the fmtsearch option. So before using it, we need to write a statement.
SAS picture format creates templates in which we define how the numbers are displayed. With use of PICTURE FORMAT, we can get over multiple display issues of numbers like:-
1. Decimal and Comma Placement
2. Embedding Characters with Numbers
3. Prefixes
Example 1: – I want to display the sales amount ending with a % sign and preceded with $ sign.
Output:
Example 2: – Let’s say, I have a 10 digit telephone number and I want to display it as 123-3456-789.
In this article, we looked at various methods to display format of data values using built in and user defined formats. We have also looked at various techniques to define formats like ranges, picture, handle missing values and unmatched values using OTHER. We have also discussed efficient merging technique using PROC FORMAT. This should be all what you need to be a pro with SAS Formats.
We have not covered In-format while reading non-standard dataset. We will discuss that in one of our future post.
Hope you found this article useful? We have simplified this topic and have tried to present it in a very simple and lucid manner. If you need any more help with SAS formatting, please feel free to ask your questions through comments below.
can please send some important topics related to interview purpose for freshers
Hi Sunil, That's very well explained. However, I have a small doubt regarding the following statement used for merging by Proc Format- Data Sales_DOB; Set Sales; *************Birth_Date=Put(Agt_ID, $Agent.); ************* Run; I am unable to interpret the statement - *************Birth_Date=Put(Agt_ID, $Agent.); ************* Can you help me understand the use of PUT here. How is the actual merging taking place here. Thanks Saurabh Kapoor
Very deeply explained article.