SAS retain statement and first. last. options

FIRST. and LAST. automatic variables:

When SAS dataset has got sorted by some variables then at the time datastep execution SAS creates two automatic variables called FIRST. and LAST.

For example lets consider the example of ready made dataset CARS available in SASHELP library. From PROC CONTENTS of this dataset we can get the information that this dataset is already sorted by MAKE and TYPE.

So if you execute a data step on this data with BY statement while execution you will get FIRST.MAKE and LAST.MAKE as well as FIRST.TYPE and LAST.TYPE variables automatically getting created.

These variables will have values either 0 or 1; for first record within same set of MAKE the FIRST.MAKE will have value 1 otherwise 0; likewise for last record within same set of MAKE the LAST.MAKE will have value 1 otherwise 0.

We can logically use these variables for achieving required results.

RETAIN statement:

At the time of datastep execution we know that after each iteration all the variable/columns present in PDV get assigned to MISSING values. So every every variable gets new value from the new record. There is no way to refer the old value for some variable from previous record and make some decision based on that. Here RETAIN statement comes to an rescue.

With RETAIN statement you can persist the value of any variable for next iteration till the end of datastep. You can use any existing variable or create a new one to store the value of existing variable. You can also initialize the value to that variable in the RETAIN statement. Just remember that if this is going to be a new variable it will become permanent member of the dataset until you explicitly DROP it.

lets illustrate both of these concepts by following example.

Illustration: Accumulating INVOICE values from SASHELP.CARS dataset for each MAKE and for each TYPE within that MAKE.

There are various makes like TOYOTA, HONDA etc. present in SASHELP.CARS dataset and each of these MAKES have various types like SEDAN, SUV etc.; Now lets consider our requirement is to calculate the total of INVOICE amount for each type under each make and total of INVOICE amount for each MAKE as well.

Lets see how can we code this.

SAS Code:

DATA CARS_SUMMARY (KEEP=MAKE TYPE TOT_TYPE_INVOICE TOT_MAKE_INVOICE);

SET SASHELP.CARS;

RETAIN TOT_TYPE_INVOICE TOT_MAKE_INVOICE 0;

BY MAKE TYPE;

IF FIRST.MAKE THEN TOT_MAKE_INVOICE = 0;

IF FIRST.TYPE THEN TOT_TYPE_INVOICE = 0;

TOT_MAKE_INVOICE = TOT_MAKE_INVOICE + INVOICE;

TOT_TYPE_INVOICE = TOT_TYPE_INVOICE + INVOICE;

IF LAST.MAKE THEN OUTPUT;

RUN;

Output: Sample Output (of TOTAL 114 records)

  • By default when SAS reaches the run; statement is does the implicit RETURN to the set statement and iterates the code for the next record in source dataset. It initializes all the PDV values to missing on reaching to RUN; statement.

  • RETAIN statement instructs SAS not to reinitialize values to missing for the variables specified in the RETAIN statement.

  • in Retain statement we can initialize the value of the variables as shown above by default it will be initialized to 0.

  • SUM statement - when we use traditional addition statement and if one the operands value is null or missing in that case addition will result in missing value e.g. month_tot = month_tot + daily_sale_amt; (if in this case daily_sale_amt is missing then month_tot will result in missing value and will finally result in missing month_tot)

  • To avoid this we use the expression as month_tot + daily_sale_amt; Here month_tot initially gets assigned to 0 and then daily_sale_amt values gets added to it if it is not null.