SAS Arrays
Introduction to Arrays:
When you need to do same processing on a set of variables then SAS Arrays is best way of doing it. For example consider following table
Source Data:
Here are the details of equity trading in stock market. Now let’s say we want to calculated average of each attribute like TRADING_AMOUNT, BROKERAGE_AMOUNT, CREDIT_USED and INTEREST_CHARGES for the TREADED_QTY. In this case, we can have 4 steps for each attribute calculating average by dividing by TRADED_QTY for example given below.
AVG_TRADING_AMOUNT = TRADING_AMOUNT / TRADED_QTY;
But in case if there were lot many attributes and we had to repeat the same operation for each attribute then we can or we should definitely make use of SAS Arrays.
Following code demonstrates the use of arrays.
SAS Arrays: Code
DATA STOCK_TRADE;
INFILE DATALINES DLM='|';
INPUT DATE:DATE9. EQUITY_NAME:$8.
TRADED_QTY TRADING_AMOUNT
BROKERAGE_AMOUNT CREDIT_USED
INTEREST_CHARGES;
DATALINES;
10JUL2010|BOSCH|56|59820|299.1|18560|37.12
10JUL2010|CIPLA|23|5639|28.195|0|0
10JUL2010|NTPC|106|44520|222.6|0|0
10JUL2010|ONGC|88|89451|447.255|0|0
11JUL2010|BOSCH|23|36256|181.28|0|0
11JUL2010|CIPLA|99|10256|51.28|6985|13.97
11JUL2010|NTPC|66|29500|147.5|0|0
11JUL2010|ONGC|12|13005|65.025|0|0
;
RUN;
DATA UPDATES_STOCK_TRADE;
SET STOCK_TRADE;
ARRAY ACT_VALUES{*} TRADING_AMOUNT
BROKERAGE_AMOUNT CREDIT_USED
INTEREST_CHARGES;
ARRAY AVG_VALUES{*} AVG_TRADING_AMOUNT
AVG_BROKERAGE_AMOUNT AVG_CREDIT_USED
AVG_INTEREST_CHARGES;
DO I=1 TO DIM(ACT_VALUES);
AVG_VALUES{I} =
ACT_VALUES{I}/TRADED_QTY;
END;
FORMAT _NUMERIC_ 23.2;
RUN;
PROC PRINT DATA=UPDATES_STOCK_TRADE;
VAR EQUITY_NAME AVG_TRADING_AMOUNT
AVG_BROKERAGE_AMOUNT AVG_CREDIT_USED
AVG_INTEREST_CHARGES;
RUN;
You can see below are the newly calculated attributes using Array processing.
SAS Arrays: Output
Important Points to Note:
Array syntax: Array Array_Name {<size>} <$> var1 var2 var3 ……….. varN <initialization_value>;
Array – is the keyword to define array.
Array_Name – is name by which array will be referred; remember that array_name should be valid SAS variable name. It is recommended that you should also not use any SAS function name as SAS Array_Name.
{} or [] or () – any of those can be used to denote the array; before SAS 9.1 this set of parenthesis was not part of Array syntax and still SAS supports the array names without set of parenthesis but then 9.1 onwards SAS introduced this system of preceding the array_name by either of these set of parenthesis to eliminate the possible confusion between simple SAS variables and SAS arrays.
<size> - This is index which is optional value that would indicate the number of elements the array is going to have. By default it would start from 1, but you can also specify range. For example Array_Name {8:17} hr8-hr17. In this example you can see that we have not hardcoded the size but have provided the range. But keep in mind in this case the index Array_Name{1} would be INVALID. The index will start from 8 and will end at 17.
<$> - this is optional syntax which specifies that all array elements are of character type. As we said earlier SAS Array is group of SAS variables which will go under same processing. So it is mandatory that a SAS Array declared should have either all of the attributes as NUMERIC or CHARACTERS.
List of variables – the list of variables which going to be array elements; they are not necessarily as per the sequence in the input dataset. In addition you can also use short forms like shown in example above hr8-hr17 etc.
<initialization_value> - you can provide the default initial values for each element of the Array.
DIM(Array_Name) function – this function gives the number of elements the Array is having; Which can be used as upper bound while iterating for each element in the Array.
Temporary Arrays
There is concept called Temporary Array which is similar to Arrays we discussed above. The main difference is that; it is not group of variables but a group of constant values instead.
Array Array_Name {<size>} <$> _TEMPORARY_ (val1 val2 val3 ……….. valN );
You can note that you need to specify _TEMPORARY_ as additional keyword here and then list of constant values either numeric or character should be given inside braces. Temporary Arrays have been demonstrated in detail in below section.
Using Array as LookUp Technique:
We know there are different methods available to do look up in SAS. Using Arrays could be effective in some scenarios instead of using SQL JOIN or DATA-step MERGE. Let’s take the same example we discussed in PROC FORMAT.
Let’s consider I got a worldwide customer list with 100 Million customers which has got country name, cell numbers. Now my automated program wants to send a SMS to those customers but only problem is I do not have country dialing code. Now to resolve this issue I want to populate the country dialing code into customer list using country name. I have country name along with dialing code in another table.
Now natural choice would be to JOIN those tables using SQL and get the work done; otherwise one may think of using datastep MERGE. However there drawbacks with both of these methods.
1. PROC SQL creates Cartesian product for each JOIN; so if you consider 150 countries in other table; then in total it will create interim dataset with 100 million X 150 = 15000 million records. Which will eat huge memory and time required will be high.
2. Second method of using datastep merge needs the datasets to be already sorted; again sorting a dataset with 100 million records is going take lot of time and resources.
On the other side if use arrays or more specifically Temporary Arrays then the work will be done in single datastep execution with comparatively very less time. SAS code for this will look like below.
SAS Arrays For Look Up
data cust_list;
input cust_id country:$3. cell:$10.;
cards;
192345 IND 9856894899
192346 USA 8787874
198233 PAK 4578159
198233 SA 98221544
;
run;
data cust_list;
set cust_list;
array cntry_name {4} $ _temporary_
('IND' 'USA' 'SA' 'PAK');
array dialing_code {4} $ _temporary_
('+91' '+44' '+27' '+92');
do i=1 to dim(cntry_name);
if cntry_name{i} eq country
then country_code = dialing_code{i};
end;
drop i;
run;
Now you can see in the below output that country dialing codes are rightly populated in the cust_list table.
SAS Arrays: For LookUp Output