PROC SORT - NODUPKEY DUPOUT TAGSORT

PROC SORT is one of the most frequently used procedures for data processing. Like other SAS procedures, it has a wide range of useful options.

Below is the basic syntax, with the most important options for PROC SORT.

SAS Code:

PROC SORT DATA=input_dataset OUT=output_dataset
          NODUPKEY | NODUPRECS <other options such as TAGSORT, FORCE, DUPOUT=>;
    /* Ascending order is the default; prefix a variable with DESCENDING to reverse it */
    BY var1 DESCENDING var2 ...;
RUN;
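
As a concrete instance of the template above, here is a minimal sketch. The dataset names (work.sales, work.sales_sorted), the variables (region, amount), and the data values are all hypothetical, invented only for illustration.

```sas
/* Hypothetical sample data, invented for illustration */
data work.sales;
    input region $ amount;
    datalines;
East 200
West 150
East 300
West 150
;
run;

/* Sort by region (ascending, the default), then by amount, largest first. */
/* OUT= writes the result to a new dataset, so work.sales is not changed.  */
proc sort data=work.sales out=work.sales_sorted;
    by region descending amount;
run;

/* Print the sorted copy to check the order */
proc print data=work.sales_sorted;
run;
```

With this data, the East rows should come first (300, then 200), followed by the two West rows.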

Explanation of options:

  • OUT= dataset_name : Often you do not want to rearrange the original dataset. The OUT= option redirects the sorted output to another dataset and keeps the original intact. Without OUT=, PROC SORT replaces the input dataset with the sorted version.

  • NODUPKEY : If the original dataset contains rows with duplicate values in the key columns (those specified in the BY statement) and you want to keep only unique records, NODUPKEY keeps the first observation in each BY group and drops the rest.

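  • NODUPRECS (or NODUP) : Works like NODUPKEY, but compares the complete observation rather than only the key columns. Each observation is compared only with the one written just before it, so identical records that are not adjacent after sorting can remain in the output.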

  • DUPOUT= dataset_name : Writes the duplicate records removed by NODUPKEY or NODUPRECS to the specified dataset, so they can be reviewed instead of being discarded.

  • TAGSORT : A resource optimization that helps when memory or temporary disk space is short. Instead of copying whole observations, SAS stores only the key columns specified in the BY statement, together with the observation numbers, in a temporary file. It sorts that temporary file and then uses it to retrieve the records from the original dataset in sorted order. The saving in temporary space can come at the cost of a longer run time.
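
The options above can be combined in a single step. The following is a minimal sketch, not a definitive recipe. The dataset names (work.visits, work.visits_unique, work.visits_dups, work.visits_tagsorted), the variables (patient_id, visit_date, site), and the data values are hypothetical, made up for illustration.

```sas
/* Hypothetical sample data: patients 101 and 102 each appear twice */
data work.visits;
    input patient_id visit_date :date9. site $;
    format visit_date date9.;
    datalines;
101 05JAN2020 A
102 07JAN2020 B
101 09JAN2020 A
103 11JAN2020 C
102 07JAN2020 B
;
run;

/* NODUPKEY keeps the first observation for each patient_id.         */
/* DUPOUT= collects the removed observations in work.visits_dups.    */
/* OUT= leaves work.visits itself untouched.                         */
proc sort data=work.visits out=work.visits_unique
          nodupkey dupout=work.visits_dups;
    by patient_id;
run;

/* TAGSORT does not change the result, only how the sort is carried out. */
/* It is meant for large datasets; it is used on a tiny one here only    */
/* to show where the option goes.                                        */
proc sort data=work.visits out=work.visits_tagsorted tagsort;
    by patient_id visit_date;
run;
```

With this data, work.visits_unique should hold three observations (one per patient_id) and work.visits_dups the two that were removed. Swapping NODUPKEY for NODUPRECS in the first step would remove only rows that match on every variable, such as the repeated 102 record, provided they end up next to each other after sorting.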