PROC SORT - NODUPKEY DUPOUT TAGSORT
PROC SORT is one of the most frequently used procedures for data processing. Like other SAS procedures, it comes with a wide range of useful options.
Below is the basic syntax with the most important options for PROC SORT.
SAS Code:
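PROC SORT DATA=input_dataset OUT=output_dataset
    NODUPKEY | NODUPRECS <other options such as TAGSORT, FORCE, DUPOUT=>;
    /* Ascending order is the default and takes no keyword; DESCENDING applies only to the variable that follows it */
    BY var1 DESCENDING var2 ...;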
RUN;
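As a concrete instance of the syntax above, here is a minimal sketch. The dataset names (work.orders, work.orders_sorted) and the variables (customer, amount) are made up for illustration only:

```sas
/* Hypothetical sample data: one row per order */
DATA work.orders;
    INPUT customer $ amount;
    DATALINES;
Smith 120
Jones 75
Smith 300
Adams 50
;
RUN;

/* Sort by customer (ascending, the default), then by amount, largest first.
   work.orders is left untouched; the sorted copy goes to work.orders_sorted. */
PROC SORT DATA=work.orders OUT=work.orders_sorted;
    BY customer DESCENDING amount;
RUN;

PROC PRINT DATA=work.orders_sorted;
RUN;
```

The printed rows should come out as Adams 50, Jones 75, Smith 300, Smith 120, while work.orders keeps its original order.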
Explanation of options:
OUT=dataset_name: Often the user does not want to rearrange the original dataset. In that case the OUT= option redirects the sorted output to another dataset and keeps the original data intact. Without OUT=, PROC SORT replaces the input dataset with the sorted version.
NODUPKEY: If the original dataset contains rows with duplicate key columns (the variables specified in the BY statement) and we want only one record per key, NODUPKEY keeps the first observation in each BY group and drops the rest.
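NODUPRECS (or NODUP): Similar to NODUPKEY, but it checks the complete observation (all variables) instead of only the BY variables. Each observation is compared only with the previous one written to the output, so identical observations are removed only when the sort places them next to each other.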
DUPOUT=dataset_name: Collects the duplicate records removed by NODUPKEY or NODUPRECS into the specified dataset, so they can be reviewed rather than silently discarded.
TAGSORT: This is a resource optimization that helps when we are short of resources, typically temporary disk space for sorting a large dataset. Instead of whole observations, SAS stores only the key columns specified in the BY statement, together with the observation numbers, in a temporary file. SAS sorts these "tags" and, once that is done, uses them to retrieve the records from the original dataset in sorted order. The saving in temporary space can come at the cost of a longer run time.
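The options described above can be tried out with a small sketch like the following. All dataset and variable names (work.visits, id, site and so on) are hypothetical, and the sample data is chosen so that id 1 repeats with different values while the id 2 row repeats exactly:

```sas
/* Hypothetical sample data: id=1 appears twice with different sites,
   and the row (2, B) appears twice in full */
DATA work.visits;
    INPUT id site $;
    DATALINES;
1 A
2 B
1 C
2 B
3 A
;
RUN;

/* NODUPKEY: keep one observation per id.
   DUPOUT= collects the observations that were dropped. */
PROC SORT DATA=work.visits OUT=work.visits_unique
          NODUPKEY DUPOUT=work.visits_dropped;
    BY id;
RUN;

/* NODUPRECS: drop a row only if it matches the previous row on every variable.
   Sorting BY _ALL_ places identical rows next to each other so they are caught. */
PROC SORT DATA=work.visits OUT=work.visits_norecdup NODUPRECS;
    BY _ALL_;
RUN;

/* TAGSORT: same sorted result, but only the BY variable and the
   observation numbers are held in the temporary files */
PROC SORT DATA=work.visits OUT=work.visits_tagsorted TAGSORT;
    BY id;
RUN;
```

With this data, work.visits_unique ends up with three observations (one per id) and work.visits_dropped with two, whereas work.visits_norecdup keeps four, because only the repeated (2, B) row is a complete duplicate. On a dataset this small TAGSORT makes no practical difference; it is shown only for the syntax.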