SAS Training
Post date: Apr 23, 2014 5:09:42 PM
Every statement starts with a keyword and ends with ";"
keyword <statement options>;
There are no reserved words
A missing ";" is the most common cause of errors
SAS is a parsed language: if you can read the code, the compiler can too
SAS dataset contains metadata
All numbers in SAS are stored as numeric, even if they are integers; SAS makes no distinction.
missing value for number = . (a period)
missing value for character = ' '
To see metadata, use
proc contents data=ddfdfdfdf;
run;
show the data itself
proc print data=ssdsd.sdsd;
run;
OPTIONS statement -- changes remain in effect for the rest of the job or until changed again
OPTIONS is a global statement
options nodate;
options obs=50 pageno=5 date;
Comments
* your comments here;
/* your comments here */ <-- this can be inline, but cannot be nested
libname statement
We can have as many libnames as we need
A libname (libref) can have no more than 8 characters
libname alias 'physical/location/data';
libname newstuff <interpretation engine> 'location/of/any/file/you/want';
libname newstuff v8 'usernode:[sasabc.mydir]';
Using either (' ') or (" ") is fine for SAS
Title and Footnotes:
title 'your title here';
title5 "your title here";
footnote1 "Tom's Truck";
ODS Statement
- To reroute your output
- by default the output is set for a text file
- Using ODS, you can change the output to pdf, html, excel
** I think 'ods html body=<filename>;' = open file connection and 'ods html close;' = close file connection
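A minimal sketch of that open/close pattern. The output file name report.html is hypothetical, and sasclass.clinics is the class dataset used elsewhere in these notes:

```sas
ods html body='report.html';        /* open the HTML destination (hypothetical file name) */

proc print data=sasclass.clinics;   /* any output produced here is routed to the file */
run;

ods html close;                     /* close the destination so the file is completed */
```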
Proc Statement
Every proc statement follows the format below:
Proc <name of procedure> [option];
<supporting statements>
run;
*** Caveat on noduplicates *** it only removes duplicate records that are next to each other!!!
proc sort data=sasclass.clinics
out=clinics noduplicates;
by region descending clinnum;
run;
nodupkey -- dedups based only on the variables specified in 'by' -- similar to distinct in SQL
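A small sketch of the difference, using a made-up dataset visits (the names visits, id and amount are assumptions for illustration only). Sorting by every variable puts identical records next to each other, which is what noduplicates needs:

```sas
data visits;                      /* hypothetical test data */
  input id $ amount;
  datalines;
A 10
A 30
A 10
B 20
;
run;

/* Sorted by id only: the two "A 10" records can stay separated by "A 30", */
/* so noduplicates may keep both of them.                                  */
proc sort data=visits out=by_id noduplicates;
  by id;
run;

/* Sorted by all variables: the two "A 10" records are adjacent, so one is dropped. */
proc sort data=visits out=by_all noduplicates;
  by id amount;
run;

/* nodupkey: one record per value of id, whatever the other variables contain. */
proc sort data=visits out=by_key nodupkey;
  by id;
run;
```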
SAS processes data at the row level (one observation at a time), not the table level, so it rarely has memory problems.
Variable shortcuts:
_all_ -- all variables
_numeric_ -- all numerical variables
_character_ -- all character variables
abc: -- variables starting with 'abc' for instance, abcXXY, abc_12
month1-month12 -- month1, month2, ...,month12
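A short sketch of these shortcuts in use. The dataset sales and its variables are made up for illustration:

```sas
data sales;                       /* hypothetical data */
  input region $ month1-month3 abc_rate;
  datalines;
East 10 20 30 0.5
West 15 25 35 0.7
;
run;

proc print data=sales;
  var region month1-month3;       /* numbered range: month1, month2, month3 */
run;

proc means data=sales;
  var _numeric_;                  /* every numeric variable */
run;

proc print data=sales;
  var abc:;                       /* every variable whose name starts with abc */
run;
```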
PROC DATASETS -- works at the dataset level (e.g. rename or delete whole datasets), not on the rows inside them
MISSING VALUE -- skipped in calculations done by procedures (e.g. PROC MEANS) and by functions such as SUM()
DATA STEP
- always begins with DATA and ends with RUN;
DATA <name of data to be created>;
The two forms below are equivalent:
data clinics; or
data work.clinics;
the temp library 'work' is created by default anyway.
You can create multiple datasets at once
data sasclass.data1 sasclass.data2;
data _null_; -- no dataset is created
DATA -- what goes out
SET -- what comes in. This is the most common way to bring in the data
you can read multiple datasets at once (they are concatenated)
set dat1 dat2 dat3;
FORMAT
- every format name contains a period (.)
- it does not change what is stored, it only changes the display
- w := the overall width
- d := number of decimal places
dollarw.d
percentw.d
ssnw.
character formats always start with $
$charw.
Example:
format debits credits dollar12.2;
format ssn ssn11.;
format total dollar9. lineitem comma9.;
A FORMAT statement in proc print affects only that procedure's output; in a data step it is stored with the new dataset as the variable's format. In both cases the stored values do not change.
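A minimal sketch of where a format is attached. The dataset money and the variable amount are made up for illustration:

```sas
data money;
  amount = 1234.56;
  format amount dollar12.2;   /* stored in the dataset's metadata */
run;

proc print data=money;        /* displays $1,234.56 */
run;

proc print data=money;
  format amount comma9.;      /* display override for this step only: 1,235 */
run;

proc contents data=money;     /* still lists DOLLAR12.2 as the format of amount */
run;
```

The value stored in amount is 1234.56 throughout; only the way it is displayed changes.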
LENGTH
length fname $10 lname $16;
default length for numeric is 8 bytes
it's dangerous to decrease the length of a numeric variable (precision can be lost).
Day 2
In SAS:
numeric variable => 8 bytes
character $x => x bytes, default 8 bytes
division by 0 => missing value, plus a note in the log file. We can avoid the note by using the DIVIDE function.
missing op anything is missing!
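A small sketch of these rules. The variable names are made up, and the results appear in the log via put:

```sas
data _null_;
  x = 10; y = 0; z = .;
  r1 = x / y;            /* missing, with a division-by-zero note in the log */
  r2 = divide(x, y);     /* a special missing value, without that note */
  t1 = x + z;            /* missing: arithmetic operators propagate missing values */
  t2 = sum(x, z);        /* 10: the SUM function ignores missing arguments */
  put r1= r2= t1= t2=;
run;
```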
Date Time in SAS
- SAS does not care about time zones; a date is just a number (days since January 1, 1960)
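A quick way to see that a date is just a number (SAS counts days from January 1, 1960); the variable names are arbitrary:

```sas
data _null_;
  d0 = '01JAN1960'd;     /* date literal */
  d1 = '02JAN1960'd;
  put d0= d1=;           /* d0=0 d1=1 */
  put d1= date9.;        /* d1=02JAN1960: same number, displayed with a date format */
run;
```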
PUT
variablename= displays "variablename=variablevalue" when used in a 'put' statement
FUNCTION
- always followed by ()
- some functions take no arguments, some do
- *** all functions work across columns (within one row), whereas proc means works down the rows. ***
Character functions
- length and index
name = 'Jones, Clint';
length(name); -- 12
col = index(name,','); -- 6
substr(name,1,col-1); -- 'Jones': start at position 1 and take col-1 = 5 characters
put() always returns a character string
In contrast, input() with a numeric informat gives you a number
d3 = input(age, 3.); /* age is character; convert it to a number, reading 3 characters */
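The pieces above can be tried together in one step. This is only a sketch; age_c and the other variable names are made up:

```sas
data _null_;
  name    = 'Jones, Clint';
  len     = length(name);            /* 12 */
  col     = index(name, ',');        /* 6 */
  surname = substr(name, 1, col-1);  /* Jones */
  age_c   = '42';                    /* hypothetical character value */
  age_n   = input(age_c, 3.);        /* character to numeric: 42 */
  back    = put(age_n, 3.);          /* numeric to character, right-aligned in 3 columns */
  put len= col= surname= age_n= back=;
run;
```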
Date
today=date();
d=weekday(today); * day of week, 1 is Sunday;
day(today)
year(today)
month(today)
chartoday=put(today,mmddyy10.); * this is a character string, not a date value;
d=mdy(nummonth, day, year);
IF ELSE
if <expression> then action;
else if <expr> then action;
else action;
FALSE is zero or missing (.) or ('')
TRUE is not false
'Tee' > 'Boothe' is TRUE
operators
not ()
^= not equal
>< min
<> max
|| concatenates 2 strings
Pitfall:
if id=1 or 2 then name = 'Sally';
is evaluated as (id=1) or (2)
(2) is not zero or missing --> always TRUE
Write instead: if id=1 or id=2 then name = 'Sally';
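A sketch that shows the pitfall in the log; the value id=5 is arbitrary:

```sas
data _null_;
  id = 5;
  if id=1 or 2 then put 'runs even though id=5';       /* (id=1) or (2): always true */
  if id=1 or id=2 then put 'not printed';              /* correct comparison */
  if id in (1, 2) then put 'not printed either';       /* shorter equivalent */
run;
```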
data newbio;
set biomass;
/* --- pick one of them --- */
where substr(station,2,1)='L'; /* applied as the data is read, before the obs enters the data step; good when the obs you want are a very small portion */
if substr(station,2,1)='L'; /* execution time statement, executed for every obs; good when you want to include most of the obs */
if substr(station,2,1) ^='L' then delete; /* execution time statement, executed for every obs */
/* --- */
run;
IF with multiple actions using DO
if <expr> then do;
<action1>;
<action2>;
end;
DO BLOCK
do i=1 to 11 by 2;
do mo='Jan','Feb','Mar';
More complex DO loop: commas separate independent specifications, which run one after another
do i=10 to 100 by 10, 25, 37.2; runs
i = 10, 20, ..., 100
then i = 25
then i = 37.2
do while(sex = 'M'); * Do as long as sex is 'M';
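A sketch of a DO statement with a comma-separated list. Each item between commas is its own specification, so the loop below runs for the range first and then once for each extra value (the upper bound 30 is chosen just to keep the log short):

```sas
data _null_;
  do i = 10 to 30 by 10, 25, 37.2;
    put i=;                  /* i takes 10, 20, 30, then 25, then 37.2 */
  end;
  do mo = 'Jan', 'Feb', 'Mar';
    put mo=;                 /* one pass per listed value */
  end;
run;
```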
CHAPTER 11
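temporary variables are internal variables that are not written to the output dataset, for instance END=, IN=, _N_
_N_ = loop counter (number of times the data step has iterated)
END= variable = 1 when the last obs is read, no matter how many datasets are in the SET statement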
data newdata;
set sasclass.data end=eof;
...
data twoyear;
set yr1991 (in=in91)
yr1992 (in=in92)
end=eof;
if in91 then year=1991;
else year=1992;
sum+cost;
if eof then total = sum;
run;
KEEP and DROP
data dl;
set data;
keep station date bmtol;
if station =:'DL' and bmcrus > .4;
run;
data dl (keep=station date bmtol);
set data;
if station =:'DL' and bmcrus > .4;
run;
data dl;
set data (keep=station date bmtol bmcrus); * bmcrus must be read in because the IF below uses it;
keep station date bmtol;
if station =:'DL' and bmcrus > .4;
run;
OUTPUT
output; --> default -- writes the current obs to all datasets named in the DATA statement
data azu (keep = date o3)
liv (keep = date co);
set sth (keep = station date co o3 where=(station in ('AZU','LIV')));
if station = 'AZU' then output azu;
else if station='LIV' then output liv;
run;
MERGE
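merge with BY keeps all obs from both datasets (like a full outer join); both must be sorted by the BY variable. Use IN= flags to get a left outer join.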
data invoice;
merge patient visit;
by patno;
run;
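A sketch of how IN= controls which observations a merge keeps. The contents of patient and visit below are made up; both are already in patno order:

```sas
data patient;                 /* hypothetical data */
  input patno name $;
  datalines;
1 Ann
2 Bob
;
run;

data visit;                   /* hypothetical data */
  input patno cost;
  datalines;
2 50
3 75
;
run;

data invoice_all;             /* keeps patno 1, 2 and 3: everything from both inputs */
  merge patient visit;
  by patno;
run;

data invoice_left;            /* keeps patno 1 and 2: only rows present in patient */
  merge patient (in=inpat) visit;
  by patno;
  if inpat;
run;
```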
This example exploits FIRST. and LAST.
data new;
set clinics;
by region clinnum;
if first.region then cnt=0;
if first.clinnum then cnt+1;
if last.region then output new;
run;
RENAME
data new;
set old (rename=(oldname=newname));
if newname > 3;
run;
Day 3
Macro language -- generate the code
filename refname '<path to the file>';
infile refname;
input <variables>;
LIST style input -- needs a delimiter
FORMATTED input -- needs neatly aligned columns
anydtdte. -- informat that accepts any date format, for data whose date layout differs from row to row
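A sketch of list input with a delimiter and a date informat that tolerates mixed layouts. The data lines are made up, and datalines stands in for an external file so the step runs on its own:

```sas
data readings;
  infile datalines dlm=',';                  /* list input: fields separated by commas */
  input station $ date :anydtdte10. bmtol;   /* the colon reads up to the next delimiter */
  format date date9.;
  datalines;
DL1,01/15/1991,0.42
DL2,15JAN1991,0.57
;
run;

proc print data=readings;
run;
```

With a real file, `infile datalines` would be replaced by `infile refname` after a matching filename statement.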
proc export data=sasdata
outfile="outfile.xls"
dbms=excel replace;
sheet="biomass";
run;
proc import out=work.newmag
datafile="myexel.xls"
dbms=excel replace;
range="mymagdat";
getnames=yes; * get the column names;
mixed=no; * if a column has mixed types, mixed=yes reads it as character;
scantext=yes;
usedate=yes;
scantime=yes;
run;
For delimited files, proc import writes the data step it generates (using infile and input) to the log
Macro
Macro does not work with data, but with text -- it writes code!!!
- macro variables are stored in memory
- created with %let
%let variable = keystrokes;
- To refer to a macro var: &<macro var name>. <-- starts with "&" and ends with an optional "."
- Macro variables resolve only inside double quotes, not single quotes
a macro variable contains keystrokes!!!! not a number, not a variable, not data
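A minimal sketch of creating and resolving a macro variable. The name region and its value are made up:

```sas
%let region = West;                      /* hypothetical value: just keystrokes */

title  "Clinics in the &region. region"; /* resolves to: Clinics in the West region */
title2 'Clinics in the &region. region'; /* single quotes: the text is left as typed */

%put region is &region;                  /* writes the resolved text to the log */
```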
MFILE -- creates the resolved code: a file without macro variables, because all macro vars have already been replaced.
filename mprint 'yoursasfile.sas';
options mprint mfile;
%MACRO %MEND
calling a macro does not need ';' because the macro call is simply replaced by the generated text.
%macro look;
exps
%mend look;
%look
positional parameters: (var1, var2); '=' is not used and the order matters
keyword parameters: using (var1=,var2=) the order when calling does not matter. You can call %look(var2=xxx, var1=4)
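A sketch of both parameter styles. The macro names show and show2 and their parameters are made up; sasclass.clinics is the class dataset from earlier in these notes:

```sas
%macro show(dsn, n);                 /* positional: values are matched by order */
  proc print data=&dsn (obs=&n);
  run;
%mend show;

%show(sasclass.clinics, 5)

%macro show2(dsn=, n=10);            /* keyword: n defaults to 10 when omitted */
  proc print data=&dsn (obs=&n);
  run;
%mend show2;

%show2(n=5, dsn=sasclass.clinics)    /* order does not matter */
%show2(dsn=sasclass.clinics)         /* uses the default n=10 */
```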
assign a value to a macro variable from inside a data step
call symputx('jane_age',age);
run;
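A complete sketch of that call. The value 34 is made up; the macro variable can be used once the data step has finished running:

```sas
data _null_;
  age = 34;                          /* hypothetical value */
  call symputx('jane_age', age);     /* copies the value into macro variable jane_age */
run;

%put Jane is &jane_age years old;    /* Jane is 34 years old */
```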
In the macro world everything is text, so you don't need quotes
%if &airport = SAN %then %let city = San Diego;
There is no missing value concept in the macro world; compare against nothing instead
%if &airport = %then %let city = Unknown;
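A sketch that wraps these tests in a macro so they can be run. The macro name where_is is made up:

```sas
%macro where_is(airport);
  %local city;
  %if &airport = SAN %then %let city = San Diego;    /* no quotes around SAN */
  %else %if &airport = %then %let city = Unknown;    /* empty value: compare with nothing */
  %else %let city = Other;
  %put airport=&airport city=&city;
%mend where_is;

%where_is(SAN)
%where_is()
```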
a macro comment is "%*"; note that a regular * comment is passed into the generated code every time the macro runs.
%*<your comments>;