SAS Training

Post date: Apr 23, 2014 5:09:42 PM

Every statement starts with a keyword and ends with ";"

keyword <statement option>;

no reserved words

a missing ";" is the most common cause of errors

SAS is a parsed language: if you can read the code, the compiler can too

SAS dataset contains metadata

All numbers in SAS are stored as numeric, even integers; SAS does not distinguish between them.

missing value for number = '.'

missing value for character = ' '

To see metadata, use

proc contents data=ddfdfdfdf;
run;

show the data itself

proc print data=ssdsd.sdsd;
run;

OPTIONS statement -- changes remain in effect for the rest of the job or until changed again

OPTIONS is a global statement

options nodate;
options obs=50 pageno=5 date;

Comments

* your comments here;
/* your comments here */ <-- this can be inline, but cannot be nested

libname statement

We can have as many libnames as we need

A libname alias (libref) can have no more than 8 characters

libname alias 'physical/location/data';
libname newstuff <interpretation engine> 'location/of/any/file/you/want';
libname newstuff v8 'usernode:[sasabc.mydir]';

Using either (' ') or (" ") is fine for SAS

Title and Footnotes:

title 'your title here';
title5 "your title here";
footnote1 "Tom's Truck";

ODS Statement

- To reroute your output

- by default the output goes to a text listing

- Using ODS, you can change the output to pdf, html, excel

** I think 'ods html body=<filename>;' = open file connection and 'ods html close;' = close file connection
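A minimal sketch of the open/close pattern described above. It assumes the sample dataset sashelp.class that ships with SAS, and report.html is a hypothetical file name:

```sas
/* open the HTML destination; report.html is a hypothetical file name */
ods html body='report.html';

proc print data=sashelp.class;
run;

/* close the destination so the file is written out */
ods html close;
```

Everything printed between the two ods statements goes to the HTML file.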

Proc Statement

Every proc statement follows the format below:

Proc <name of procedure> [option];
<supporting statements>
run;

*** Caveat on noduplicates *** it only removes duplicate records that are next to each other!!!

proc sort data=sasclass.clinics 
out=clinics noduplicates;
by region descending clinnum;
run;

nodupkey -- dedups on the variables specified in 'by' only -- similar to distinct in SQL
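A small sketch of nodupkey, using a made-up dataset (visits, with hypothetical variables region, clinnum and cost) so it can run on its own:

```sas
/* hypothetical test data: two rows share the key region=A, clinnum=1 */
data visits;
  input region $ clinnum cost;
  datalines;
A 1 10
A 1 20
A 2 30
B 1 40
;
run;

/* nodupkey compares only the BY variables, so cost is ignored */
proc sort data=visits out=uniq nodupkey;
  by region clinnum;
run;

proc print data=uniq;
run;
```

The output dataset uniq should have three rows, one per distinct region/clinnum pair, even though the two A 1 rows differ in cost.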

SAS processes data at the row level, not the table level, so it rarely runs into memory problems.

Variable list shortcuts:

_all_ -- all variables
_numeric_ -- all numerical variables
_character_ -- all character variables
abc: -- variables starting with 'abc' for instance, abcXXY, abc_12
month1-month12 -- month1, month2, ...,month12

PROC DATASETS -- manages things at the dataset level (rename, delete, modify attributes), not at the data value level

MISSING VALUE -- is left out of calculations by procedures and by functions such as sum() and mean()

DATA STEP

- always begins with DATA and ends with RUN;

DATA <name of data to be created>;

A one-level name is stored in WORK, so these two are the same:

data clinics; or

data work.clinics;

the temp library 'work' is created by default anyway.

You can create multiple datasets at once

data sasclass.data1 sasclass.data2;

data _null_; -- no dataset is created

DATA -- what to go out

SET -- what to come in. This is the most common way to bring in the data

you can get multiple datasets at once

set dat1 dat2 dat3;

FORMAT

- every format name contains a period (.)

- does not change what is stored there, it only changes the display

- w := the overall width, including any $, commas and the decimal point

- d := number of decimal places

dollar w.d

percent w.d

ssn w.

character format always starts with $

$char w.

Example:

format debits credits dollar12.2;
format ssn ssn11.;
format total dollar9. lineitem comma9.;

FORMAT in proc print only affects that output. FORMAT in a data step stores the format with the variable in the new dataset, but the stored values themselves are still unchanged.
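A quick way to see that a format only changes the display, sketched with a made-up variable amount:

```sas
data _null_;
  amount = 1234.5;          /* hypothetical value */
  put amount= dollar12.2;   /* written to the log through the format */
  put amount=;              /* the stored value is still 1234.5 */
run;
```

Both put statements read the same stored number; only the first one passes it through dollar12.2.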

LENGTH

length fname $10 lname $16;

default length for numeric is 8 bytes

it's dangerous to decrease the length of a numeric variable, because precision can be lost.

Day 2

In SAS:

numeric variable => 8 bytes

character $x => x bytes, default 8 bytes

division by 0 => missing value, plus a note in the log file. We can prevent this by using the divide() function.

missing op anything is missing!
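A short sketch of the points above, with made-up values. It contrasts the divide() function, the + operator and the sum() function:

```sas
data _null_;
  x = 10;
  y = 0;
  z = .;
  a = divide(x, y);  /* no division-by-zero note; result is a (special) missing value */
  b = z + 1;         /* missing: operators propagate missing values */
  c = sum(z, 1);     /* 1: the SUM function ignores missing arguments */
  put a= b= c=;
run;
```

The log still shows a "missing values were generated" note for b, which is the operator behaviour the line above describes.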

Date Time in SAS

- It does not care about timezone, it's just a number

PUT

variablename= displays "variablename=variablevalue" when using command 'put'

FUNCTION

- always followed by ()

- some functions take no arguments, some do

- *** all functions work across columns, whereas proc means works across rows. ***

Character functions

- length and index

name = 'Jones, Clint';
length(name); -- 12
col = index(name,','); -- 6
substr(name,1,col-1); -- 'Jones': start from 1 and take col-1 = 5 characters

put() always returns a character string

In contrast, input() with a numeric informat gives you a numeric

d3 = input(age, 3.); * age might be char, but we convert it to numeric, reading 3 characters;
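The character functions and the put()/input() conversions above, put together as a runnable sketch. The variable names len, lname and agechar and the value '42' are made up for illustration:

```sas
data _null_;
  name = 'Jones, Clint';
  len = length(name);               /* 12 */
  col = index(name, ',');           /* 6 */
  lname = substr(name, 1, col - 1); /* Jones */

  age = '42';                       /* hypothetical character value */
  d3 = input(age, 3.);              /* character to numeric */
  agechar = put(d3, 3.);            /* numeric back to character */
  put len= col= lname= d3= agechar=;
run;
```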

Date

today=date();

d=weekday(today); * day of week, 1 is Sunday;

day(today)

year(today)

month(today)

chartoday=put(today,mmddyy10.); * this outputs a character string, not a date value;

d=mdy(nummonth, day, year);
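A sketch of the date functions using a fixed date (the post date of these notes) instead of date(), so the results do not change from day to day. The variable names dow and chard are made up:

```sas
data _null_;
  d = mdy(4, 23, 2014);        /* SAS date value: a number of days since 01JAN1960 */
  dow = weekday(d);            /* 4, a Wednesday (1 is Sunday) */
  chard = put(d, mmddyy10.);   /* character string 04/23/2014 */
  put d= date9. dow= chard=;
run;
```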

IF ELSE

if <expression> then action;

else if <expr> then action;

else action;

FALSE is zero or missing (.) or ('')

TRUE is not false

'Tee' > 'Boothe' is TRUE

operators

not ()

^= not equal

>< min

<> max

|| concatenate 2 strings together

if id=1 or 2 then name = 'Sally'; is evaluated as

(id=1) or (2)

(2) is not zero or missing --> TRUE, so the condition is always true. Use id=1 or id=2, or id in (1,2).
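A small sketch that shows the trap and the fix side by side, looping over three made-up id values:

```sas
data _null_;
  do id = 1 to 3;
    /* evaluated as (id=1) or (2), so this fires for every id */
    if id=1 or 2 then put 'or 2 fires for ' id=;
    /* fires only when id is 1 or 2 */
    if id in (1, 2) then put 'in (1,2) fires for ' id=;
  end;
run;
```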

data newbio;

set biomass;

--pick one of them ---

where substr(station,2,1)='L'; * obs are filtered before they are read into the data step. Good when the obs you want are a very small portion of the data;

if substr(station,2,1)='L'; * execution time statement -- executed for every obs. Good when you want to include most of the obs;

if substr(station,2,1) ^='L' then delete; * execution time statement -- executed for every obs;

---

run;

IF with multiple actions using DO

if <expr> then do;
<action1>;
<action2>;
end;

DO BLOCK

do i=1 to 11 by 2;
do mo='Jan','Feb','Mar';

More complex DO loop

do i=10 to 100 by 10, 25, 37.2; is NOT three loops with different BY values.

Each comma-separated item is its own specification, so i runs 10, 20, ..., 100, then 25, then 37.2.
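To check how a comma-separated DO specification is read, a shortened version of the loop (stopping at 30 instead of 100) can be run as a sketch:

```sas
data _null_;
  do i = 10 to 30 by 10, 25, 37.2;
    put i=;   /* 10, 20, 30, then 25, then 37.2 */
  end;
run;
```

Each item between commas is its own specification: the first is a range with a BY value, the other two are single values.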

do while(sex = 'M'); * Do as long as sex is 'M';

CHAPTER 11

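temporary variables are internal variables that are not written to the output dataset, for instance the END= and IN= variables, and _n_

_n_ = loop counter (how many times the data step has iterated)

END= variable = 1 when the last observation is read, no matter how many datasets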

data newdata;

set sasclass.data end=eof;

...

data twoyear;

set yr1991 (in=in91)

yr1992 (in=in92)

end=eof;

if in91 then year=1991;

else year=1992;

sum+cost;

if eof then total = sum;

run;

KEEP and DROP

data dl;
set data;
keep station date bmtol;
if station =:'DL' and bmcrus > .4;
run;
data dl (keep=station date bmtol);
set data;
if station =:'DL' and bmcrus > .4;
run;
data dl;
set data (keep=station date bmtol bmcrus);
keep station date bmtol;
if station =:'DL' and bmcrus > .4;
run;

OUTPUT

output; --> default -- writes the current obs to all the datasets named in DATA

data azu (keep = date o3)

liv (keep = date co);

set sth (keep = station date co o3 where=(station in ('AZU','LIV')));

if station = 'AZU' then output azu;

else if station='LIV' then output liv;

run;

MERGE

merge with BY = keeps the obs from both datasets, like a full outer join; use the IN= flags to get a left or inner join

data invoice;

merge patient visit;

by patno;

run;
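A sketch of a match-merge with IN= flags, using two tiny made-up datasets (the variables name and cost are hypothetical) that are already in patno order:

```sas
data patient;
  input patno name $;
  datalines;
1 Ann
2 Bob
;
run;

data visit;
  input patno cost;
  datalines;
2 50
3 75
;
run;

data invoice;
  merge patient (in=inpat) visit (in=invis);
  by patno;
  /* without a filter, patno 1, 2 and 3 would all be written */
  if inpat;   /* keep patients only: left-join behaviour, patno 1 and 2 */
run;
```

Changing the filter to "if inpat and invis;" would keep only patno 2, which is the inner-join case. Both inputs must be sorted by the BY variable.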

This example exploits the FIRST. LAST.

data new;
set clinics;
by region clinnum;
if first.region then cnt=0;
if first.clinnum then cnt+1;
if last.region then output new;
run;

RENAME

data new;

set old (rename=(oldname=newname));

if newname > 3;

run;

Day 3

Macro language -- generate the code

filename refname <'path to the file'>;

infile refname;

input

LIST style input -- needs a delimiter

FORMATTED input -- needs neatly aligned columns

anydtdte. -- informat for data whose date format differs from row to row

proc export data=sasdata
outfile="outfile.xls"
dbms=excel replace;
sheet="biomass";
run;
proc import out=work.newmag
    datafile="myexel.xls"
    dbms=excel replace;
range="mymagdat";
getnames=yes; * get the column names from the first row;
mixed=no; * if a column mixes numbers and text, mixed=yes reads it as character;
scantext=yes;
usedate=yes;
scantime=yes;
run;

For delimited files, proc import writes the data step it generates (using infile) to the log

Macro

Macro does not work with data, but with the text--it writes code!!!

- macro variables are stored in memory

- create one with %let

%let variable = keystroke;

- To refer to a macro var &<macro var name>. <-- starts with "&" and ends with "."

- Use double quotes around a macro variable; it does not resolve inside single quotes

macro variable contains key strokes!!!! not number not variable not data
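A sketch of the quoting rule, using a made-up macro variable city:

```sas
%let city = San Diego;

title "Flights from &city.";    /* resolves to: Flights from San Diego */
title2 'Flights from &city.';   /* single quotes: stays as the text &city. */

/* %put writes the resolved text to the log */
%put city is &city.;
```

The trailing period only marks the end of the macro variable name, so it does not appear in the resolved title.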

MFILE -- creates a file of the resolved code, without macro variables because all macro vars are replaced already.

filename mprint 'yoursasfile.sas';
options mprint mfile;

%MACRO %MEND

calling a macro does not need ";" because the macro call is just replaced by the generated text.

%macro look;

exps

%mend look;

%look

positional parameters: '=' not used, so the order matters

using keyword parameters (var1=,var2=) the order when calling does not matter. You can call %look(var2=xxx, var1=4)
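A sketch of a macro with keyword parameters. The parameter names ds and n and their defaults are made up, and sashelp.class is the sample dataset shipped with SAS:

```sas
%macro look(ds=sashelp.class, n=5);
  /* the macro only generates this text; SAS then runs it as normal code */
  proc print data=&ds. (obs=&n.);
  run;
%mend look;

/* keyword parameters can be given in any order */
%look(n=3, ds=sashelp.class)

/* omitted parameters fall back to their defaults */
%look()
```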

assign a value to a macro variable from inside a data step

call symputx('jane_age',age);

run;
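A sketch of call symputx in a complete data step. It assumes the sample dataset sashelp.class, which should contain a student named Jane; if no such row exists, the macro variable is never created:

```sas
data _null_;
  set sashelp.class;
  /* copy the data value of age into the macro variable jane_age */
  if name = 'Jane' then call symputx('jane_age', age);
run;

/* the macro variable is available only after the data step has run */
%put Jane is &jane_age. years old;
```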

In macro world everything is text, so you don't need quotes

%if &airport = SAN %then %let city = San Diego;

There is no missing value concept in macro world

%if &airport = %then %let city = Unknown;

A macro comment starts with "%*". A plain * comment inside a macro is passed into the generated code, so it would be displayed every time the macro runs.

%*<your comments>;