SAS for R users

September 2018

My learnings of SAS

Why should I learn SAS?

SAS is legacy software in many industries (40+ years!). It is fairly easy for newcomers to pick up thanks to its point-and-click interface, and its 'business analytics' side makes it attractive to industry. It also integrates fairly well with databases.

Can I learn SAS?

SAS is proprietary, but you can get a university version, which I've used for these notes. However, it is not really built to work on a Mac.

SAS

I will use some of my LinkedIn Premium features to take a SAS course there: https://www.linkedin.com/learning/sas-programming-for-r-users-part-1/introduction-to-sas-and-sas-studio. I will skip the R material and take notes on the SAS material, because it is not easy to call R from SAS on my Mac (http://support.sas.com/documentation/cdl/en/imlug/64248/HTML/default/viewer.htm#imlug_r_sect003.htm).

Base SAS - Built in functions

SAS/ACCESS - Reading in data

SAS/STAT - Analytic models

SAS/IML - Interactive matrix language

ETS license - Time series, forecasting

SAS Studio has different windows:

code editor/work area
navigation pane

Store data permanently in libraries. The WORK library is temporary (it is cleared when the session ends).

Introduction and Working in SAS

Working in SAS Studio

Click Libraries -> SASHELP -> CARS (double-click) to open the table in a new tab. The blue arrow goes to the next page.

Change columns shown by using the tick boxes.

Right-click a column header to sort the data by that column (e.g. ascending) or to filter it (e.g. Invoice >= 30000). To remove the filter, click the 'x' in the View tab. To view the code for this action, click 'Display the query that creates the current table'. It uses three procedures (an SQL procedure, a DATASETS procedure and a PRINT procedure):

/* PROC SQL runs each statement immediately and ends with QUIT, so RUN is not needed */
PROC SQL;
   CREATE TABLE WORK.query AS
   SELECT Make, DriveTrain, Invoice, Cylinders
   FROM SASHELP.CARS
   WHERE Invoice >= 30000;
QUIT;

PROC DATASETS NOLIST NODETAILS;
   CONTENTS DATA=WORK.query OUT=WORK.details;
RUN;
QUIT;

PROC PRINT DATA=WORK.details;
RUN;

Writing a program

Click 'New Options' (the seven-dots button in the top right). You apply a procedure (PROC) to a data table. When you type PROC PRINT, an autocomplete help box pops up. If you don't want it, click 'More application options' (the three-horizontal-lines button in the top right) -> Preferences -> Editor -> untick 'Enable autocomplete' -> Save. You can also right-click a keyword, e.g. PRINT, and click 'Syntax Help'. Click the running-man button to run the code. Check the LOG tab for errors, warnings and notes; clicking a note takes you to the corresponding line. You can click 'Open in a new browser tab' to see the output table more clearly.

Every statement must end with a semicolon (;).

* Print the CARS data table;

*PROC PRINT DATA=SASHELP.CARS;

*RUN;

* Prints the entire data table;

/*

 * Print certain columns in the CARS data table.

*/

PROC PRINT DATA=SASHELP.CARS;

   * VAR selects the columns (variables) to print;

   * You can choose the columns by clicking on libraries -> CARS;

   * Hold control on the keyboard and click the columns of interest;

   * Drag and drop to after VAR;

   * Remove the 'SASHELP.CARS';

   VAR Make Model MPG_City;

RUN;

Using Tasks and Snippets in SAS Studio

Tasks = point-and-click features that generate code behind the scenes.

Click 'Tasks and Utilities' -> 'Tasks' -> 'Statistics' -> 'Summary Statistics' -> 'Select a table' (SASHELP.CARS) -> add 'Weight' to the Analysis variables (this generates the code) -> 'Options' -> 'Plots' -> 'Histogram' and 'Add normal density curve' -> 'Run'.

ods noproctitle;

ods graphics / imagemap=on;

proc means data=SASHELP.CARS chartype mean std min max n vardef=df;

  var Weight;

run;

proc univariate data=SASHELP.CARS vardef=df noprint;

 var Weight;

 histogram Weight / normal(noprint);

run;

Click 'Snippets' (starter code) -> 'Snippets' -> 'Graph' -> 'Scatter Plot Matrix' -> 'Run'.

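The code generated by the snippet is not reproduced in these notes. Below is a sketch of a scatter plot matrix built with the MATRIX statement of PROC SGSCATTER. It is not necessarily the exact code the snippet produces, and the SASHELP.CARS variables are chosen only for illustration.

```sas
/* Sketch of a scatter plot matrix (variables chosen for illustration only) */
proc sgscatter data=sashelp.cars;
   /* One panel per pair of variables, with a histogram of each variable on the diagonal */
   matrix mpg_city weight horsepower / diagonal=(histogram);
run;
```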

Click 'New Snippet', paste the PROC PRINT code from above and 'Save' it as 'Print Variables'. It is saved under 'My Snippets', and you can then drag and drop it into your code.

Bayesian Logistic Regression

Logistic Regression = Regression on data when the dependent variable is binary (e.g. True, False)

Bayesian = update statistical inference (e.g. the distribution of a parameter) as more data becomes available: a prior distribution is combined with the data to give a posterior distribution. Here is a nice plot showing the distribution being updated.

MCMC = Markov Chain Monte Carlo. A Markov chain is a stochastic model in which the probability of each event depends only on the state attained in the previous event (here is an example). MCMC methods build a Markov chain whose long-run (stationary) distribution is the target probability distribution, so after enough samples the draws approximate that distribution. You can read more about it in the SAS documentation.

The code for this is HERE
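The linked code is not reproduced in these notes. As a rough sketch only (not the course code), a Bayesian logistic regression can be written in PROC MCMC, in the style of the SAS documentation example 'Logistic Regression Model with a Diffuse Prior'. The following are all made up for illustration:

* the simulated data;
* the true coefficients (-0.5 and 1.5);
* the prior variance of 100;
* the data set names.

```sas
/* Simulate a binary response (made-up coefficients: intercept -0.5, slope 1.5) */
data work.logit_sim (keep=x y);
   call streaminit(123);
   do i = 1 to 200;
      x = rand('Normal', 0, 1);
      p = 1 / (1 + exp(-(-0.5 + 1.5*x)));   /* inverse logit */
      y = rand('Bernoulli', p);
      output;
   end;
run;

/* Bayesian logistic regression: diffuse normal priors, posterior draws saved with OUTPOST= */
proc mcmc data=work.logit_sim nbi=1000 nmc=10000 seed=123 outpost=work.post;
   parms beta0 0 beta1 0;
   prior beta0 beta1 ~ normal(0, var=100);
   p = logistic(beta0 + beta1*x);
   model y ~ binary(p);
run;
```

The posterior means reported by PROC MCMC should land near the simulated values. WORK.POST holds one row per kept draw, which you can plot or summarize like any other data set.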

The DATA step is for reading in, altering and subsetting data.

You can highlight code just to run that part.

Poker Simulation

IML = Interactive Matrix Language

Texas Hold 'em:

9 players at a table
Each player: 2 cards face-down
Board: 5 community cards dealt face-up

The code for this is HERE
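The linked code is not reproduced here. Below is a minimal SAS/IML sketch of dealing one hand, assuming the RANPERM function is available in your SAS/IML release.

* Cards are coded 1 to 52.
* The first 18 cards of a shuffled deck go to the 9 players, two each (not in real dealing order).
* The next 5 cards are the face-up board.
* The data set name WORK.HANDS is hypothetical.

```sas
proc iml;
   call randseed(123);                 /* reproducible shuffle */
   deck = ranperm(52);                 /* random permutation of card IDs 1-52 */
   hands = shape(deck[1:18], 9, 2);    /* 9 players x 2 face-down cards */
   board = deck[19:23];                /* 5 face-up community cards */
   print hands[label="Player hands"], board[label="Community cards"];

   /* Save the hands to a data set for later steps */
   create work.hands from hands[colname={"Card1" "Card2"}];
   append from hands;
   close work.hands;
quit;
```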

Multiple Linear Regression Power Analysis

The code for this is HERE
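The linked code is not reproduced here. One way to sketch a power analysis for multiple linear regression is the MULTREG statement of PROC POWER. The inputs below are made-up illustration values:

* 3 predictors, testing 1 of them;
* full-model R-square of 0.25;
* an R-square contribution of 0.05 from the tested predictor;
* a target power of 0.9.

The missing value (.) asks SAS to solve for the total sample size.

```sas
ods output Output=work.power_result;   /* save the results table as a data set */
proc power;
   multreg
      model = fixed
      nfullpredictors = 3
      ntestpredictors = 1
      rsqfull = 0.25
      rsqdiff = 0.05
      ntotal = .
      power = 0.9;
run;
```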

Calling R from SAS

The code for this is HERE
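The linked code is not reproduced here. Below is a minimal sketch using the SUBMIT / R block in SAS/IML. It assumes R is installed and SAS was started with the RLANG system option, which is the part that is hard on a Mac. The sketch:

* sends SASHELP.CLASS to R;
* averages Height by Sex there;
* brings the result back as a SAS data set.

The names class, agg and WORK.HEIGHTBYSEX are hypothetical.

```sas
proc iml;
   call ExportDataSetToR("Sashelp.Class", "class");     /* SAS data set -> R data frame */
   submit / R;
      agg <- aggregate(Height ~ Sex, data = class, FUN = mean)
   endsubmit;
   call ImportDataSetFromR("Work.HeightBySex", "agg");  /* R data frame -> SAS data set */
quit;

proc print data=work.HeightBySex;
run;
```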

SAS libraries

WORK is temporary (deleted at the end of the session). SASHELP has sample datasets. SASUSER is for saving datasets you work with often.

Define your own library as:

libname sp4r "s:\workshop";

Add data sets to your library and refer to them with a two-level name, e.g. sp4r.frog.

Procedure syntax

In a PROC step: STATEMENT <arguments> </ options>;

Some procedures can save results as data sets, e.g. OUTPOST= in PROC MCMC saves the posterior samples.

SAS Documentation

Click on the product, e.g. SAS Analytical Products 14.1 -> What's New in SAS/STAT -> Contents -> Procedures -> The MCMC Procedure (or Topics -> Bayesian analysis -> MCMC) to see its statements. Click 'PROPDIST' and then 'Metropolis and Metropolis-Hastings Algorithms' to find out more about the method. Click the Examples tab, click 'Logistic Regression Model with a Diffuse Prior', then copy and paste the example.

SAS Training

Free tutorials

Importing and Reporting Data

Creating datasets

The code for this is HERE

* Save as example_data in the sp4r library;
* Set a length of 25 for the character variables;
data sp4r.example_data;
   length First_Name $ 25 Last_Name $ 25;
   input First_Name $ Last_Name $ age height;
   datalines;
Jordan Bakerman 27 68
Bruce Wayne 35 70
Walter White 51 70
Henry Hill 65 66
JeanClaude VanDamme 55 69
;
run;

* The double trailing @@ holds the input line, so several observations can be read from one line;
data sp4r.example_data2;
   length First_Name $ 25 Last_Name $ 25;
   input First_Name $ Last_Name $ age height @@;
   datalines;
Jordan Bakerman 27 68 Bruce Wayne 35 70 Walter White 51 70
Henry Hill 65 66 JeanClaude VanDamme 55 69
;
run;

Importing raw data files

Can use DATA step as above or use PROC IMPORT (.csv, .xlsx)

The code for this is HERE and data is HERE

* Import data using DATA step;

data sp4r.all_names;

   length First_Name $ 25 Last_Name $ 25;

   /* &path is assumed to hold the course data folder, defined earlier with a %let statement */
   infile "&path\allnames.csv" dlm=',';

   input First_Name $ Last_Name $ age height;

run;

* Import data using PROC import;

* REPLACE will overwrite sp4r.baseball;

* getnames=yes will use header;

* data starts on second row;

proc import out=sp4r.baseball

   datafile= "&path\baseball.csv" DBMS=CSV REPLACE;

   getnames=yes;

   datarow=2;

run;

* Rename the variables;


data sp4r.baseball;

   set sp4r.baseball;

   rename nAtBat = At_Bats

      nHits = Hits

      nHome = Home_Runs

      nRuns = Runs

      nRBI = RBIs

      nError = Errors;

run;

Reporting data

See data type and length:

proc contents data=sp4r.cars varnum;

run;

See the data head(1:6):

proc print data=sp4r.cars (firstobs=1 obs=6);

run;

Print the unique levels of a variable:

proc sql;
   SELECT DISTINCT origin FROM sp4r.cars;
quit;

Use UPCASE to filter conditionally when you don't know the case of the stored values.

proc print data=sp4r.cars;
   var make model origin;
   where upcase(origin)='EUROPE'; * values are actually stored as Europe;
run;

Comparison operators: = (EQ), ^= (NE), > (GT), >= (GE), < (LT), <= (LE).

IN tests whether a value is in a list, e.g. where country in ('US','CA'); keeps rows where country is USA or Canada.

& and | mean AND and OR, and ^ (or NOT) negates a condition, e.g. where country not in ('US','CA');
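The sketch below puts the operators together on SASHELP.CARS. It keeps cars from Asia or Europe that are not SUVs and cost at least $80,000. The price threshold is arbitrary, and WORK.PRICEY is a hypothetical name.

```sas
/* Combine IN, AND and not-equal in one WHERE statement */
data work.pricey;
   set sashelp.cars;
   where origin in ('Asia','Europe') and type ^= 'SUV' and msrp >= 80000;
run;

proc print data=work.pricey;
   var make model origin type msrp;
run;
```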

Change column labels and data format. The code for this is HERE

* Change the FN column header to First Name;
proc print data=sp4r.business label;
   label FN='First Name';
run;

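* Change the data format;
* A leading $ in a format name means it is a character format;
* DOLLAR12.2 displays a number as dollars with a total width of 12 and 2 decimal places;
* MMDDYY10. displays a SAS date value (days since 1 January 1960) as mm/dd/yyyy;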

proc print data=sp4r.business;

   format salary dollar12.2 hire_date mmddyy10.;

run;

* You can create your own formats with PROC FORMAT or look up the built-in ones in the documentation;

proc format;

   value $jobformat 'SR'='Sales Rep'

                    'SM'='Sales Manager';

   value bonusformat 0='No' 1='Yes';

run;

proc print data=sp4r.business;

   format job $jobformat. bonus bonusformat.;

run;

data employees;
   input name $ bday :mmddyy8. @@;
   datalines;
Jill 01011960 Jack 05111988 Joe 08221975
;
run;

proc print data=employees;
run;

* Now use label and format;
data employees;
   input name $ bday :mmddyy8. @@;
   format bday mmddyy10.;
   label name="First Name" bday="Birthday";
   datalines;
Jill 01011960 Jack 05111988 Joe 08221975
;
run;

proc print data=employees label;
run;
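In the first PROC PRINT of EMPLOYEES, bday prints as a plain number. That is because SAS stores dates as the number of days since 1 January 1960; the format only changes how the number is displayed. The small sketch below checks this. The data set and variable names are hypothetical.

```sas
data work.dates;
   d = mdy(1, 2, 1960);                  /* stored as 1: one day after 1 Jan 1960 */
   jack = mdy(5, 11, 1988);              /* Jack's birthday from the example above */
   jack_shown = jack;
   format jack_shown mmddyy10.;          /* same number, displayed as 05/11/1988 */
run;

proc print data=work.dates;
run;
```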

Create variables and new data

* Add two columns together as a new variable;

data sp4r.cars;

   set sp4r.cars;

   wheelbase_plus_length = wheelbase+length;

run;

* Change values conditionally;
data sp4r.cars;
   set sp4r.cars;
   if mpg_highway<20 then bonus=0;
   else if mpg_highway<30 then bonus=1000;
   else bonus=2000;
run;

* Create a new character variable;
data sp4r.cars;
   set sp4r.cars;
   length type2 $ 25;
   if type in ('Hybrid','SUV')
      then type2='Family Vehicle';
   else type2='Truck or Sports Vehicle';
run;

* Use a DO group to create more than one variable per condition;
data sp4r.cars;
   set sp4r.cars;
   length frequency $ 12;
   if mpg_highway<20 then do;
      bonus=0;
      frequency='No Payment';
   end;
   else if mpg_highway<30 then do;
      bonus=1000;
      frequency='One Payment';
   end;
   else do;
      bonus=2000;
      frequency='Two Payments';
   end;
run;

Create and use functions

data sp4r.cars;
   set sp4r.cars;
   log_price = log(msrp);
run;

* Row-wise mean (across variables within each observation);
data sp4r.cars;
   set sp4r.cars;
   mean_mpg = mean(mpg_highway,mpg_city);
run;

* DATA _NULL_ runs a DATA step without creating a data set;

data _NULL_;

   a=mean(1,2,3,4,5);

   b=exp(3);

   c=var(10,20,30);

   d=poisson(1,2);

   put a b c d; * to Log;

run;

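* String functions, e.g. SUBSTR, SCAN, CATX, TRANWRD (statements for use inside a DATA step);
newstr = substr(str,length(str),1);       * last character of str;
newstr = scan(str,2,',');                 * second word, using a comma as the delimiter;
concatstr = catx(' ',str1,str2);          * concatenate with a space, trimming blanks;
newvar = tranwrd(var,'str','newstr');     * replace str with newstr throughout the column;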
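Below is a self-contained sketch of the string functions above, applied to one made-up value ('Bakerman,Jordan'). The variable names are hypothetical.

```sas
data work.strings;
   length str $ 40 last $ 1 second $ 20 full $ 40 swapped $ 40;
   str = 'Bakerman,Jordan';
   last = substr(str, length(str), 1);                      /* n */
   second = scan(str, 2, ',');                              /* Jordan */
   full = catx(' ', scan(str, 2, ','), scan(str, 1, ','));  /* Jordan Bakerman */
   swapped = tranwrd(str, 'Jordan', 'Bruce');               /* Bakerman,Bruce */
run;

proc print data=work.strings;
run;
```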

Functions can be found at SAS 9.4 -> documentation by title -> functions and CALL routines -> dictionary of functions and CALL routines. e.g. look at example in FIND

Create your own functions with the function compiler procedure, PROC FCMP (each function returns a single value).

* Switch the order of a Last,First name;
proc fcmp outlib=sp4r.functions.newfuncs;
   function ReverseName(name $) $ 40;
      length newname $ 40;
      newname=catx(' ',scan(name,2,','),scan(name,1,','));
      return(newname);
   endsub;
quit;

options cmplib=sp4r.functions;

data sp4r.school;

   set sp4r.school;

   FLName=ReverseName(name);

run;

Subset data

* Keep some variables (columns);
data sp4r.cars2 (keep=make msrp invoice); * can also use DROP=;
   set sp4r.cars;
run;

* Subset by row, like cars[25:50, ] in R;
data sp4r.cars2;
   set sp4r.cars (firstobs=25 obs=50);
run;

* Subset conditionally;
data sp4r.cars2;
   set sp4r.cars;
   where mpg_city > 35;
run;

* Create a table of the unique values with PROC SQL;
proc sql;
   create table sp4r.origin as
   select distinct origin from sp4r.cars;
quit;

Concat data

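* Combine rows (stack data sets, like rbind in R) using SET;
data a_all;
   set a1 a2;
run;

* Combine columns with two SET statements (one-to-one reading that stops at the shorter data set);
data b_all;
   set b_col1;
   set b_col2;
run;

* MERGE without BY also combines columns, like cbind, but keeps all rows of the longer data set;
data c_all;
   merge c_small_col c_long_col;
run;

* MERGE with a BY statement matches rows on a common variable (similar to an SQL join);
* Sort both data sets by that variable first with PROC SORT;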
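Below is a minimal sketch of a match-merge on a common key, using two tiny made-up tables (WORK.PEOPLE and WORK.AGES are hypothetical). It behaves like a full join on id; here every id appears in both tables.

```sas
data work.people;
   input id name $;
   datalines;
2 Bruce
1 Jordan
3 Walter
;
run;

data work.ages;
   input id age;
   datalines;
3 51
1 27
2 35
;
run;

/* MERGE with BY requires both inputs sorted by the key */
proc sort data=work.people;
   by id;
run;

proc sort data=work.ages;
   by id;
run;

/* Match-merge: one row per id with columns from both tables */
data work.joined;
   merge work.people work.ages;
   by id;
run;
```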

Part 2 of the course: https://www.linkedin.com/learning/sas-programming-for-r-users-part-2/introduction-to-sas-and-sas-studio

DO loop

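/* DO loop syntax (only valid inside a DATA step):
   do i=2 to 10 by 2;  ...  end;    counts up: 2, 4, 6, 8, 10
   do i=10 to 2 by -2; ...  end;    counts down */

data loop;
*data loop (keep=x rep);
*data loop (drop=i);
   do i=2 to 10 by 2;
      x = i+1;
      rep = 1;
      output; * write one observation per iteration;
   end;
run;

* For each existing row, loop over j and output one row per value of j, giving 2 x 2 = 4 rows (similar to expand.grid in R);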

data doloop;

   do i=1 to 2;

      output;

   end;

run;

data doloop;

   set doloop;

   do j=1 to 2;

      output;

   end;

run;

Generate random numbers

RAND('Normal',mean,std)

Do loop and random number generator. Code HERE

/*Part A*/

data sp4r.random (drop=i);

   call streaminit(123);

   do i=1 to 10;

      rnorm = rand('Normal',20,5);

      rbinom = rand('Binomial',.25,1);

      runif = rand('Uniform')*10;

      rexp = rand('Exponential')*5;

      output;

   end;

run;

proc print data=sp4r.random;

run;

/*Part B*/

data sp4r.random;

   call streaminit(123);

   set sp4r.random;

   rgeom = rand('Geometric',.1);

run;

proc print data=sp4r.random;

run;

/*Part C*/

data sp4r.doloop (drop=j);

   call streaminit(123);

   do group=1 to 5;

      do j=1 to 3;

         rpois = rand('Poisson',25);

         rbeta = rand('Beta',.5,.5);

         seq+1;

         output;

      end;

   end;

run;

proc print data=sp4r.doloop;

run;

/*Part D*/

data sp4r.quants;

do q=-3 to 3 by .5;

   pdf = pdf('Normal',q,0,1);

   cdf = cdf('Normal',q,0,1);

   quantile = quantile('Normal',cdf,0,1);

   output;

end;

run;

proc print data=sp4r.quants;

run;
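Below is a quick sanity check on RAND, written as a sketch. With a fixed seed, 10,000 draws from Normal(20, 5) should give a sample mean close to 20 and a standard deviation close to 5. The data set names are hypothetical.

```sas
data work.sim (keep=x);
   call streaminit(2018);            /* fixed seed for reproducibility */
   do i = 1 to 10000;
      x = rand('Normal', 20, 5);     /* mean 20, standard deviation 5 */
      output;
   end;
run;

proc means data=work.sim noprint;
   var x;
   output out=work.sim_stats mean=x_mean std=x_sd;
run;

proc print data=work.sim_stats;
run;
```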

R-style plots using PROC SGPLOT (statistical graphics). Code HERE

proc sgplot data=sales;

   scatter x=month y=revenue;

   scatter x=month y=revenue_2;

   *series x=month y=revenue;

   *series x=month y=revenue_2;

run;

proc sgplot data=sales;

   scatter x=month y=revenue / group=company;

run;

proc sgplot data=sales;

   scatter x=month y=revenue;

   by company;

run;

/*Part A*/

data sp4r.hist_data;

   call streaminit(123);

   do i=1 to 1000;

      x = rand('exponential')*10;

      output;

   end;

run;

proc sgplot data=sp4r.hist_data;

   histogram x;

run;

proc sgplot data=sp4r.hist_data;

   histogram x / binwidth=1;

   density x / type=normal;

   density x / type=kernel;

run;

/*Part B*/

data sp4r.boxplot_data (drop=rep);

   call streaminit(123);

   do group=1 to 3;

      do rep=1 to 100;

         response = rand('exponential')*10;

         output;

      end;

   end;

run;

proc sgplot data=sp4r.boxplot_data;

    hbox response;

run;

proc sgplot data=sp4r.boxplot_data;

    hbox response / category=group;

run;

/*Part C*/

data sp4r.sales;

   call streaminit(123);

   do month=1 to 12;

      revenue = rand('Normal',10000,5000);

      output;

   end;

run;

proc sgplot data=sp4r.sales;

   vbar month / response=revenue;

run;

/*Part D*/

data sp4r.series_data (keep=x y1 y2);

   call streaminit(123);

   do x=1 to 30;

      beta01 = 10;

      beta11 = 1;

      y1 = beta01 + beta11*x + rand('Normal',0,5);

      beta02 = 35;

      beta12 = .5;

      y2 = beta02 + beta12*x + rand('Normal',0,5);

      output;

   end;

run;

proc sgplot data=sp4r.series_data;

   scatter x=x y=y1;

   scatter x=x y=y2;

run;

proc sgplot data=sp4r.series_data;

   series x=x y=y1;

   series x=x y=y2;

run;

proc sgplot data=sp4r.series_data;

   series x=x y=y1;

   scatter x=x y=y1;

   series x=x y=y2;

   scatter x=x y=y2;

run;

/*Part E*/

* Regression line with confidence limits for the mean (CLM) and prediction limits for individual values (CLI);

proc sgplot data=sp4r.series_data;

   reg x=x y=y1 / clm cli;

   reg x=x y=y2 / clm cli;

run;

Enhancing the plot. Can save as a pdf. Code HERE

/*Part A*/

data sp4r.sales;

   call streaminit(123);

   do month=1 to 12;

      revenue = rand('Normal',10000,1000);

      revenue_2 = rand('Normal',13000,500);

      output;

   end;

run;

/*Part B*/

proc sgplot data=sp4r.sales;

   series x=month y=revenue / legendlabel='Company A'

      lineattrs=(color=blue pattern=dash);

   series x=month y=revenue_2 / legendlabel='Company B'

      lineattrs=(color=red pattern=dash);

   title 'Monthly Sales of Company A and B for 2015';

   xaxis label="Month" values=(1 to 12 by 1);

   yaxis label="Revenue for 2015";

   inset "Jordan Bakerman" / position=bottomright;

   refline 6.5 / transparency= 0.5 axis=x;

   refline 11000 / transparency= 0.5;

run;

title;

/*Part C*/

proc sgplot data=sp4r.sales;

   series x=month y=revenue / legendlabel='Company A' name='Company A'

      lineattrs=(color=blue pattern=dash);

   scatter x=month y=revenue / markerattrs=(color=blue

      symbol=circlefilled);

   series x=month y=revenue_2 / legendlabel='Company B'

      name='Company B' lineattrs=(color=red pattern=dash);

   scatter x=month y=revenue_2 / markerattrs=(color=red

      symbol=circlefilled);

   title 'Monthly Sales of Company A and B for 2015';

   xaxis label="Month" values=(1 to 12 by 1);

   yaxis label="Revenue for 2015" min=8000 max=14000;

   inset "Jordan Bakerman" / position=bottomright;

   refline 11000 / transparency= 0.5;

   refline 6.5 / transparency= 0.5 axis=x;

   keylegend 'Company A' 'Company B';

run;

title;
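The notes say the plot can be saved as a PDF. One way to do that is to wrap the step in the ODS PDF destination. The file name and the choice of the WORK folder are assumptions for illustration.

```sas
/* Route output to a PDF file (file name and WORK location are illustrative) */
ods pdf file="%sysfunc(pathname(work))/mpg_plot.pdf";

proc sgplot data=sashelp.cars;
   scatter x=weight y=mpg_city / group=origin;
   title 'City MPG by Weight';
run;

title;
ods pdf close;   /* close the destination so the file is written */
```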

Create faceted plots with PROC SGSCATTER (MATRIX, PLOT and COMPARE statements). Code HERE

proc sgscatter data=sp4r.cars;
   plot mpg_city*weight mpg_city*length
      weight*length / columns=3;
run;

* Multi-cell plot: a 1 x 3 grid, with this plot in the third cell;
ods layout start rows=1 columns=3;
ods region row=1 column=3;

proc sgplot data=sp4r.cars;

   hbox mpg_city;

run;

ods layout end;

proc sgpanel data=sp4r.cars;

   panelby origin / columns=3;

   histogram mpg_city;

run;

proc sgpanel data=sp4r.lesscars;

   panelby origin type / rows=1 columns=3;

   reg x=weight y=mpg_city;

run;

/*Part A*/

data sp4r.multi;

   call streaminit(123);

   do Sex='F', 'M';

      do j=1 to 1000;

         if sex='F' then height = rand('Normal',66,2);

         else height = rand('Normal',72,2);

         output;

      end;

   end;

run;

/*Part B*/

proc sgpanel data=sp4r.multi;

   panelby sex;

   histogram height;

   density height / type=normal;

   title 'Heights of Males and Females';

   colaxis label='Height';

run;

title;

/*Part C*/

ods layout Start rows=1 columns=3 row_height=(1in) column_gutter=0;

ods region row=1 column=1;

proc sgplot data=sp4r.multi (where= (sex='F'));

   histogram height / binwidth=.5;

   title 'Histogram of Female Heights';

run;

title;

ods region row=1 column=2;

proc sgplot data=sp4r.multi (where= (sex='F'));

   density height / type=kernel;

   title 'Density Estimate of Female Heights';

run;

title;

ods region row=1 column=3;

proc sgplot data=sp4r.multi (where= (sex='F'));

   hbox height;

   title 'Boxplot of Female Heights';

run;

title;

ods layout end;

Descriptive Procedures, Output Delivery System, and Macros

The CORR, FREQ, MEANS and UNIVARIATE procedures summarize variables (columns).

* Correlation matrix and covariance matrix;
proc corr data=sp4r.cars cov;
   var horsepower weight length;
run;

* Frequency tables for categorical data;
proc freq data=sp4r.cars;
   tables origin type;
run;

* Cross table (two-way frequency table);
proc freq data=sp4r.cars;
   tables origin*type;
   *tables origin*type / norow nocol nopercent;
run;

* NLEVELS reports the number of levels of each variable;
proc freq data=sp4r.cars nlevels;
   tables origin*type;
   *tables origin*type / noprint;
run;

PROC MEANS gives summary statistics (you can request extra ones such as SKEWNESS or P10), and PROC UNIVARIATE gives more detail. Code HERE

proc means data=sp4r.cars maxdec=2 mean median var;
   var mpg_city mpg_highway;
run;
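Below is a sketch of requesting extra statistics such as SKEWNESS and P10 in PROC MEANS and saving them with the OUTPUT statement. It runs on SASHELP.CARS so it works without the course library. WORK.MPG_STATS is a hypothetical name.

```sas
proc means data=sashelp.cars n mean skewness p10 p90 maxdec=2;
   var mpg_city mpg_highway;
   /* AUTONAME builds names such as MPG_City_Skew and MPG_City_P10 */
   output out=work.mpg_stats skewness= p10= / autoname;
run;
```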

PROC UNIVARIATE can also draw plots with its HISTOGRAM and QQPLOT statements:

/*Part A*/

proc contents data=sp4r.ameshousing varnum;

run;

/*Part B*/

proc univariate data=sp4r.ameshousing;

   var saleprice;

   histogram saleprice / normal kernel;

   inset n mean std / position=ne;

   qqplot saleprice / normal(mu=est sigma=est);

run;

The Output Delivery System (ODS) creates the output tables and graphs. ODS TRACE ON writes the name of each output object to the log. Code HERE

ods trace on;
proc univariate data=sp4r.ameshousing;
   var saleprice;
   qqplot saleprice / normal(mu=est sigma=est);
run;

ods trace off;

ods select basicmeasures qqplot;

proc univariate data=sp4r.ameshousing;

   var saleprice;

   qqplot saleprice / normal(mu=est sigma=est);

run;

Save an output table as a new SAS data set with ODS OUTPUT, placed before the PROC step. Code HERE

ods output basicmeasures = SP_BasicMeasures; * output-object-name = data-set-name;
proc univariate data=sp4r.ameshousing;
   var saleprice;
run;

* To save specific statistics, use the OUTPUT statement;
proc univariate data=sp4r.ameshousing;
   var saleprice;
   output out=stats mean=sp_mean;
run;

/*Part A*/

ods select basicmeasures;

ods output basicmeasures = sp4r.SalePrice_BasicMeasures;

proc univariate data=sp4r.ameshousing;

   var saleprice;

run;

proc print data=sp4r.saleprice_basicmeasures;

run;

/*Part B*/

proc univariate data=sp4r.ameshousing;

   var saleprice;

   * Choose the percentile points;

   output out=sp4r.stats mean=saleprice_mean pctlpts= 40, 45, 50, 55, 60

      pctlpre=saleprice_;

run;

proc print data=sp4r.stats;

run;

/*Part C*/

proc means data=sp4r.ameshousing;

   var saleprice garage_area;

   output out=sp4r.stats mean(saleprice)=sp_mean median(garage_area)=ga_med;

run;

proc print data=sp4r.stats;

run;

/*Part D*/

proc means data=sp4r.ameshousing;

   var saleprice garage_area;

   output out=sp4r.stats mean= std= / autoname;

run;

proc print data=sp4r.stats;

run;

Global macro variables (reuse a value across steps and programs)

%let height = 67;
%let name = Ray Bell;

Reference a macro variable with an ampersand, e.g. &height

* Numeric value;
%let year = 2010;
proc print data=sp4r.ameshousing;
   where yr_sold = &year;
   var yr_sold saleprice;
   title "Price of Homes Sold in &year";
run;
title;

* Character value, in double quotes so the macro variable resolves;
%let gtype = Attached;
proc print data=sp4r.ameshousing;
   where g = "&gtype";
   var yr_sold saleprice;
   title "a &gtype";
run;
title;

Creating global macro variables automatically with PROC SQL (SELECT ... INTO). Code HERE

proc means data=sp4r.ameshousing;
   var saleprice;
   output out=stats mean=mean std=sd; * output the mean and SD;
run;

proc sql;
   select mean into :sp_mean from stats;
   select sd into :sp_sd from stats;
quit;

%put The mean and sd are &sp_mean and &sp_sd; /* %PUT writes to the log */
%put _user_; /* lists all user-defined macro variables */
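Below is a self-contained sketch of the same INTO idea on SASHELP.CLASS, computing the statistics directly in the SELECT. FORMAT=BEST16. is there because values stored in macro variables are text, and the default format may round them. The names h_mean, h_sd and WORK.CLASS_STD are hypothetical.

```sas
proc sql noprint;
   /* Store the mean and SD of Height in macro variables */
   select mean(height) format=best16., std(height) format=best16.
      into :h_mean trimmed, :h_sd trimmed
      from sashelp.class;
quit;

%put Mean height is &h_mean and SD is &h_sd;

/* Use the macro variables to standardize Height */
data work.class_std;
   set sashelp.class;
   height_z = (height - &h_mean) / &h_sd;
run;
```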

/*Part A*/

proc means data=sp4r.ameshousing;

   var saleprice;

   output out=sp4r.stats mean=sp_mean std=sp_sd;

run;

proc sql;

   select sp_mean into :sp_mean from sp4r.stats;

   select sp_sd into :sp_sd from sp4r.stats;

quit;

/*Part B*/

data sp4r.ameshousing;

   set sp4r.ameshousing;

   sp_stan = (saleprice - &sp_mean) / &sp_sd;

run;

proc print data=sp4r.ameshousing (obs=6);

   var saleprice sp_stan;

run;

proc means data=sp4r.ameshousing mean std;

   var saleprice sp_stan;

run;

/*Part C*/

proc contents data=sp4r.cars varnum out=carscontents;

run;

proc print data=carscontents;

   var name type;

run;

/*Part D*/

proc sql;

   select distinct name into: vars_cont separated by ' ' from carscontents where type=1;

   select distinct name into: vars_cat separated by ' ' from carscontents where type=2;

quit;

%put The continuous variables are &vars_cont and the categorical variables are &vars_cat;

Macro programs are roughly the equivalent of R functions. Code HERE?

%macro today;
   %put Today is &sysday &sysdate9;
%mend;

%today

%macro calc(dsn,vars);
   proc means data=&dsn;
      var &vars;
   run;
%mend calc;

%calc(business,yield)

Keyword parameters (similar to named arguments with default values in Python or R) take the form name=value, e.g. start=01jan08

Business example

Daily report

Weekly report every Friday
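Below is a sketch of the business example using a keyword parameter. The hypothetical %report macro always logs a daily report and adds a weekly one when the run date is a Friday. The WEEKDAY function returns 1 for Sunday through 7 for Saturday. Leaving rundate= empty uses today's date.

```sas
/* Hypothetical macro: daily report every run, weekly report on Fridays */
%macro report(rundate=);
   /* Default to today when no date is supplied */
   %if &rundate= %then %let rundate=%sysfunc(today());

   data work.report_log;
      length report $ 6;
      rundate = &rundate;
      format rundate date9.;
      report = 'Daily';
      output;
      /* WEEKDAY returns 1 (Sunday) to 7 (Saturday), so 6 is Friday */
      %if %sysfunc(weekday(&rundate)) = 6 %then %do;
         report = 'Weekly';
         output;
      %end;
   run;
%mend report;

/* Run for a specific date via the keyword parameter: 4 January 2008 */
%report(rundate=%sysfunc(inputn(04jan2008, date9.)))
```

4 January 2008 was a Friday, so this call should write both a Daily and a Weekly row to WORK.REPORT_LOG.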

/*Part A*/

%macro mymac(dist,param1,param2=,n=100,stats=no,plot=no);

/*Part B*/

%if &dist= %then %do;

   %put Dist is a required argument;

   %return;

%end;

%if &param1= %then %do;

   %put Param1 is a required argument;

   %return;

%end;

/*Part C*/

%if &param2= %then %do;

   data random (drop=i);

      do i=1 to &n;

         y=rand("&dist",&param1);

         x+1;

         output;

      end;

   run;

%end;

%else %do;

   data random (drop=i);

      do i=1 to &n;

         y=rand("&dist",&param1,&param2);

         x+1;

         output;

      end;

   run;

%end;

/*Part D*/

%if %upcase(&stats)=YES %then %do;

   proc means data=random mean std;

      var y;

   run;

%end;

/*Part E*/

%if %upcase(&plot)=YES %then %do;

   proc sgplot data=random;

      histogram y / binwidth=1;

      density y / type=kernel;

   run;

%end;

%mend;

/*Part F*/

%mymac(param1=0.2,stats=yes)

/*Part G*/

%mymac(dist=Geometric,param1=0.2,param2=,stats=yes)

/*Part H*/

options mprint;

%mymac(dist=Normal,param1=100,param2=10,n=1000,plot=yes)

Macro program for iterative processing. Code HERE

%macro myappend(start,stop);

   %do year=&start %to &stop;

      proc import datafile="&path\sales_&year..csv" out=sp4r.sales_&year dbms=csv replace;

      run;

      proc append base=sp4r.sales_all data=sp4r.sales_&year;

      run;

      proc datasets library=sp4r noprint;

         delete sales_&year;

      quit;

   %end;

%mend;

options mprint;

%myappend(2000,2009)

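/* Why did we use a double period to specify the DATAFILE above?
   A period ends a macro variable reference and is consumed, so a second period is needed for a literal dot */
%let mypath = s:\workshop\;
%put &mypathmydata.csv;   /* WARNING: looks for a macro variable named MYPATHMYDATA */
%put &mypath.mydata.csv;  /* s:\workshop\mydata.csv */
%let mydata = sales_data;
%put &mydata.csv;         /* sales_datacsv: the period was used up as the delimiter */
%put &mydata..csv;        /* sales_data.csv */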

SAS Webinars

My notes on SAS Webinars I listened to:

Enhancing AML (Anti-Money Laundering) Efficiency and Effectiveness: AI (Artificial Intelligence) Transforms the Rules of the Game

AI = Training computers to perform tasks to mimic human reasoning.

Machine Learning = Subset of AI to automatically learn and improve from experience without being explicitly programmed.

Applications:

Graph analytics
Correlation, Regression analysis
Cluster Analysis (groups, spot outliers)
Neural Network, Predictive Analytics (complex and unknown patterns)

Robotic Process Automation = software that mimics human actions by automating simple and repetitive tasks, e.g. a chat bot.

SAS Adaptive Learning and Intelligent Agent System.

https://www.sas.com/en_us/software/anti-money-laundering.html

A Connected Future: IoT for Health Care Providers

Have Your Cake and Eat It Too - With Python, R + SAS (recording) (slides)

You can use SAS in Python and use Python in SAS.

swat library

impute to fill in missing values.

Can 'promote' dataset to colleagues.

https://github.com/sassoftware/sas-prog-for-r-users ; https://github.com/sassoftware/saspy ; https://github.com/sassoftware/saspy-examples ; ... https://github.com/sassoftware

SAS automated ML pipeline

SAS Model Studio

Data -> Imputation -> SAS logistic regression / Python model / R model (in parallel) -> Model comparison

Create a 'New Pipeline' from the OpenSourceHMEQ template. Write R and Python code (e.g. sklearn ensembles such as random forest) in SAS.

Can compare against other pipelines and see 'Gradient Boosting' is best. Register the model.

Register models -> compare models -> select champion -> validate champion -> deploy -> score new -> monitor -> retrain/new -> back to start.

SAS Model Manager

Lift? (model validate; https://en.wikipedia.org/wiki/Lift_(data_mining))

Python Flask application to make a binary decision based on the model e.g. can I get a loan.

support.sas.com/rusers

sas-viya-programming https://github.com/sassoftware/sas-viya-programming

Dirty Jobs with Dirty Data: Get Clean with Better Data Quality

Challenges: too much data; poor quality; multiple sources (inconsistent); inability to deliver data.

Best practices: Profile data; preparation; standardization; match identification (e.g. Sam vs Samuel); monitoring (e.g. anomalies); repeatable process and workflow.

SAS Data Quality

Can check for missing values, min and max, and date/time issues. Pattern frequency distribution, e.g. FL vs F.L.

Build a standardization scheme: change everything that should be FL to FL, e.g. F.L., Florida, florida.
