SAS for R users

September 2018

My notes on learning SAS

Why should I learn SAS?

SAS has been established software in many industries for 40+ years. Its point-and-click interface makes it fairly easy for newcomers to pick up, and its 'business analytics' side makes it attractive to industry. It also integrates fairly well with databases.

Can I learn SAS?

SAS is proprietary, but you can get a university version, which I've used for these notes. However, it's not really built to work on a Mac.

SAS

I will use some of my LinkedIn Premium features and take a SAS course on there https://www.linkedin.com/learning/sas-programming-for-r-users-part-1/introduction-to-sas-and-sas-studio. I will skip over the R material and take notes on the SAS material, as it is not easy to call R from SAS on my Mac (http://support.sas.com/documentation/cdl/en/imlug/64248/HTML/default/viewer.htm#imlug_r_sect003.htm)

Base SAS - Built in functions

SAS/ACCESS - Reading in data

SAS/STAT - Analytic models

SAS/IML - Interactive matrix language

SAS/ETS - Time series, forecasting


SAS Studio has different windows:

  • code editor/work area
  • navigation pane

Store data in Libraries permanently. WORK library is temporary.

Introduction and Working in SAS

Working in SAS Studio

Click on Libraries -> SASHELP -> CARS (double click) to open the table in a new tab. The blue arrow goes to the next page.

Change columns shown by using the tick boxes.

Right click a column header to sort the data by that column (e.g. ascending) or to filter it (e.g. Invoice >= 30000). To remove the filter, click on the 'x' in the View tab. To view the code for this action, click on 'Display the query that creates the current table'. There are three procedures (an SQL procedure, a DATASETS procedure and a PRINT procedure):

PROC SQL;
CREATE TABLE WORK.query AS
SELECT Make , DriveTrain , Invoice , Cylinders FROM SASHELP.CARS WHERE Invoice>=30000;
RUN;
QUIT;

PROC DATASETS NOLIST NODETAILS;
CONTENTS DATA=WORK.query OUT=WORK.details;
RUN;

PROC PRINT DATA=WORK.details;
RUN;

Writing a program

Click on 'New Options' (seven dots button in top right). You apply a procedure to a data table. Type PROC PRINT and a help text box pops up. If you don't want this, remove it by clicking 'More application options' (three horizontal lines button in top right) -> Preferences -> Editor -> untick 'Enable autocomplete' -> Save. You can also right click a keyword, e.g. PRINT, and click 'Syntax Help'. Click the running man button to run. Check the LOG tab for errors. Click on a Note and it will take you to that line. You can click on 'open in a new browser tab' to see the table more clearly.

Every statement must end with a ;

* Print the CARS data table;
*PROC PRINT DATA=SASHELP.CARS;
*RUN;
* Prints the entire data table;

/*
 * Print certain columns in the CARS data table.
 */
PROC PRINT DATA=SASHELP.CARS;
   * VAR is additional arguments to the PROC;
   * You can choose the columns by clicking on libraries -> CARS;
   * Hold control on the keyboard and click the columns of interest;
   * Drag and drop to after VAR;
   * Remove the 'SASHELP.CARS';
   VAR Make Model MPG_City;
RUN;

Using Tasks and Snippets in SAS Studio

Tasks = Point and click features which generate code behind the scenes

Click 'Tasks and Utilities' -> 'Tasks' -> 'Statistics' -> 'Summary Statistics' -> 'Select a table' -> add 'Weight' to Analysis variables and this will generate some code. -> 'options' -> 'PLOTS' -> 'Histogram' and 'Add normal density curve' -> 'Run'.

ods noproctitle;
ods graphics / imagemap=on;

proc means data=SASHELP.CARS chartype mean std min max n vardef=df;
  var Weight;
run;

proc univariate data=SASHELP.CARS vardef=df noprint;
 var Weight;
 histogram Weight / normal(noprint);
run;

Click on 'Snippets' = starter codes -> 'Snippets' -> 'Graph' -> 'Scatter Plot Matrix' -> 'Run'

Click on 'New Snippet' and copy in the 'proc print' code from above -> 'Save' as 'Print Variables'. It is saved under 'My Snippets' and can then be dragged and dropped into code.

Bayesian Logistic Regression

Logistic Regression = Regression on data when the dependent variable is binary (e.g. True, False)

Bayesian = Update statistical inference (e.g. what kind of distribution) as more data becomes available. Here is a nice plot showing the distribution being updated.

MCMC = Markov Chain Monte Carlo. A Markov chain is a stochastic model in which the probability of each event depends only on the state attained in the previous event (here is an example). Markov Chain Monte Carlo draws samples from a probability distribution by building a Markov chain that converges to that distribution as the number of samples grows. You can read more about it in the SAS documentation.

The code for this is HERE

Data step is for reading in data, altering data, subsetting data.

You can highlight code just to run that part.

Poker Simulation

IML = Interactive Matrix Language

Texas Hold 'em:

  • 9 players at a table
  • Each player: 2 cards face-down
  • Dealer: 5 cards face-up

The code for this is HERE

Multiple Linear Regression Power Analysis

The code for this is HERE

Calling R from SAS

The code for this is HERE

SAS libraries

The Work library is temporary. Sashelp has sample datasets. Sasuser saves datasets you work with often.

Define your own library as:

libname SP4R "s:\workshop";

Add data sets to your library and refer to them with a two-level name, e.g. sp4r.frog.
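
As a minimal sketch of the difference between a temporary and a permanent data set, the example below writes the same rows to WORK and to the sp4r library. The folder "s:\workshop" is assumed to exist and be writable, and the data set name frog and its values are hypothetical.

```sas
/* Assumption: "s:\workshop" exists; frog and its values are made up for illustration */
libname sp4r "s:\workshop";

/* One-level name: stored in WORK and deleted when the session ends */
data frog;
   input species $ weight;
   datalines;
bullfrog 450
treefrog 12
;
run;

/* Two-level name: stored permanently in the sp4r library */
data sp4r.frog;
   set frog;
run;

proc print data=sp4r.frog;
run;
```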

Procedure syntax

In PROC Step: STATEMENT ... <option>;

Can save data using outpost= in PROC mcmc for example.

Click on the product e.g. SAS Analytical Products 14.1 -> What's new in SAS/STAT -> Contents -> Procedures -> The MCMC or Topics -> Bayesian analysis -> MCMC and you can see the statements. Click on 'PROPDIST' and click on Metropolis and Metropolis-Hastings Algorithms to find more about the method. Click the examples tab and click on Logistic Regression Model with a Diffuse Prior then copy and paste the example.

Free tutorials

Importing and Reporting Data

Creating datasets

The code for this is HERE

* Save as example_data in sp4r library;
* Specify a length of 25 for the character variables;
data sp4r.example_data;
   length First_Name $ 25 Last_Name $ 25;
   input First_Name $ Last_Name $ age height;
   datalines;
   Jordan Bakerman 27 68
   Bruce Wayne 35 70
   Walter White 51 70
   Henry Hill 65 66
   JeanClaude VanDamme 55 69
;
run;

* @@ (double trailing at sign) holds the line so several observations can be read from one line;
data sp4r.example_data2;
length First_Name $ 25 Last_Name $ 25;
input First_Name $ Last_Name $ age height @@;
datalines;
Jordan Bakerman 27 68 Bruce Wayne 35 70 Walter White 51 70
Henry Hill 65 66 JeanClaude VanDamme 55 69
;
run;

Importing raw data files

Can use DATA step as above or use PROC IMPORT (.csv, .xlsx)

The code for this is HERE and data is HERE

* Import data using DATA step;
data sp4r.all_names;
   length First_Name $ 25 Last_Name $ 25;
   infile "&path\allnames.csv" dlm=',';
   input First_Name $ Last_Name $ age height;
run;

* Import data using PROC import;
* REPLACE will overwrite sp4r.baseball;
* getnames=yes will use header;
* data starts on second row;
proc import out=sp4r.baseball 
   datafile= "&path\baseball.csv" DBMS=CSV REPLACE;
   getnames=yes;
   datarow=2; 
run;
* Rename the variables;
data sp4r.baseball;
   set sp4r.baseball;
   rename nAtBat = At_Bats
      nHits = Hits
      nHome = Home_Runs
      nRuns = Runs
      nRBI = RBIs
      nError = Errors;
run;

Reporting data

See data type and length:

proc contents data=sp4r.cars varnum;
run;

See the data head(1:6):

proc print data=sp4r.cars (firstobs=1 obs=6);
run;

Print unique level

proc sql;
   SELECT UNIQUE origin FROM sp4r.cars;
quit;

Use upcase to print conditionally if you don't know the case of the values in a variable.

proc print data=sp4r.cars;
   var gender;
   where upcase(gender)='MALE'; * The stored value is actually 'Male';
run;
* ^= is not equal. = is equal to. Can also use NE, GE, etc.;
* IN tests for membership in a list, e.g. where country in ('US','CA') keeps rows where country is USA or Canada;
* & and | are AND and OR. ^ is NOT, e.g. where country not in ('US','CA');
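
The comparison operators noted above can be combined in a single WHERE statement. This is a small sketch against SASHELP.CARS rather than the course data; the particular thresholds are arbitrary.

```sas
/* Asian or European cars that are not SUVs and get at least 25 mpg in the city */
proc print data=sashelp.cars (obs=10);
   var make model origin type mpg_city;
   where origin in ('Asia','Europe')
         and type ne 'SUV'
         and mpg_city ge 25;
run;
```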

Change column labels and data format. The code for this is HERE

* Change FN column to first name;
proc print data=sp4r.business label;
   label FN='First Name';
run;

* Change data format;
* $ = character;
* DOLLAR12.2 = dollar format, 12 characters wide with 2 decimal places;
* MMDDYY10. = display a SAS numeric date as mm/dd/yyyy;
proc print data=sp4r.business;
   format salary dollar12.2 hire_date mmddyy10.;
run;

* You can create your own format or look up built-in formats in the online docs;
proc format;
   value $jobformat 'SR'='Sales Rep'
                    'SM'='Sales Manager';
   value bonusformat 0='No' 1='Yes';
run;
proc print data=sp4r.business;
   format job $jobformat. bonus bonusformat.;
run;

data employees;
   input name $ bday :mmddyy8. @@;
   datalines;
   Jill 01011960 Jack 05111988 Joe 08221975
   ;
run;

proc print data=employees;
run;

* Now use label and format;
data employees;
   input name $ bday :mmddyy8. @@;
   format bday mmddyy10.;
   label name="First Name" bday="Birthday";
   datalines;
   Jill 01011960 Jack 05111988 Joe 08221975
   ;
run;

proc print data=employees label;
run;

Create variables and new data

* Add two columns;
data sp4r.cars;
   set sp4r.cars;
   wheelbase_plus_length = wheelbase+length;
run;

* Change values conditionally;
data sp4r.cars;
   set sp4r.cars;
   if mpg_highway<20 then bonus=0;
   else if mpg_highway<30 then bonus=1000;
   else bonus=2000;
run;

* Create new character variable;
data sp4r.cars;
   set sp4r.cars;
   length type2 $ 25;
   if type in ('Hybrid','SUV')
      then type2='Family Vehicle';
   else type2='Truck or Sports Vehicle';
run;

* Use a DO group if you need to create more than one variable per condition;
data sp4r.cars;
   set sp4r.cars;
   length frequency $ 12;
   if mpg_highway<20 then do;
      bonus=0;
      frequency='No Payment';
   end;
   else if mpg_highway<30 then do;
      bonus=1000;
      frequency='One Payment';
   end;
   else do;
      bonus=2000;
      frequency='Two Payments';
   end;
run;

Create and use functions

data sp4r.cars;
   set sp4r.cars;
   log_price = log(msrp);
run;

* Mean across columns within each row;
data sp4r.cars;
   set sp4r.cars;
   mean_mpg = mean(mpg_highway,mpg_city);
run;

* _NULL_ = run the DATA step without creating a data set;
data _NULL_;
   a=mean(1,2,3,4,5);
   b=exp(3);
   c=var(10,20,30);
   d=poisson(1,2);
   put a b c d; * to Log;
run;

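* String functions e.g. SUBSTR, SCAN (use inside a DATA step);
newstr = substr(str,length(str),1);        * last character of str;
newstr = scan(str,2,',');                  * second word, using a comma as the delimiter;
concatstr = catx(' ',str1,str2);           * join with a space separator;
newvar = tranwrd(var,'str ','newstr ');    * replace str with newstr in the column;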

Functions can be found at SAS 9.4 -> documentation by title -> functions and CALL routines -> dictionary of functions and CALL routines. e.g. look at example in FIND

Create your own functions with the function compiler procedure, PROC FCMP (a function returns a single value)

* Switch the order of the words in a string;
proc fcmp outlib=sp4r.functions.newfuncs;
   function ReverseName(name $) $;
   length newname $ 40;
   newname=catx(' ',scan(name,2,','),scan(name,1,','));
   return(newname);
   endsub;
quit;

options cmplib=sp4r.functions;
data sp4r.school;
   set sp4r.school;
   FLName=ReverseName(name);
run;

Subset data

* Keep some variables (columns). Can also use the DROP= option;
data sp4r.cars2 (keep=make msrp invoice);
   set sp4r.cars;
run;

* Subset by row [25:50];
data sp4r.cars2;
   set sp4r.cars (firstobs=25 obs=50);
run;

* Subset conditionally;
data sp4r.cars2;
   set sp4r.cars;
   where mpg_city > 35;
run;

* Create table;
proc sql;
   create table sp4r.origin as
   SELECT UNIQUE origin FROM sp4r.cars;
quit;

Concat data

* Combine rows using SET (rbind);
data a_all;
   SET a1 a2;
run;

* Combine columns using two SET statements;
data b_all;
   SET b_col1;
   SET b_col2;
run;

* Use MERGE to combine data tables with different numbers of rows (cbind);
data c_all;
   merge c_small_col c_long_col;
run;

* Can also merge on a common variable with a BY statement (similar to a SQL join);
* Sort both data sets by that variable first using PROC SORT;
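
A minimal sketch of the sort-then-merge pattern mentioned above. The data sets staff and pay, the key id and all values are hypothetical; the IN= flags are one common way to get left-join behaviour.

```sas
/* Hypothetical data: staff and pay share the key variable id */
data staff;
   input id name $;
   datalines;
2 Jack
1 Jill
3 Joe
;
run;

data pay;
   input id salary;
   datalines;
3 52000
1 61000
;
run;

/* MERGE with BY requires both inputs to be sorted by the key */
proc sort data=staff; by id; run;
proc sort data=pay;   by id; run;

data staff_pay;
   merge staff (in=in_staff) pay (in=in_pay);
   by id;
   if in_staff;   /* keep every row of staff, like a left join */
run;

proc print data=staff_pay;
run;
```

Rows of staff with no match in pay keep a missing salary. Replacing the subsetting IF with `if in_staff and in_pay;` would give inner-join behaviour instead.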

https://www.linkedin.com/learning/sas-programming-for-r-users-part-2/introduction-to-sas-and-sas-studio

DO loop

do i=2 to 10 by 2;
*do i=10 to 2 by -2;
end;

data loop;
*data loop (keep=x rep);
*data loop (drop=i);
   do i=2 to 10 by 2;
      x = i+1;
      rep = 1;
      output; * write the current row to the data set;
   end;
run;

* Loop over each row of an existing data set to add a column (one output row per value of j);
data doloop;
   do i=1 to 2;
      output;
   end;
run;
data doloop;
   set doloop;
   do j=1 to 2;
      output;
   end;
run;

Generate random numbers

RAND('Normal',mean,std)

Do loop and random number generator. Code HERE

/*Part A*/
data sp4r.random (drop=i);
   call streaminit(123);
   do i=1 to 10;
      rnorm = rand('Normal',20,5);
      rbinom = rand('Binomial',.25,1);
      runif = rand('Uniform')*10;
      rexp = rand('Exponential')*5;
      output;
   end;
run;
proc print data=sp4r.random;
run;

/*Part B*/
data sp4r.random;
   call streaminit(123);
   set sp4r.random;
   rgeom = rand('Geometric',.1);
run;
proc print data=sp4r.random;
run;

/*Part C*/
data sp4r.doloop (drop=j);
   call streaminit(123);
   do group=1 to 5;
      do j=1 to 3;
         rpois = rand('Poisson',25);
         rbeta = rand('Beta',.5,.5);
         seq+1;
         output;
      end;
   end;
run;
proc print data=sp4r.doloop;
run;

/*Part D*/
data sp4r.quants;
do q=-3 to 3 by .5;
   pdf = pdf('Normal',q,0,1);
   cdf = cdf('Normal',q,0,1);
   quantile = quantile('Normal',cdf,0,1);
   output;
end;
run;

proc print data=sp4r.quants;
run;

R-like plots using PROC SGPLOT (statistical graphics plot). Code HERE

proc sgplot data=sales;
   scatter x=month y=revenue;
   scatter x=month y=revenue_2;
   *series x=month y=revenue;
   *series x=month y=revenue_2;
run;

proc sgplot data=sales;
   scatter x=month y=revenue / group=company;
run;
proc sgplot data=sales;
   scatter x=month y=revenue;
   by company;
run;

/*Part A*/
data sp4r.hist_data;
   call streaminit(123);
   do i=1 to 1000;
      x = rand('exponential')*10;
      output;
   end;
run;
proc sgplot data=sp4r.hist_data;
   histogram x;
run;

proc sgplot data=sp4r.hist_data;
   histogram x / binwidth=1;
   density x / type=normal;
   density x / type=kernel;
run;

/*Part B*/
data sp4r.boxplot_data (drop=rep);
   call streaminit(123);
   do group=1 to 3;
      do rep=1 to 100;
         response = rand('exponential')*10;
         output;
      end;
   end;
run;
proc sgplot data=sp4r.boxplot_data;
    hbox response;
run;
proc sgplot data=sp4r.boxplot_data;
    hbox response / category=group;
run;

/*Part C*/
data sp4r.sales;
   call streaminit(123);
   do month=1 to 12;
      revenue = rand('Normal',10000,5000);
      output;
   end;
run;
proc sgplot data=sp4r.sales;
   vbar month / response=revenue;
run;

/*Part D*/
data sp4r.series_data (keep=x y1 y2);
   call streaminit(123);
   do x=1 to 30;
      beta01 = 10;
      beta11 = 1;
      y1 = beta01 + beta11*x + rand('Normal',0,5);
      beta02 = 35;
      beta12 = .5;
      y2 = beta02 + beta12*x + rand('Normal',0,5);
      output;
   end;
run;
proc sgplot data=sp4r.series_data;
   scatter x=x y=y1;
   scatter x=x y=y2;
run;
proc sgplot data=sp4r.series_data;
   series x=x y=y1;
   series x=x y=y2;
run;
proc sgplot data=sp4r.series_data;
   series x=x y=y1;
   scatter x=x y=y1;
   series x=x y=y2;
   scatter x=x y=y2;
run;

/*Part E*/
* Regression fit with confidence limits (CLM) and prediction limits (CLI);
proc sgplot data=sp4r.series_data;
   reg x=x y=y1 / clm cli;
   reg x=x y=y2 / clm cli;
run;

Enhancing the plot. Can save as a pdf. Code HERE

/*Part A*/
data sp4r.sales;
   call streaminit(123);
   do month=1 to 12;
      revenue = rand('Normal',10000,1000);
      revenue_2 = rand('Normal',13000,500);
      output;
   end;
run;

/*Part B*/
proc sgplot data=sp4r.sales;
   series x=month y=revenue / legendlabel='Company A'
      lineattrs=(color=blue pattern=dash);
   series x=month y=revenue_2 / legendlabel='Company B'
      lineattrs=(color=red pattern=dash);

   title 'Monthly Sales of Company A and B for 2015';
   xaxis label="Month" values=(1 to 12 by 1);
   yaxis label="Revenue for 2015";
   inset "Jordan Bakerman" / position=bottomright;
   refline 6.5 / transparency= 0.5 axis=x;
   refline 11000 / transparency= 0.5;
run;
title;

/*Part C*/
proc sgplot data=sp4r.sales;
   series x=month y=revenue / legendlabel='Company A' name='Company A'
      lineattrs=(color=blue pattern=dash);
   scatter x=month y=revenue / markerattrs=(color=blue
      symbol=circlefilled);
   series x=month y=revenue_2 / legendlabel='Company B' 
      name='Company B' lineattrs=(color=red pattern=dash);
   scatter x=month y=revenue_2 / markerattrs=(color=red 
      symbol=circlefilled);

   title 'Monthly Sales of Company A and B for 2015';
   xaxis label="Month" values=(1 to 12 by 1);
   yaxis label="Revenue for 2015" min=8000 max=14000;
   inset "Jordan Bakerman" / position=bottomright;
   refline 11000 / transparency= 0.5;
   refline 6.5 / transparency= 0.5 axis=x;
   keylegend 'Company A' 'Company B';
run;
title;

Create faceted plots with PROC SGSCATTER (matrix, plot, compare). Code HERE

proc sgscatter data=sp4r.cars;
   plot mpg_city*weight mpg_city*length
      weight*length / columns=3;
run;

* Multi-cell plot;
ods layout start rows=1 columns=3;
ods region row=1 column=3;
proc sgplot data=sp4r.cars;
   hbox mpg_city;
run;
ods layout end;

proc sgpanel data=sp4r.cars;
   panelby origin / columns=3;
   histogram mpg_city;
run;
proc sgpanel data=sp4r.lesscars;
   panelby origin type / rows=1 columns=3;
   reg x=weight y=mpg_city;
run;

/*Part A*/
data sp4r.multi;
   call streaminit(123);
   do Sex='F', 'M';
      do j=1 to 1000;
         if sex='F' then height = rand('Normal',66,2);
         else height = rand('Normal',72,2);
         output;
      end;
   end;
run;

/*Part B*/
proc sgpanel data=sp4r.multi;
   panelby sex;
   histogram height;
   density height / type=normal;
   title 'Heights of Males and Females';
   colaxis label='Height';
run;
title;

/*Part C*/
ods layout Start rows=1 columns=3 row_height=(1in) column_gutter=0;

ods region row=1 column=1;
proc sgplot data=sp4r.multi (where= (sex='F'));
   histogram height / binwidth=.5;
   title 'Histogram of Female Heights';
run;
title;

ods region row=1 column=2;
proc sgplot data=sp4r.multi (where= (sex='F'));
   density height / type=kernel;
   title 'Density Estimate of Female Heights';
run;
title;

ods region row=1 column=3;
proc sgplot data=sp4r.multi (where= (sex='F'));
   hbox height;
   title 'Boxplot of Female Heights';
run;
title;

ods layout end;

Descriptive Procedures, Output Delivery System, and Macros

CORR, FREQ, MEANS and UNIVARIATE procedures on a variable (column)

* Correlation matrix and covariance matrix;
proc corr data=sp4r.cars cov;
   var horsepower weight length;
run;

* Categorical data;
proc freq data=sp4r.cars;
   tables origin type;
run;
proc freq data=sp4r.cars;
   tables origin*type; * cross table;
   *tables origin*type / norow nocol nopercent;
run;
proc freq data=sp4r.cars nlevels;
   tables origin*type; * cross table;
   *tables origin*type / noprint;
run;

MEANS and UNIVARIATE give summary statistics, e.g. SKEWNESS, P10. Code HERE

proc means data=sp4r.cars maxdec=2 mean median var;
   var mpg_city mpg_highway;
run;

HISTOGRAM
QQPLOT

/*Part A*/
proc contents data=sp4r.ameshousing varnum;
run;

/*Part B*/
proc univariate data=sp4r.ameshousing;
   var saleprice;
   histogram saleprice / normal kernel;
   inset n mean std / position=ne;
   qqplot saleprice / normal(mu=est sigma=est);
run;

Output Delivery System (ODS) (makes the tables). Code HERE

ods trace on;
proc univariate data=sp4r.ameshousing;
   var saleprice;
   qqplot saleprice / normal(mu=est sigma=est);
run;
ods trace off;

ods select basicmeasures qqplot;
proc univariate data=sp4r.ameshousing;
   var saleprice;
   qqplot saleprice / normal(mu=est sigma=est);
run;

Save as a new SAS data set in PROC Step. code HERE

ods output basicmeasures = SP_BasicMeasures; * object = data-set-name;
proc univariate data=sp4r.ameshousing;
   var saleprice;
run;

* To save a value;
proc univariate data=sp4r.ameshousing;
   var saleprice;
   output out=stats mean=sp_mean;
run;

/*Part A*/
ods select basicmeasures;
ods output basicmeasures = sp4r.SalePrice_BasicMeasures;
proc univariate data=sp4r.ameshousing;
   var saleprice;
run;
proc print data=sp4r.saleprice_basicmeasures;
run;

/*Part B*/
proc univariate data=sp4r.ameshousing;
   var saleprice;
   * Choose percentile points;
   output out=sp4r.stats mean=saleprice_mean pctlpts= 40, 45, 50, 55, 60 
      pctlpre=saleprice_;
run;
proc print data=sp4r.stats;
run;

/*Part C*/
proc means data=sp4r.ameshousing;
   var saleprice garage_area;
   output out=sp4r.stats mean(saleprice)=sp_mean median(garage_area)=ga_med;
run;

proc print data=sp4r.stats;
run;

/*Part D*/
proc means data=sp4r.ameshousing;
   var saleprice garage_area;
   output out=sp4r.stats mean= std= / autoname;
run;
proc print data=sp4r.stats;
run;

Global macro variables (use variables in other datasets)

%let height = 67;
%let name = Ray Bell;
/* Reference the value with &height */

* Numeric value;
%let year = 2010;
proc print data=sp4r.ameshousing;
   where yr_sold = &year;
   var yr_sold saleprice;
   title "Price of Homes Sold in &year";
run;

* Character string (use double quotes so the macro variable resolves);
%let gtype = Attached;
proc print data=sp4r.ameshousing;
   where g = "&gtype";
   var yr_sold saleprice;
   title "a &gtype";
run;

Automating creating global macro variables in PROC SQL. Code HERE

proc means data=sp4r.ameshousing;
   var saleprice;
   output out=stats mean=mean std=sd; * output the mean and sd;
run;
proc sql;
   select mean into :sp_mean from stats;
   select sd into :sp_sd from stats;
quit;
%put The mean and sd are &sp_mean ...; /* writes to the log */

%put _USER_; /* lists all user-defined macro variables in the log */

/*Part A*/
proc means data=sp4r.ameshousing;
   var saleprice;
   output out=sp4r.stats mean=sp_mean std=sp_sd;
run;
proc sql;
   select sp_mean into :sp_mean from sp4r.stats;
   select sp_sd into :sp_sd from sp4r.stats;
quit;

/*Part B*/
data sp4r.ameshousing;
   set sp4r.ameshousing;
   sp_stan = (saleprice - &sp_mean) / &sp_sd;
run;
proc print data=sp4r.ameshousing (obs=6);
   var saleprice sp_stan;
run;
proc means data=sp4r.ameshousing mean std;
   var saleprice sp_stan;
run;

/*Part C*/
proc contents data=sp4r.cars varnum out=carscontents;
run;
proc print data=carscontents;
   var name type;
run;

/*Part D*/
proc sql;
   select distinct name into: vars_cont separated by ' ' from carscontents where type=1;
   select distinct name into: vars_cat separated by ' ' from carscontents where type=2;
quit;
%put The continuous variables are &vars_cont and the categorical variables are &vars_cat;

Macro programs = R Function. code HERE?

%macro today;
   %put Today is &sysday &sysdate9;
%mend;
%today

%macro calc(dsn,vars);
   proc means data=&dsn;
      var &vars;
   run;
%mend calc;
%calc(business,yield)

Keyword parameters (like keyword arguments in Python), e.g.
start=01jan08

Business example
Daily report
Weekly report every Friday
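
As a rough illustration of keyword parameters with defaults, here is a small macro sketch. The macro name report and its parameters dsn and nobs are made up for this example and use SASHELP data rather than the course's business report.

```sas
/* Hypothetical macro: both parameters are keyword parameters with defaults */
%macro report(dsn=sashelp.cars, nobs=5);
   proc print data=&dsn (obs=&nobs);
   run;
%mend report;

%report()                            /* uses both defaults */
%report(nobs=3)                      /* overrides one default */
%report(nobs=2, dsn=sashelp.class)   /* keywords can be given in any order */
```

Because every parameter has a default, a scheduled job could call the same macro with different values, such as a different start date for a daily versus a weekly report.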

/*Part A*/
%macro mymac(dist=,param1=,param2=,n=100,stats=no,plot=no);

/*Part B*/
%if &dist= %then %do;
   %put Dist is a required argument;
   %return;
%end;

%if &param1= %then %do;
   %put Param1 is a required argument;
   %return;
%end;

/*Part C*/
%if &param2= %then %do;
   data random (drop=i);
      do i=1 to &n;
         y=rand("&dist",&param1);
         x+1;
         output;
      end;
   run;
%end;
%else %do;
   data random (drop=i);
      do i=1 to &n;
         y=rand("&dist",&param1,&param2);
         x+1;
         output;
      end;
   run;
%end;

/*Part D*/
%if %upcase(&stats)=YES %then %do;
   proc means data=random mean std;
      var y;
   run;
%end;

/*Part E*/
%if %upcase(&plot)=YES %then %do;
   proc sgplot data=random;
      histogram y / binwidth=1;
      density y / type=kernel;
   run;
%end;
%mend;

/*Part F*/
%mymac(param1=0.2,stats=yes)

/*Part G*/
%mymac(dist=Geometric,param1=0.2,param2=,stats=yes)

/*Part H*/
options mprint;
%mymac(dist=Normal,param1=100,param2=10,n=1000,plot=yes)

Macro program for iterative processing. Code HERE

%macro myappend(start,stop);
   %do year=&start %to &stop;
      proc import datafile="&path\sales_&year..csv" out=sp4r.sales_&year dbms=csv replace;
      run;

      proc append base=sp4r.sales_all data=sp4r.sales_&year;
      run;

      proc datasets library=sp4r noprint;
         delete sales_&year;
      quit;
   %end;
%mend;
options mprint;
%myappend(2000,2009)

/*Why did we use a double period to specify the DATAFILE above?*/
%let mypath = s:\workshop\;
%put &mypathmydata.csv;
%put &mypath.mydata.csv;

%let mydata = sales_data;
%put &mydata.csv;
%put &mydata..csv;
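
The answer comes down to how the macro processor treats a period after a macro variable name: it marks the end of the name and is then dropped. A short sketch with hypothetical values for year and folder:

```sas
%let year = 2005;             /* hypothetical values for illustration */
%let folder = s:\workshop\;

%put sales_&year.csv;     /* the period ends the name and is consumed: sales_2005csv */
%put sales_&year..csv;    /* first period ends the name, second is literal: sales_2005.csv */
%put &folder.sales.csv;   /* without the delimiter SAS would look for &foldersales */
```

So in the DATAFILE= path, `&year..csv` is needed to keep a real period before the file extension.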


SAS Webinars

My notes on SAS Webinars I listened to:

AI = Training computers to perform tasks to mimic human reasoning.

Machine Learning = Subset of AI to automatically learn and improve from experience without being explicitly programmed.

Applications:

  • Graph analytics
  • Correlation, Regression analysis
  • Cluster Analysis (groups, spot outliers)
  • Neural Network, Predictive Analytics (complex and unknown patterns)

Robotic Process Automation = Software that mimics human actions by automating simple and repetitive tasks, e.g. a chat bot.

SAS Adaptive Learning and Intelligent Agent System.

https://www.sas.com/en_us/software/anti-money-laundering.html



You can use SAS in Python and use Python in SAS.

swat library

impute to fill in missing values.

Can 'promote' dataset to colleagues.

https://github.com/sassoftware/sas-prog-for-r-users ; https://github.com/sassoftware/saspy ; https://github.com/sassoftware/saspy-examples ; ... https://github.com/sassoftware

SAS automated ML pipeline

SAS Model Studio

Data -> Imputation -> SAS logistic regression

-> Python model

-> R model -> model comparison

Create 'New Pipeline' with the OpenSourceHMEQ template. Write R and Python code (sklearn ensembles, random forest) in SAS.

Can compare against other pipelines and see 'Gradient Boosting' is best. Register the model.

Register models -> compare models -> select champion -> validate champion -> deploy -> score new -> monitor -> retrain/new -> back to start.

SAS Model Manager

Lift? (model validate; https://en.wikipedia.org/wiki/Lift_(data_mining))

Python Flask application to make a binary decision based on the model e.g. can I get a loan.

support.sas.com/rusers

sas-viya-programming https://github.com/sassoftware/sas-viya-programming


Challenges: too much data; poor quality; multiple sources (inconsistent); inability to deliver data.

Best practices: Profile data; preparation; standardization; match identification (e.g. Sam vs Samuel); monitoring (e.g. anomalies); repeatable process and workflow.

SAS Data Quality

Can check for missing values; min and max; date/time issues. Pattern frequency distribution e.g. FL, F.L.

Build a scheme: map every variant that should be FL to FL, e.g. F.L., Florida, florida.
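
The same idea can be sketched in a plain DATA step. This is not how the SAS Data Quality product builds schemes; it is only an illustration of the mapping, and the data set and variable names are hypothetical.

```sas
/* Hypothetical raw values that should all become FL */
data states;
   length raw $ 12;
   input raw $;
   datalines;
FL
F.L.
Florida
florida
;
run;

data states_clean;
   set states;
   length state $ 2;
   /* Strip periods and upper-case, then map known variants to one code */
   key = upcase(compress(raw,'.'));
   if key in ('FL','FLORIDA') then state='FL';
run;

proc freq data=states_clean;
   tables state;
run;
```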