SGPLOT

A scatter plot displays every data point in a data set.

Here engine size is on the x axis and horsepower is on the y axis.

proc sgplot data=sashelp.cars;

 scatter x=enginesize y=horsepower;

run;

A Prediction Ellipse is a region for predicting a new observation in the population. It also approximates a region containing a specified percentage of the population. The ellipse indicates the correlation between the two standardized variables.

The following draws all the data points and an ellipse that approximates the data region.

proc sgplot data=sashelp.cars;

 scatter x=enginesize y=horsepower;

 ellipse x=enginesize y=horsepower;

run;
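
The percentage of the population covered by the ellipse can be adjusted. A minimal sketch, assuming the ALPHA= and TYPE= options of the ELLIPSE statement: ALPHA=0.1 requests a 90% region, and TYPE=PREDICTED (the default) asks for a prediction ellipse rather than a confidence ellipse for the mean.

proc sgplot data=sashelp.cars;

  scatter x=enginesize y=horsepower;

  /* alpha=0.1 draws a 90% prediction ellipse (the default alpha is 0.05) */

  ellipse x=enginesize y=horsepower / alpha=0.1 type=predicted;

run;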

Box Plot

proc sgplot data = sashelp.cars;

  vbox horsepower / category=make;

run;

The vertical box plot here describes the distribution of horsepower data by car make.

Each box summarizes the horsepower values for one make. The box spans the First Quartile to the Third Quartile, the line inside the box is the Median, and the marker inside the box is the Mean. The whiskers extend to the Min and Max values that are not flagged as outliers.

The First Quartile is the 25th percentile, which means 25% of the data values are lower than it.

The Third Quartile is the 75th percentile, which means 75% of the data values are lower than it.

The range between the First Quartile and the Third Quartile is called the interquartile range (IQR). Any data value that is more than 1.5 * IQR above the Third Quartile, or more than 1.5 * IQR below the First Quartile, is considered a suspected outlier.
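
The quartiles and outlier limits described above can be checked numerically. A rough sketch using PROC MEANS and a DATA step; the data set names work.hp_stats and work.hp_fences and the new variable names are made up for illustration, and the percentile definition used by PROC MEANS may differ slightly from the one the box plot uses.

/* First Quartile, Third Quartile and IQR of horsepower for each make */

proc means data=sashelp.cars noprint nway;

  class make;

  var horsepower;

  output out=work.hp_stats q1=q1 q3=q3 qrange=iqr;

run;

/* values outside [lower_fence, upper_fence] are suspected outliers */

data work.hp_fences;

  set work.hp_stats;

  lower_fence = q1 - 1.5*iqr;

  upper_fence = q3 + 1.5*iqr;

run;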

Series

Simply specify the x and y variables for the series.

Here the x axis (year) is set to the discrete type. Without type=discrete the axis is treated as continuous and can show fractional tick values such as 2010.5.

A where= clause can also be specified in the data= option to filter the rows.

proc sgplot data=sashelp.electric(where=(customer="Residential"));

  xaxis type=discrete;

  series x=year y=coal;

run;
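
The same filter can also be written as a WHERE statement inside the procedure instead of a data set option. A small sketch of that alternative, which should produce the same plot:

proc sgplot data=sashelp.electric;

  /* WHERE statement instead of the where= data set option */

  where customer="Residential";

  xaxis type=discrete;

  series x=year y=coal;

run;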

Histogram and Probability Density

proc sgplot data=sashelp.cars;

  histogram horsepower;

  density horsepower;

run;

By default, the density statement fits a Gaussian (Normal) distribution.

density xxx / type=yyy selects how the density function is estimated. type= accepts normal (the default) or kernel.
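
For example, a kernel density estimate can be overlaid on the same histogram next to the normal curve. A minimal sketch, assuming a kernel estimate is the alternative wanted:

proc sgplot data=sashelp.cars;

  histogram horsepower;

  /* fitted normal curve, same as the default */

  density horsepower / type=normal;

  /* nonparametric kernel density estimate */

  density horsepower / type=kernel;

run;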

Bar chart

proc sgplot data=sashelp.cars;

  vbar make / response=horsepower;

run;

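Use vbar for vertical bars or hbar for horizontal bars. The response= variable is plotted on the response axis (the y axis for vbar, the x axis for hbar). By default its values are summed within each category.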
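
As an illustration, a horizontal version that plots the average horsepower per make rather than the default sum. This is only a sketch; stat=mean is the one option added to the example above.

proc sgplot data=sashelp.cars;

  /* hbar puts the makes on the y axis and the response on the x axis */

  hbar make / response=horsepower stat=mean;

run;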

Heatmap

data work.test;

  input yy xx value;

  datalines;

 1 1 100

 1 2 50

 2 1 10

 2 2 100

 2 2 1

  ;

run;

proc sgplot data = work.test;

  heatmap x=xx y=yy / colorresponse=value colorstat=mean nxbins=2 nybins=2 showxbins showybins;

run;

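COLORRESPONSE is the variable whose values determine the color of each cell in the grid.

COLORSTAT is the statistic used to aggregate that variable within each cell. It can be FREQ, PCT, SUM or MEAN.

nxbins is the number of bins on the x axis, and nybins is the number of bins on the y axis.

showxbins aligns the x axis ticks with the bins, and showybins does the same for the y axis.

nybins can be combined with ybinstart and ybinsize to control the bins explicitly.

For example, if the y axis has a range of 0-10 and we want to show 11 ticks, then specify 11 bins starting from 0, with each bin of size 1: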

 nybins=11 ybinstart=0 ybinsize=1