temp

PROC SQL

Here all the details of PROC SQL have been covered within 5 topics.

  1. Working with Basic SQL commands – Create Table, Alter Table, Inserting data into table, Update table etc.

  2. Working with SQL Joins, Constraints, Indexes and Set Operators

  3. Working with PROC SQL The SAS Way – co-related and non co-related sub queries, case-when statement etc.

  4. Important SAS Options for PROC SQL

  5. Working with PROC SQL using external databases.

Working with Basic SQL commands – Create Table, Alter Table, Inserting data into table, Update table etc.

· Creating table with PROC SQL

A table can be created with 3 methods using PROC SQL.

Though according to syntax it is a table; however internally it is nothing but a SAS dataset itself.

Any table created using PROC SQL can be used with other data or PROC steps as a dataset outside the PROC SQL and any dataset created using data step or PROC step can be treated as table in PROC SQL.

Following is the demo of creating a table using PROC SQL;

    1. Creating table Using Create Table Statement.

SYNTAX:

PROC SQL;

CREATE TABLE tablename (columnname_1 DATATYPE <FORMAT formatname> <LABEL Labelname>, columnname _2 DATATYPE <FORMATformatname> <LABEL Labelname>, columnname_3 DATATYPE <FORMAT formatname> <LABEL Labelname>,

………………………………);

QUIT;

SAS CODE:

proc sql;

create table emp

(empID int, empname char(200), deptID int, dob num format date9.);

describe table emp;

quit;

OUTPUT:

Output of this query can be observed in the log as follows;

create table WORK.EMP( bufsize=24576 )

(

empID num,

empname char(200),

deptID num,

dob num format=DATE9.

);

There are few points to make a note

Describe command displays a table structure inside the log file.

Though SAS has only 2 data type (numeric and character) one can use SQL specific (ANSI standard) data types like (int, varchar etc.)

In Addition one can also use the formats and labels within the PROC SQL syntax which is generally not observed with SQL syntax.

    1. Creating table using Like Keyword from another table

SYNTAX:

PROC SQL;

CREATE TABLE tablename LIKE other_tablename;

QUIT;

SAS CODE:

proc sql;

insert into emp values (101, "Tommy Regan" 10, "10JUN1983"d);

insert into emp values (102, "Jin Andre" 11, "03JAN1970"d);

quit;

PROC SQL;

create table another_emp like emp;

describe table another_emp;

quit;

OUTPUT:

create table WORK.ANOTHER_EMP( bufsize=24576 )

(

empID num,

empname char(200),

deptID num,

dob num format=DATE9.

);

LIKE keyword creates a table just like another tables without the data. Means the newly created table will have only the metadata same as the original table but will not have any data records inside it though the original table might contain any number of records.

This can be confirmed with the following NOTE in the log.

NOTE: Table WORK.ANOTHER_EMP created, with 0 rows and 4 columns.

Though the original table emp has 2 records the newly created table has got only 0 records.

    1. Creating table using AS keyword from another table

SYNTAX:

PROC SQL;

CREATE TABLE tablename AS

SELECT other_tablename;

QUIT;

SAS CODE:

PROC SQL;

create table another_emp as

select * from emp;

describe table another_emp;

quit;

OUTPUT: Considering the original emp table has the same status as shown in the table; output of this query can be observed as follows;

create table WORK.ANOTHER_EMP( bufsize=24576 )

(

empID num,

empname char(200),

deptID num,

dob num format=DATE9.

);

In addition to this it’s also worth to make a note of the following line in the log.

NOTE: Table WORK.ANOTHER_EMP created, with 2 rows and 4 columns.

Unlike in the previous example we can see that the newly created table is clone on the original table with all the data records copied as it is.

· Altering table structure using ALTER TABLE command.

SAS CODE:

proc sql;

alter table emp add deptname varchar;

alter table emp add sal num;

alter table emp drop deptname;

describe table emp;

quit;

OUTPUT:

15 proc sql;

16 alter table emp add deptname varchar;

NOTE: Table WORK.EMP has been modified, with 5 columns.

17 alter table emp add sal num;

NOTE: Table WORK.EMP has been modified, with 6 columns.

18 alter table emp drop deptname;

NOTE: Table WORK.EMP has been modified, with 5 columns.

19 describe table emp;

NOTE: SQL table WORK.EMP was created like:

create table WORK.EMP( bufsize=24576 )

(

empID num,

empname char(200),

deptID num,

dob num format=DATE9.,

sal num

);

The output in the log is self explanatory.

One can note that one column has been added each time when the add command is used; while drop command drops the column from the table.

And finally the table structure with SAL column added to the table.

· Inserting records(rows) into table using INSERT query

SAS CODE:

/*

proc sql;

create table emp

(empID int, empname char(200), deptID int, dob num format date9.);

describe table emp;

insert into emp values (101, "Tommy Regan", 10, "10JUN1983"d);

insert into emp values (102, "Jin Andre", 11, "03JAN1970"d);

quit;

*/

proc sql;

insert into emp (empID, empname, deptID, sal)

values (103, "Rob Matthews",11,2000);

quit;

OUTPUT:

· There are multiple ways of using INSERT statement those are discussed below;

SAS CODE:

proc sql;

insert into emp

set empID = 104,

empname = "Jan Philip",

deptID = 10,

sal = 3000;

select * from emp;

quit;

OUTPUT:

We can observe that ‘emp’ table which was having 3 records previously has got one record added with empID = 104.

Records can also be inserted into one table directly from other table. Suppose there is another table called another_emp which looks like the following

Lets see, how one can insert these records from ‘another_emp’ into ‘emp’ .

SQL code for this is as follows;

SAS CODE:

proc sql;

insert into emp

select * from another_emp;

quit;

proc sql;

select * from emp;

quit;

OUTPUT:

One can observe the output that 2 more records with empID 105 and 106 have been added to ‘emp’ table with the same values as that of from the ‘another_emp’ table.

· Updating record values of a table using UPDATE query

Using UPDATE query one can change the existing values of the columns in the table.

Basic syntax is as follows;

SYNTAX:

UPDATE tablename SET columnname = new_value

However with this syntax new value will be assigned to each row of the table.

IF we want to change the value only for some selected columns we can use WHERE clause.

This has been shown with example below.

SAS CODE:

PROC SQL;

UPDATE emp SET sal = 3500 where sal eq .;

select * from emp;

quit;

OUTPUT:

One can observe in the output that sal value has been updated to 3500 for empID 101 and 102.

· Deleting records from a table using DELETE query

SYNTAX:

DELETE * FROM tablename;

However with this syntax all the records from the table will get deleted.

IF one wants to delete only selected records then he/she can use WHERE clause.

This has been shown with example below.

SAS CODE:

PROC SQL;

DELETE * FROM emp where empID in (105,106);

select * from emp;

quit;

OUTPUT:

One can observe that; records corresponding to empID 105 and 106 has been deleted from the table.

· Dropping the table

SYNTAX:

DROP TABLE tablename;

There is difference between deleting all the records form the table using DELETE command and dropping the table using DROP command.

In later case the table with all its records and metadata will get deleted.

As we are going to use the ‘emp’ table for further demonstration; to demonstrate this scenario we will temporarily create another table with the name ‘another_emp’ and will then drop that table.

This has been shown below.

SAS CODE:

PROC SQL;

CREATE TABLE another_emp as SELECT * FROM emp;

DROP TABLE another_emp;

quit;

OUTPUT:

The part of log has been given as output.

66

67 PROC SQL;

68 CREATE TABLE another_emp as SELECT * FROM emp;

NOTE: Table WORK.ANOTHER_EMP created, with 4 rows and 5 columns.

69 DROP TABLE another_emp;

NOTE: Table WORK.ANOTHER_EMP has been dropped.

70 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 0.15 seconds

user cpu time 0.00 seconds

system cpu time 0.00 seconds

Memory 7816k

OS Memory 15908k

Timestamp 4/30/2012 11:42:35 AM

Page Faults 0

Page Reclaims 20

Page Swaps 0

Voluntary Context Switches 20

Involuntary Context Switches 2

Block Input Operations 0

Block Output Operations 0

SQL JOINS, INTEGRITY CONSTRAINTS AND INDEXES

We will continue with the same example of ‘emp’ table from the last topic “Basic SQL Commands”. We are going to discuss following topics one by one.

1. Integrity Constraints

2. SQL Joins

3. SET Operators

4. Indexes

1. Column Integrity Constraints

Integrity constraints are broadly classified as General Integrity Constraints and Referential Integrity Constraints

General Integrity Constraints are further classified as;

a. Primary Key

b. Not NULL

c. UNIQUE

d. Check Constraint

While Referential Integrity Constraints are referred using foreign key.

We are going to start with General Integrity Constraints and then at the end we will study Referential Integrity Constraint with Foreign Key at the end of this section.

a. Primary Key Constraint - This is the value of the column from which

one can uniquely identify a record from the table.

Column on which PRIMIARY KEY constraint has been applied; exhibits two properties

i. UNIQUE – Means no duplicate values allowed.

ii. Not NULL - Null (Missing) values will not be allowed.

b. Not NULL Constraint – NULL or Missing values are not allowed.

c. UNIQUE Constraint – each value will be unique, duplicate values are not allowed.

d. Check Constraint – This is very useful constraint, which can be used to force some specific condition on the column values (like value greater than 100, date after JAN1980 etc.)

After having this familiarity with the types of constraints, we will see how to implement these constraints programmatically on the table columns.

In SAS there are basically 3 ways to implement column constraints.

1. Using PROC SQL

2. Using PROC DATASETS

3. Using DATA STEP (As dataset option)

Here we will limit our scope only to PROC SQL way.

Implementing column constraints using PROC SQL;

In the code snippet below; we will see how we can apply different types of integrity constraints on the columns of ‘emp’ table.

As ‘emp’ table already exists, one can use ‘ALTER’ command to enforce the constraints this has been demonstrated with the below example.

SAS CODE:

proc sql;

/* Adding constraints to columns by altering table */

alter table emp

add constraint

ID_PK Primary key (empID),

NOT_NULL_NM NOT NULL (empname),

CHK_SAL CHECK (sal > 0);

/* examining table constraints */

describe table constraints emp;

quit;

OUTPUT:

One can observe from the output that 3 constraints with the respective names have been applied on the corresponding columns of the ‘emp’ table. The same names of the constraints can referred for dropping or modifying the constraints.

Following SYNTAX can be used for dropping the constraints

SYNTAX:

ALTER TABLE tablename

DROP CONSTRAINT constraintname;

Let’s see what happens when we try to insert data when the columns have already got constraints

SAS CODE:

proc sql;

insert into emp

/* inserting all valid values according to constraint */

set empID = 105,

empname = "Harvy Smith",

deptID = 11,

sal = 3000;

quit;

/* inserting invalid values */

proc sql;

insert into emp

set empID = 105, /*duplicate Primary Key value */

empname = "XYZ XYZ",

deptID = 11,

sal = 3000;

quit;

PROC SQL;

insert into emp

set empID = 106,

empname = '', /* missing NOT NULL column */

deptID = 11,

sal = 3000;

quit;

PROC SQL;

insert into emp

set empID = 106,

empname = 'XYZ XYZ',

deptID = 11,

sal = 0; /* invalid value according CHECK constraint */

quit;

proc sql;

select * from emp;

quit;

OUTPUT:

We can observe from the output that; only one record with all valid values according to constraints has been inserted. All other values got rejected with the following reasons given in the lob.

Part of LOG:

24 /* inserting invalid values */

25

26 proc sql;

27 insert into emp

28 set empID = 105, /*duplicate Primary Key value */

29 empname = "XYZ XYZ",

30 deptID = 11,

31 sal = 3000;

ERROR: Add/Update failed for data set WORK.EMP because data value(s) do not comply with integrity constraint ID_PK.

NOTE: Deleting the successful inserts before error noted above to restore table to a consistent state.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

32 quit;

34 PROC SQL;

35 insert into emp

36 set empID = 106,

37 empname = '', /* missing NOT NULL column */

38 deptID = 11,

39 sal = 3000;

ERROR: Add/Update failed for data set WORK.EMP because data value(s) do not comply with integrity constraint NOT_NULL_NM.

NOTE: Deleting the successful inserts before error noted above to restore table to a consistent state.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

40 quit;

42 PROC SQL;

43 insert into emp

44 set empID = 106,

45 empname = 'XYZ XYZ',

46 deptID = 11,

47 sal = 0;

ERROR: Add/Update failed for data set WORK.EMP because data value(s) do not comply with integrity constraint CHK_SAL.

NOTE: Deleting the successful inserts before error noted above to restore table to a consistent state.

NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.

47 ! /* invalid value according CHECK constraint */

48 quit;

For demonstrating Referential Integrity Constraint we need to two tables; we have 'emp' table ready with us now lets create another table called 'dept' as follows;

SQL JOINS:

For demonstrating SQL joins in PROC SQL we need to at least two tables; we have 'emp' table ready with us now lets create another table called 'dept' as follows;

proc sql;

create table dept

(deptID int, deptname char(200));

describe table emp;

quit;