Data Step

LIST INPUT

//manually type in data rows

//The INPUT statement specify the column headings. If the column contains characters, there should be a $ sign after the column name. Otherwise it assumes the column is numeric.

//The DATALINES statement starts the rows of data.

//There must be a simicolon at the end of each statement. The line of data row is not a statement.

//A missing value (null) is represented by a period charactor (.)

DATA TestDataTable;

INPUT Column1 Column2 $ Column3;

DATALINES;

1 Tom 1.3

2 Jan 1.5

3 Mar 1.6

4 . 1.7

;

RUN;

//read data from a file

//INFILE specifies the location of the file

DATA TestDataTable;

INFILE '/folders/myfolders/test.txt'

INPUT Column1 Column2 $ Column3;

RUN;

COLUMN INPUT

//Column input reads in data lines and columns by the given positions

//The number 1-2 means the Column1 data is from position 1 to position 2

//The fixed width column allows to leave blank for a missing value and characters with spaces between

DATA TestDataTable;

INFILE '/folders/myfolders/test.txt'

INPUT Column1 1-2 Column2 $ 3-10 Column3 11-12;

RUN;

COLUMN INPUT with data types

//the data type is called informat; it follows the syntax $informatw. for character, informatw.d for numeric number and informatw. for date.

//the w is the width of the field (# of columns/digits), the d is the number of decimal places, the dot is the dot.

//In the example, $char10. indicates that the first 10 digits of the row comprise of col1 and it's a character field.

//the ddmmyy10. indicates that it's a date field with DayMonthYear format. It converts the 12/12/2015 to the number of days from 1960-01-01 to 12/12/2015 so 20434 is stored instead of 12/12/2015. Gods know why SAS chooses 1960.

//the +1 means to skip one column (digit) in the row so it skips the space.

//the comma10.2 means it is a numeric field and uses comma with 2 decimal places.

data test;

input col1 $char10. col2 ddmmyy10. +1 col3 comma10.2;

datalines;

hello 12/12/2015 234,222.11

world 20-01-2016 12,000.38

;

proc print data=test;

run;

COLUMN POINTER

when reading a data line, you can specify where to start reading a field

e.g

input col1 $ @40 col2 $;

the @40 indicates to starting reading col2 from the 40th column.

similarly you can specify the keyword where to start reading a data field.

e.g

input col1 $ @'name: ' col2 $20.

the @'name: ' indicates to search for the next keywork 'name:' before reading the col2.

In this way the data lines can have various lengths but it also locates the name column by the keyword.

Page updated

Google Sites

Report abuse