LIST INPUT
//manually type in data rows
//The INPUT statement specify the column headings. If the column contains characters, there should be a $ sign after the column name. Otherwise it assumes the column is numeric.
//The DATALINES statement starts the rows of data.
//There must be a simicolon at the end of each statement. The line of data row is not a statement.
//A missing value (null) is represented by a period charactor (.)
DATA TestDataTable;
INPUT Column1 Column2 $ Column3;
DATALINES;
1 Tom 1.3
2 Jan 1.5
3 Mar 1.6
4 . 1.7
;
RUN;
//read data from a file
//INFILE specifies the location of the file
DATA TestDataTable;
INFILE '/folders/myfolders/test.txt'
INPUT Column1 Column2 $ Column3;
RUN;
COLUMN INPUT
//Column input reads in data lines and columns by the given positions
//The number 1-2 means the Column1 data is from position 1 to position 2
//The fixed width column allows to leave blank for a missing value and characters with spaces between
DATA TestDataTable;
INFILE '/folders/myfolders/test.txt'
INPUT Column1 1-2 Column2 $ 3-10 Column3 11-12;
RUN;
COLUMN INPUT with data types
//the data type is called informat; it follows the syntax $informatw. for character, informatw.d for numeric number and informatw. for date.
//the w is the width of the field (# of columns/digits), the d is the number of decimal places, the dot is the dot.
//In the example, $char10. indicates that the first 10 digits of the row comprise of col1 and it's a character field.
//the ddmmyy10. indicates that it's a date field with DayMonthYear format. It converts the 12/12/2015 to the number of days from 1960-01-01 to 12/12/2015 so 20434 is stored instead of 12/12/2015. Gods know why SAS chooses 1960.
//the +1 means to skip one column (digit) in the row so it skips the space.
//the comma10.2 means it is a numeric field and uses comma with 2 decimal places.
data test;
input col1 $char10. col2 ddmmyy10. +1 col3 comma10.2;
datalines;
hello 12/12/2015 234,222.11
world 20-01-2016 12,000.38
;
proc print data=test;
run;
COLUMN POINTER
when reading a data line, you can specify where to start reading a field
e.g
input col1 $ @40 col2 $;
the @40 indicates to starting reading col2 from the 40th column.
similarly you can specify the keyword where to start reading a data field.
e.g
input col1 $ @'name: ' col2 $20.
the @'name: ' indicates to search for the next keywork 'name:' before reading the col2.
In this way the data lines can have various lengths but it also locates the name column by the keyword.