首頁‎ > ‎EPBI414 Statistical Computing‎ > ‎SAS‎ > ‎

If then, else

Conditional processing using IFConditional processing using IF

For use in DATA steps only:
  • IF
  • IF-THEN
  • IF-THEN-ELSE
These may follow any of the above:
  • ELSE IF
  • ELSE IF-THEN
  • ELSE IF-THEN ELSE…
For use in DATA steps only:
  1. To subset data 
    1. Send only those observations from the source data set (SET) to the destination data set (DATA) where certain conditions are true. 
  2. Conditional programming
    1. Creating new variables,
    2. changing variable values,
    3. or performing operations
    4.   … in an observation based on certain conditions (variable values) in the observation.

Subsetting/Limiting Data- IF

IF statement – used in DATA steps, generally.
  • allows you to specify which observations will be read into SAS for processing during the DATA step or which records will be included in an analysis
  • In a DATA step:
    • IF = expression;
    •  DATA tempset;
        SET origset;
        IF varq = x;
        RUN;
  • Same as before…Dataset tempset  will be created using origset  as its base dataset.  However, only those observations where varq = x  will be executed beyond the ‘IF’ statement and therefore, output to Dataset tempset . 
  • Observations where  varq ^=x  will not be executed in the datastep and will not be included in the final Dataset tempset .
  • IF as a subsetting command

    DATA WORK.cardealers;

     SET mastercardata;
    IF CARS=‘PONTIAC’ THEN OUTPUT;
    *this does the same thing as... ;
    IF CARS=‘PONTIAC’;
    RUN;

    Result: All observations where CARS=‘PONTIAC’ is true are sent into the destination data set, Work.cardealers

    IF vs. WHERE when subsetting

    Using IF:
    Using WHERE
    When subsetting on variables newly created in the DATA set.
    In DATA command line (as option).
    Automatic (internal) variables (_N_, FIRST. , LAST., …)
    Use of LIKE or CONTAINS operator
    Using INPUT statement with raw data
    OK for use in SAS PROCedures
    When using conditional statement (IF-THEN…) In combination with DATA set options (OBS=, POINT=, FIRSTOBS=)
    Subsetting using PROC SQL
    May be more efficient
     When subsetting before merging
    When subsetting after merging

    IF-THEN for conditional programming 

    DATA cardealers;

     SET mastercardata;
    IF CARS=‘PONTIAC’ THEN BRAND=‘GM’;
      RUN;
    Result: In those observations where CARS=‘PONTIAC’ is true, BRAND takes the value, ‘GM’ in WORK.cardealers
     
    DATA cardealers;
     SET mastercardata;
    IF CARS=‘PONTIAC’ THEN BRAND=‘GM’;
      LABEL BRAND = ‘Brand of car’;
      RUN;
    1. Result: In those observations where CARS=‘PONTIAC’ is true, BRAND takes the value, ‘GM’ in WORK.cardealers
    2. BRAND may be newly generated if it did not exist in WORK.mastercardata
    WORK.cardealers
    ID        CARS          LOCATION            Brand of Car
    001    PONTIAC   DETROIT               GM
    002    PONTIAC   WAYNE                 GM
    003    CIVIC          MARYSVILLE  
    004    CRX              MARYSVILLE 
    005    ESCAPE      DEARBORN 
    006    PONTIAC   DETROIT               GM
    007    ACCORD     MARYSVILLE
    008    COROLLA  FREEMONT
    009    626               FLATROCK
    010    JETTA          BAVARIA
     

    Subsetting IF

    Equality and inequality signs can be used in conditional IF statements for subsetting.
    General form:
    LIBNAME libfile statement;
    DATA newdata;
      SET olddata;
    ****** NEW STATEMENTS FOLLOW******;
     IF  var = value or another variable;
       RUN;
    The equals sign can be replaced by any inequality sign or expression!
     

    IF – THEN: Conditional statement that create new variables

    Sometimes, you want to create a new variable where the value is based upon the value of another variable, use IF-THEN
    • General Examples:
      • creating “dummy” variables for nominal or ordinal level data
      • recoding or grouping categories of data
    • General Syntax:
      • if var  = x then newvar =  y ;
    • (There is no WHERE-THEN combination possible in SAS.)
    DATA cardealers;
     SET mastercardata;
    IF CARS=‘PONTIAC’ THEN BRAND=‘GM’;
    IF CARS=‘CIVIC’ THEN BRAND=‘HONDA’;
      RUN;
    These statements are unlinked – they are run in sequence.
     
    WORK.cardealers
    ID        CARS          LOCATION            Brand of Car
    001    PONTIAC   DETROIT               GM
    002    PONTIAC   WAYNE                 GM
    003    CIVIC          MARYSVILLE     HONDA
    004    CRX              MARYSVILLE 
    005    ESCAPE      DEARBORN 
    006    PONTIAC   DETROIT               GM
    007    ACCORD     MARYSVILLE
    008    COROLLA  FREEMONT
    009    626               FLATROCK
    010    JETTA          BAVARIA
     

    IF-THEN-ELSE

    DATA cardealers;
     SET mastercardata;
    IF CARS=‘PONTIAC’ THEN BRAND=‘GM’;
       ELSE BRAND=‘other’;
    RUN;
    • Result: In those observations where CARS=‘PONTIAC’ is not true, BRAND takes the value, ‘other’ in WORK.cardealers
    • IF-THEN-ELSE statements are linked. ELSE is conditional on the previous IF
    • ELSE requires conditions to be mutually exclusive.
    WORK.cardealers
    ID        CARS          LOCATION            Brand of Car
    001    PONTIAC   DETROIT               GM
    002    PONTIAC   WAYNE                 GM
    003    CIVIC          MARYSVILLE    other
    004    CRX              MARYSVILLE    other
    005    ESCAPE      DEARBORN          other
    006    PONTIAC   DETROIT               GM
    007    ACCORD     MARYSVILLE    other
    008    COROLLA  FREEMONT          other
    009    626               FLATROCK          other
    010    JETTA          BAVARIA            other
     

    IF-THEN-ELSE-ELSE...

    DATA cardealers;
     SET mastercardata;
     IF CARS=‘PONTIAC’ THEN BRAND=‘GM’;
      ELSE IF CARS=‘CIVIC’ THEN BRAND=‘Honda’;
      ELSE IF CARS=‘CRX’   THEN BRAND=‘Honda’;
     ELSE BRAND=‘other’;
      RUN;
    IF-THEN-ELSE statements are linked. ELSE is conditional on the previous IF.
    Much more efficient. If first condition is true, then SAS skips the linked ELSE statement.
     
    WORK.cardealers
    ID        CARS          LOCATION            Brand of Car
    001    PONTIAC   DETROIT               GM
    002    PONTIAC   WAYNE                 GM
    003    CIVIC          MARYSVILLE    HONDA
    004    CRX              MARYSVILLE    other
    005    ESCAPE      DEARBORN          other
    006    PONTIAC   DETROIT               GM
    007    ACCORD     MARYSVILLE    other
    008    COROLLA  FREEMONT          other
    009    626               FLATROCK          other
    010    JETTA          BAVARIA            other
     
    BMI
    0~21   1
    21~25  2
    25~31  3
    31~   4
    If BMI <= 21 then BMICAT=1;
         Else If (BMI<=25 and BMI >21) then BMICAT=2;
         Else If (BMI<=31 and BMI >25) then BMICAT=3;
    Else BMICAT=4;   (沒有missing data的情況)
         Else if  (BMI>31) then BMICAT=4;
    Else BMICAT=.; (有missing data的情況)
     

    IF – THEN used inefficiently

    Example: Create 4 age and gender combinations for teens and those older, by gender

    DATA newBRFSS;
     SET brfss.brfss03;
     IF AGE < 20 and SEX=‘F’    then CLASS=1;
     IF AGE >= 20 and SEX=‘F’   then CLASS=2;
      IF AGE < 20 and SEX=‘M’    then CLASS=3;
     IF AGE >= 20 and SEX=‘M’   then CLASS=4;
      IF AGE =. and SEX=‘M’      then CLASS=0;
      IF AGE =. and SEX=‘F’      then CLASS=0;
      IF SEX=‘ ‘                 then CLASS=0;
       RUN;
    SAS will run through each independent condition. This can be a labored, inefficient process. Use ELSE if possible.
     

    IF – THEN - ELSE

    1. IF-THEN-ELSE coding requires that each of the nested statements must be mutually exclusive.
    2. i.e. Conditions cannot overlap.
    Here, each combination of AGE and SEX creates a unique condition for CLASS.
    • This example shows one way of handling missing data.

    DATA newBRFSS;

     SET brfss.brfss03;
     IF AGE < 20 and SEX=‘F’       then CLASS=1;
     else IF AGE >= 20 and SEX=‘F’ then CLASS=2;
      else IF AGE < 20 and SEX=‘M’  then CLASS=3;
     else IF AGE >= 20 and SEX=‘M’ then CLASS=4;
      else IF AGE =. or SEX=‘ ‘     then CLASS=0;
       RUN;

    IF – THEN – ELSE (better)

    Use the ELSE statement to default all cases where missing age exists:

    DATA newBRFSS;

     SET brfss.brfss03;
     IF AGE =. or SEX=‘ ‘          then CLASS=0;
     else IF AGE < 20 and SEX=‘F’  then CLASS=1;
     else IF AGE >= 20 and SEX=‘F’ then CLASS=2;
      else IF AGE < 20 and SEX=‘M’  then CLASS=3;
     else IF AGE >= 20 and SEX=‘M’ then CLASS=4;
       RUN;

    Result in general: SAS will check each set of conditions until TRUE, then execute the THEN statement and skip the remaining ELSE IF statements.

    Specific result: ELSE removes two statements from the previous form. Also, it places all cases with missing data into the correct bin (CLASS=0). This is VERY EFFICIENT, since SAS will categorize the condition when true and stop processing the observation on the remaining set of nested statements.
     

    IF – THEN - ELSE

    1. Using IF-THEN statements without the ELSE statement causes SAS to evaluate all IF-THEN statements.
    2. Using IF-THEN statements with the ELSE statement causes SAS to execute IF-THEN statements until it encounters the first true statement. Subsequent IF-THEN statements are not evaluated  

    Example: IF-THEN-ELSE IF-THEN

    data regional_analysis;
              set shipment;
              if state in ('CA' 'OR' 'WA' 'HI' 'AK' 'ID' 'MT' 'AZ' 'NM' ‘CO’ ‘WY’ ‘UT’ ‘TX’ ‘NV’) then Do;
                  region='West';
                  Regnum=1;
                  END;
              else if state in ('VA' 'FL' 'GA' 'NC' 'SC' 'AL' 'MS' 'LA' 'AR'
                      'WV' 'KT' 'TN') then
                      region='South';
              else if state in ('MD' 'DC' 'DE' 'PA' 'NJ' 'NY') then
                      region='Northeast';
              else if state in ('CT' 'RI' 'MA' 'NH' 'ME' 'VT') then
                      region='New England';
              else if state in ('ND' 'SD' 'MO' 'IA' 'MN' 'WI' 'IL' 'IN' 'OH'
                      'NE' 'KS' 'MI' 'OK') then
                      region='Midwest';
             run;

    Then Do;

    可做多件事;
    可做多件事;
    可做多件事;
    END;

    IF … THEN…只能做一件事

     

    Subsetting IF statement (in DATA step)

    Example: You have a dataset with households (HH) as observations, and the number of adults, men, women and children per household. Create a dataset of HHs with

    at least 2 adults-                     if numadult >=2;

    no men-                                  if nummen = 0;
    1 man and 1 woman-             if nummen = 1 and numwomen = 1;
    same # of men and women-   if nummen = numwomen;
    total HH size is less than 5-    if numadult + children < 5;

    Example data set with family units

    HHCODE    NUMADULT     NUMWOMEN         NUMMEN

    4244                        2                            1                             1
    4245                        1                            0                             1
    3547                        3                            2                             1
    3548                        1                            0                             1
    3549                        4                            2                             2
    We need a new variable that describes household type (HHtype).              
     
    Combine conditional subsetting (IF-THEN) with conjunctions and intersections
    Example: make new HH variables where
    1= Single adult, 2=couple, 3= multiple adults
     
    LIBNAME brfss ‘~/EPBI414/classes’;
    DATA newBRFSS;
     SET brfss.brfss02;
     IF numadult = 1 or (nummen = 1 and numwomen = 0) or  (nummen = 0 and numwomen = 1)   THEN HHtype = 1;
          ELSE IF nummen = 1 and numwomen = 1 THEN HHtype = 2;
          ELSE IF nummen + numwomen > 2       THEN HHtype = 3;
      RUN;
     
    HHCODE    NUMADULT     NUMWOMEN         NUMMEN     HHTYPE
    4244                        2                            1                             1                        2
    4245                        1                            0                             1                        1
    3547                        3                            2                             1                       3
    3548                        1                            0                             1                        1
    3549                        4                            2                             2                       3
     

    Questions

    Q: How many groups (levels or classifications) of new variables would you need to classify combinations of sex (M/F) and US citizen (Yes/No)?

    A: 4 groups: M/Yes, M/No, F/Yes, F/No

    example:
    IF SEX=‘M’ and USCit=‘Yes’      then S_Cit=1;
    ELSE IF SEX=‘M’ and USCit=‘No’  then S_Cit=2;
    ELSE IF SEX=‘F’ and USCit=‘Yes’ then S_Cit=3;
    ELSE IF SEX=‘F’ and USCit=‘No’  then S_Cit=4;
    ELSE S_Cit= . ;

    Q: How many groups (classifications) of new variables would you need to classify combinations of sex (M/F) and US citizen (Yes/No/Don’t Know)?

    A: 6 groups: M/Yes, M/No, M/DK, F/Yes, F/No, F/DK

    Extra topic: “Dummy” variables 

    End of conditional processing

    Remember, IF-THEN-ELSE coding requires that each of the conditions in the nested statements must be mutually exclusive.
    i.e. Conditions (values) cannot overlap.

     

    Comments