Return to Archive

SAS Tip of the Month
November 2004

Here is something from my vault that would like to share with you. It is an example where SAS can produce unexpected answers and has been known to fool even the most experienced SAS programmer.

First, lets look at our three datasets, each indicating the number and color of all the jellybeans they found in a single packet of sweets:

    DATASET:TOM
    Color  Number  Multiple
    -----------------------
    RED      1        N
    YELLOW   3        Y
    GREEN    2        Y
    BLUE     0        N

    DATASET:DICK
    Color  Number
    -------------
    RED      4
    YELLOW   2
    GREEN    3
    BLUE     1

    DATASET:HARRY
    Color  Number  Multiple
    -----------------------
    RED      2        Y
    YELLOW   4        Y
    GREEN    3        Y
    BLUE     1        N

The variable MULTIPLE in the datasets TOM and HARRY indicate that there were multiple counts for that color, thus when the data is brought together the variable will have to be set in DICK via the program. Below is a first attempt at bringing the data together:

    data all0;
        length name $5;
        set tom (in=a) dick (in=b) harry (in=c);
        select;
            when(a) name='TOM';
            when(b) do;
                name='DICK';
                if number>1 then multiple='Y';
                if missing(multiple) then multiple='N';
              end;
            when(c) name='HARRY';
        end;
    run;

When the program logic is reviewed, the datastep should bring in the data from datasets TOM, DICK and HARRY. If the data from dataset DICK is being processed then the MULTIPLE variable should be set to Y if NUMBER>1, and if the MULTIPLE is still missing then MULTIPLE should be set to N. Lets look at what the dataset ALL0 actually contains:

    name     color     number    multiple
    -------------------------------------
    TOM      RED          1         N
    TOM      YELLOW       3         Y
    TOM      GREEN        2         Y
    TOM      BLUE         0         N
    DICK     RED          4         Y
    DICK     YELLOW       2         Y
    DICK     GREEN        3         Y
    DICK     BLUE         1         Y
    HARRY    RED          2         Y
    HARRY    YELLOW       4         Y
    HARRY    GREEN        3         Y
    HARRY    BLUE         1         N

Look closely at value in the variable MULTIPLE for the BLUE count from DICK, it is 'Y' when the logic says it should be 'N'! This is due to a little known feature in SAS where if a variable is being set that is not in the original dataset (but is present in other datasets called in the SET statement) the variable will be treated as if a RETAIN statement was in force!

To get around this be sure that you always explicitly define what should be set to a variable for all possibilities. In the case above, all the program would need is the line

    if missing(multiple) then multiple='N';

changed to read

    else multiple='N';

and the datastep would produce what was expected.

________________________________
Updated November 13, 2004