of the Month
How do you add an observation with no values for all variables on to a dataset with no observations?
Lets look at a dataset that is frequently referenced elsewhere called CLASS with the following structure:
data class; attrib name length=$8 sex length=$1 age length=8 height length=8 weight length=8; delete; run;
The dataset now has five variables and zero records.
Now lets add a blank record. It should be possible to run run the following code which opens the dataset CLASS, sets a value of MISSING for the variable NAME, and then output that record:
data class; set class; call missing(name); output; run;
However, running this code DOES NOT create the blank record.
For many, the resolution to this problem is to create a temporary dataset with one of the variables set to missing and then append the temporary dataset to the original dataset, as shown below:
data temporary; length name $8; call missing(name); output; data class; set class temporary; run;
This certainly works!
Instead of using the SET statement, the MODIFY statement works as the following code demonstrates:
data class; if 0 then modify class; call missing(name); output; run;
This works but it still requires us to write a variable but it needs knowledge of the dataset. Can we rewrite this step so we do not have to know the structure of the dataset? If we change the CALL MISSING statement to
call missing(of _all_);
so the whole datastep code is
data class; if 0 then modify class; call missing(of _all_); output; run;
Again, this works but uses the MODIFY statement -- is it still possible to use the SET statement to create a blank record?
From above, we know putting the SET statement after the DATA statement does not work, but what if we tried
data class; name=compress(" "); output; set class; run;
this works, but again we have to know the structure of the dataset -- one other issue is that the variable NAME is set to a length of 1, not 8 as originally set (the reason for why the variable is set to 1 is that the first statement after the DATA statement sets the amount of characters to 1 by the length of text before the COMPRESS statement). What if we were to remove the second line, ie. have the code as follows:
data class; output; set class; run;
On the first pass of the datastep the OUTPUT statement is not actioned but the output dataset is constructed from the SET statement, and it is on the second pass of the datastep that the OUTPUT statement is actioned resulting in a blank record, and with the SET statement not getting any records (there are no records in the dataset CLASS being read in) that the datastep stops -- result a dataset with no records now having a single record with all the variable values as missing!
Hope this is useful. See you in July.
Updated June 19, 2013