SAS Tip of the Month
I was reviewing some SAS code recently and I spotted a mistake that every programmer makes at sometime -- when merging two datasets in a datastep, the BY statement is missing, as the following example demonstrates:
data mergeddata; merge dataset1 dataset2; run;
If we look at the SAS LOG the following (or similar) will appear:
250 data mergeddata; 251 merge dataset1 dataset2; 252 run; NOTE: There were 10 observations read from the data set WORK.DATASET1. NOTE: There were 9 observations read from the data set WORK.DATASET2. NOTE: The data set WORK.MERGEDDATA has 10 observations and 2 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
There is no indication that there is a potential problem, infact there could be a very serious problem. If a BY statement is missing, the two datasets will be matched by observation number, so the first observation in dataset1 is matched with the first observation in dataset2, and so on. It can be shown best with the following data:
Dataset1 Dataset2 mergeddata Obs name sex name age name sex age 1 Alexandra F Alexandra 15 Alexandra F 15 2 Bailey F Bailey 16 Bailey F 16 3 Blake M Caroline 16 Caroline M 16 4 Caroline F Dominic 13 Dominic F 13 5 Dominic M Jose 17 Jose M 17 6 Jose M Justin 18 Justin M 18 7 Justin M Melanie 15 Melanie M 15 8 Melanie F Oliver 15 Oliver F 15 9 Oliver M Sydney 14 Sydney M 14 10 Sydney F Sydney F .
As can be seen the resulting dataset is worthless at best.
There is an option called MERGENOBY that has been around for some time that can issue a warning or error message to the SAS LOG if in a datastep that is merging data with a MERGE statement has no BY statement, and has the following options:
As a matter of course I have the option MERGENOBY=WARN in my programming so that if I forget a BY statement then a WARNING message will appear, you should to.
See you in December!
Updated November 2, 2013