of the Month
I recently had a question from Bob in New Jersey asking me to look at whether NODUPKEY always selected the first record in a BY group -- his opinion was that it always did.
The short answer is 'NOT REALLY'. However the answer does depend on two other options -- the EQUALS|NOEQUALS option in the SORT procedure, and the global option SORTEQUAL|NOSORTEQUALS (default when shipped from SAS is SORTEQUAL).
Lets look at first what the global option SORTEQUAL|NOSORTEQUALS is all about, quoting from the SAS documentation:
It is interesting to note that SAS suggests that in order to save resources, use NOSORTEQUALS when you do not need to maintain a specific order of observations with identical BY variable values.
Now lets look at the EQUALS|NOEQUALS option in the SORT procedure:
From the SAS documentation, the NOEQUALS option is useful if you are sorting a large dataset as it can save in CPU but has the disadvantage of producing inconsistent results in your output data sets if you run the code multiple times.
Now that we have looked at all the documentation, lets take a look at some data:
data ae0; input ptnum $6. aeseq event $20.; cards; 001002 1 HEADACHE 001001 1 FEVER 001001 2 HEADACHE 001003 1 NAUSEA 001003 4 DIARRHEOA 001003 2 VOMITING 001004 1 DIARRHEOA 001001 3 DIARRHEOA 001002 2 DIARRHEOA 001004 2 HEADACHE 001003 3 FEVER ; run;
In addition, to make sure the global option SORTEQUAL|NOSORTEQUALS is set to its usual shipped default the option shall be set to SORTEQUAL:
Now lets look at the following code where we just use NODUPKEY (I have put the EQUALS option for clarity although this is not needed with the setting of the global option SORTEQUAL):
proc sort data=ae0 out=ae1 nodupkey equals; by ptnum; title1 "Data with NODUPKEY and EQUALS"; proc print data=ae1 noobs; run; === Output === Data with NODUPKEY and EQUALS ptnum aeseq event 001001 1 FEVER 001002 1 HEADACHE 001003 1 NAUSEA 001004 1 DIARRHEOA
In the code above the first record of each "by group" value is output, i.e. the first record in the sequence. Now lets just suppose that the option EQUALS|NOEQUALS was set to NOEQUALS (or the setting of the global option NOSORTEQUALS):
proc sort data=ae0 out=ae1 nodupkey noequals; by ptnum; title1 "Data with NODUPKEY and NOEQUALS"; proc print data=ae1 noobs; run; === Output === Data with NODUPKEY and NOEQUALS ptnum aeseq event 001001 3 DIARRHEOA 001002 1 HEADACHE 001003 3 FEVER 001004 1 DIARRHEOA
Note that the output shows that it is not necessarily the first record for the PTNUM variable.
Inside SAS there are a large number of SAS options that can affect the output that is produced, whether is be from a procedure or datastep. If your program produces unexpected results, after you have gone though looking at the SAS LOG and exhausted other remedies, take a look at the SAS options that are applied.
I know this is a very technical tip this month but I hope that it shows that SAS is a very sophisticated piece of software and must be treated with respect. I hope this topic is interesting for some. See you in March.
Updated February 02, 2010