Return to Archive

SAS Tip of the Month
February 2010

I recently had a question from Bob in New Jersey asking me to look at whether NODUPKEY always selected the first record in a BY group -- his opinion was that it always did.

The short answer is 'NOT REALLY'. However the answer does depend on two other options -- the EQUALS|NOEQUALS option in the SORT procedure, and the global option SORTEQUAL|NOSORTEQUALS (default when shipped from SAS is SORTEQUAL).

Lets look at first what the global option SORTEQUAL|NOSORTEQUALS is all about, quoting from the SAS documentation:

The global option SORTEQUAL|NOSORTEQUALS dictates if the EQUALS or NOEQUALS option is set by default in the SORT procedure, where:

SORTEQUALS
specifies that observations with identical BY variable values are to retain the same relative positions in the output data set as in the input data set.

NOSORTEQUALS
specifies that no resources be used to control the order of observations with identical BY variable values in the output data set.

It is interesting to note that SAS suggests that in order to save resources, use NOSORTEQUALS when you do not need to maintain a specific order of observations with identical BY variable values.

Now lets look at the EQUALS|NOEQUALS option in the SORT procedure:

Specifies the order of the observations in the output data set. For observations with identical BY-variable values, EQUALS maintains the relative order of the observations within the input data set in the output data set. This is in contrast to the NOEQUALS that does not necessarily preserve this order in the output data set.

From the SAS documentation, the NOEQUALS option is useful if you are sorting a large dataset as it can save in CPU but has the disadvantage of producing inconsistent results in your output data sets if you run the code multiple times.

Now that we have looked at all the documentation, lets take a look at some data:

   data ae0;
      input ptnum $6. aeseq event $20.;
   cards;
   001002 1 HEADACHE
   001001 1 FEVER
   001001 2 HEADACHE
   001003 1 NAUSEA
   001003 4 DIARRHEOA
   001003 2 VOMITING
   001004 1 DIARRHEOA
   001001 3 DIARRHEOA
   001002 2 DIARRHEOA
   001004 2 HEADACHE
   001003 3 FEVER
   ;
   run;

In addition, to make sure the global option SORTEQUAL|NOSORTEQUALS is set to its usual shipped default the option shall be set to SORTEQUAL:

   options SORTEQUAL;

Now lets look at the following code where we just use NODUPKEY (I have put the EQUALS option for clarity although this is not needed with the setting of the global option SORTEQUAL):

   proc sort data=ae0 out=ae1 nodupkey equals;
      by ptnum;
   title1 "Data with NODUPKEY and EQUALS";
   proc print data=ae1 noobs;
   run;

   === Output ===
   Data with NODUPKEY and EQUALS

   ptnum     aeseq    event

   001001      1      FEVER
   001002      1      HEADACHE
   001003      1      NAUSEA
   001004      1      DIARRHEOA

In the code above the first record of each "by group" value is output, i.e. the first record in the sequence. Now lets just suppose that the option EQUALS|NOEQUALS was set to NOEQUALS (or the setting of the global option NOSORTEQUALS):

   proc sort data=ae0 out=ae1 nodupkey noequals;
      by ptnum;
   title1 "Data with NODUPKEY and NOEQUALS";
   proc print data=ae1 noobs;
   run;

   === Output ===
   Data with NODUPKEY and NOEQUALS

   ptnum     aeseq    event

   001001      3      DIARRHEOA
   001002      1      HEADACHE
   001003      3      FEVER
   001004      1      DIARRHEOA

Note that the output shows that it is not necessarily the first record for the PTNUM variable.

Inside SAS there are a large number of SAS options that can affect the output that is produced, whether is be from a procedure or datastep. If your program produces unexpected results, after you have gone though looking at the SAS LOG and exhausted other remedies, take a look at the SAS options that are applied.

I know this is a very technical tip this month but I hope that it shows that SAS is a very sophisticated piece of software and must be treated with respect. I hope this topic is interesting for some. See you in March.

________________________________
Updated February 02, 2010