Return to Homepage

Goto the Tip of the Month Archive

Other interesting pages ...
LinkedIn Profile
SAS Cheat Sheet
Useful SAS Code
Full SAS Example
Basic Statistics
Contact Information

SAS Tip of the Month
August 2012
(for SAS and WPS)

Ever had to remove dummy records or subset data in an unknown number of datasets in a directory? This tip may well be useful for you.

You could start with the following code:

   data ;
      set ;
      where ;
   data ;
      set ;
      where ;
   ... etc.

but this is to time-consuming, especially if you have more than just a few datasets. Also it is up to the programmer to determine if the dataset actually has the variable needed for the condition to be true.

Lets look at an example where a directory has around 100 datasets but we are only interested in working with datasets AE, COUNTRY, DM, EX, LB and VS. Note also that we cannot modify or delete the original datasets but we can work with them elsewhere, whether we copy the datasets from one directory to another, or work in the WORK library. For the purposes of this example we just want to bring these datasets to the WORK library and remove the dummy records, signified by the value '9999' in the variable SUBJID. Note also that the variable SUBJID is not in the dataset COUNTRY.

The solution I am presenting here is just one method to solve this problem -- there are more "efficient" methods but this gives a good look at what is going on.

First, lets copy the necessary datasets from a directory which I will call ORIGDATA:

   proc delete data=work._all_;
     *Remove any existing datasets in the WORK directory;
   proc copy in=ORIGDATA out=work;
     *Copy datasets of interest to the WORK directory;
      select AE COUNTRY DM EX LB VS;

Now find out which datasets have the variable name SUBJID and put those datasets that have the variable in a set of macro variables:

   proc contents data=work._all_ out=_cont0 (keep=memname name) noprint;
     *Get contents of the datasets in work area keeping only the dataset name
     (MEMNAME) and variable name (NAME);
   %let dslist0=0;
     *Set internal counter with number of datasets with variable SUBJID to zero;
   data _null_;
      set _cont0 end=eof; *Use dataset with contents of datasets created above;
      where name='SUBJID';
        *Only interested in those datasets which have a variable called SUBJID;
      call symput('dslist'||compress(put(_n_,best.)),compress(memname));
        *Create a macro variable DSLISTn, where n is the occurence of the variable
         SUBJID, containing the name of the dataset with the variable SUBJID;
      if eof then call symput('dslist0',compress(put(_n_,best.)));
        *If last record in dataset put count in macro variable DSLIST0;

The dataset COUNTRY would not have been picked up for processing in the selection above as it did not contain the variable SUBJID, as in our original example. Now that we have our list of datasets with the variable SUBJID in macro variables it is now time to do the sub-setting using macro code:

   %do i=1 %to &dslist0;
          *Fails if no datasets found with variable SUBJID, hence setting value
           to zero initially;
      data &&dslist&i;
          *Macro resolution, first resolves the &i then resolves the &&dslist;
         set &&dslist&i;
         where subjid^='9999'; *Keeping data that SUBJID not equal to '9999';

With this code it does not matter if there is one dataset or 1000 datasets in a library, the code will work for all those that meet a condition as spelt out in the original question.

Hope this is useful. Have a safe and happy August.

See you next month.

Updated August 8, 2012