Return to Archive

SAS Tip of the Month
July 2010

Occasionally we need to split a dataset into parts -- one reason may be that the dataset is to large to send as one file but can be sent in parts. One of the tools I have to do this is a macro that counts the number of records in a dataset, then splits that dataset into a user defined number of records:

	%macro breakDataset(dataset=_last_  /*Input dataset*/
	                   ,numindset=100   /*Maximum number of obs in output datasets*/
	                   );
	   
	   *Count the number of records and calculate the number of datasets needed;
	   data _null_;
	      if 0 then set &dataset nobs=nobs;
	      call symput('nobs',strip(put(nobs,8.)));
	      call symput('numoutdset',strip(put(ceil(nobs/&numindset),8.)));
	      stop;
	   run;

	   *Split dataset -- output dataset name is OriginalDatasetName_number;
	   %do i=1 %to &numoutdset;
	      data &dataset._&i;
	         set &dataset (firstobs=%eval((&numindset.*(&i-1))+1) obs=%eval(&numindset.*&i));
	      run;
	   %end;   
	   
	%mend breakDataset;

The following example creates a dataset X with 2500 records, and when the macro is run three datasets are created (X_1, X_2 and X_3) with 1000 observations for the first two and 500 observations in the last dataset.

   data x;
      do k=1 to 2500;
         output;
      end;
   run;
   %breakDataset(dataset=x
                ,numindset=1000
                );
   run;

If you are looking for another solution, you may be interested in a paper called Splitting a Large SASŪ Data Set written by Selvaratnam Sridharma from the Census Bureau in Washington, D.C., presented at SUGI 28 back in 2003.

See you in August.

________________________________
Updated July 5, 2010