of the Month
Occasionally we need to split a dataset into parts -- one reason may be that the dataset is to large to send as one file but can be sent in parts. One of the tools I have to do this is a macro that counts the number of records in a dataset, then splits that dataset into a user defined number of records:
%macro breakDataset(dataset=_last_ /*Input dataset*/ ,numindset=100 /*Maximum number of obs in output datasets*/ ); *Count the number of records and calculate the number of datasets needed; data _null_; if 0 then set &dataset nobs=nobs; call symput('nobs',strip(put(nobs,8.))); call symput('numoutdset',strip(put(ceil(nobs/&numindset),8.))); stop; run; *Split dataset -- output dataset name is OriginalDatasetName_number; %do i=1 %to &numoutdset; data &dataset._&i; set &dataset (firstobs=%eval((&numindset.*(&i-1))+1) obs=%eval(&numindset.*&i)); run; %end; %mend breakDataset;
The following example creates a dataset X with 2500 records, and when the macro is run three datasets are created (X_1, X_2 and X_3) with 1000 observations for the first two and 500 observations in the last dataset.
data x; do k=1 to 2500; output; end; run; %breakDataset(dataset=x ,numindset=1000 ); run;
If you are looking for another solution, you may be interested in a paper called Splitting a Large SASŪ Data Set written by Selvaratnam Sridharma from the Census Bureau in Washington, D.C., presented at SUGI 28 back in 2003.
See you in August.
Updated July 5, 2010