Return to Homepage

Goto the Tip of the Month Archive

Other interesting pages ...
LinkedIn Profile
SAS Cheat Sheet
Useful SAS Code
Full SAS Example
Basic Statistics
Contact Information

SAS Tip of the Month
October 2011
(for SAS and WPS)

Sometimes we want a SAS statistical procedure to capture the results as a SAS dataset. Often we can use an OUT option, or similar, that many statistical procedures use but occasionally we need more statistics produced that are not available using this format. Welcome the ODS OUTPUT feature in SAS version 9.1 and WPS version 2.4 and beyond.

Before I go too far, lets load some data where we are going to capture the Quantiles using the PROC UNVARIATE procedure:

   data River_US (label='Rivers in US over 1000 miles in Length');
      attrib name length=$40 label='River Name'
             distance length=8 label='Distance (miles)';
      input Name $ & distance @@;
   cards;
   Mississippi  2340
   Columbia  1240
   Colorado  1450
   St. Lawrence  1900
   Missouri  2540
   Ohio  1310
   Rio Grande  1900
   Brazos  1280
   Snake 1040
   Atchafalaya  1420
   Yukon  1980
   Red  1290
   Arkansas  1460
   ;
   run;

Almost all the statistical procedures produce internal outputs that can be captured by ODS, it is just a case of finding out which internal output to capture.

To find these internal outputs use the "ODS TRACE sandwich" as seen below:

   ODS TRACE ON;
   proc univariate data=River_US;
      var distance;
   run;
   ODS TRACE OFF;

Running the code above we get the following information in the SAS LOG:

   Output Added:
   -------------
   Name:       Moments
   Label:      Moments
   Template:   base.univariate.Moments
   Path:       Univariate.distance.Moments
   -------------

   Output Added:
   -------------
   Name:       BasicMeasures
   Label:      Basic Measures of Location and Variability
   Template:   base.univariate.Measures
   Path:       Univariate.distance.BasicMeasures
   -------------

   Output Added:
   -------------
   Name:       TestsForLocation
   Label:      Tests For Location
   Template:   base.univariate.Location
   Path:       Univariate.distance.TestsForLocation
   -------------

   Output Added:
   -------------
   Name:       Quantiles
   Label:      Quantiles
   Template:   base.univariate.Quantiles
   Path:       Univariate.distance.Quantiles
   -------------

   Output Added:
   -------------
   Name:       ExtremeObs
   Label:      Extreme Observations
   Template:   base.univariate.ExtObs
   Path:       Univariate.distance.ExtremeObs
   -------------

   Output Added:
   -------------
   Name:       MissingValues
   Label:      Missing Values
   Template:   base.univariate.Missings
   Path:       Univariate.distance.MissingValues
   -------------

with the following output:

                       The UNIVARIATE Procedure
               Variable:  distance  (Distance (miles))

                               Moments

   N                           8    Sum Weights                  8
   Mean                     1745    Sum Observations         13960
   Std Deviation      504.494089    Variance            254514.286
   Skewness           0.56850936    Kurtosis            -1.2949709
   Uncorrected SS       26141800    Corrected SS           1781600
   Coeff Variation     28.910836    Std Error Mean      178.365596


                      Basic Statistical Measures

            Location                    Variability

        Mean     1745.000     Std Deviation          504.49409
        Median   1675.000     Variance                  254514
        Mode     1900.000     Range                       1300
                              Interquartile Range    825.00000


                      Tests for Location: Mu0=0

           Test           -Statistic-    -----p Value------

           Student's t    t  9.783277    Pr > |t|    <.0001
           Sign           M         4    Pr >= |M|   0.0078
           Signed Rank    S        18    Pr >= |S|   0.0078


                       Quantiles (Definition 5)

                        Quantile      Estimate

                        100% Max          2540
                        99%               2540
                        95%               2540
                        90%               2540
                        75% Q3            2120
                        50% Median        1675
                        25% Q1            1295
                        10%               1240
                        5%                1240
                        1%                1240
                        0% Min            1240

                       The UNIVARIATE Procedure
               Variable:  distance  (Distance (miles))

                         Extreme Observations

                 ----Lowest----        ----Highest---

                 Value      Obs        Value      Obs

                  1240        2         1450        3
                  1280        8         1900        4
                  1310        6         1900        7
                  1450        3         2340        1
                  1900        7         2540        5


                            Missing Values

                                    -----Percent Of-----
             Missing                             Missing
               Value       Count     All Obs         Obs

                   .           4       33.33      100.00

If we look at the output in the OUTPUT window you will notice that the section labels refer to the internal output sections in the LOG file. For this example I want to collect the first and third quartile which match to the labels '25% Q1' and '75% Q3'. The ODS TRACE "sandwich" code above is replaced with an ODS OUTPUT "sandwich" with the section name, in our case "Quantiles" and the output dataset name we want to use -- see below for our example:

   ODS OUTPUT Quantiles=_Q0;
   proc univariate data=River_US;
      var distance;
   run;
   ODS OUTPUT CLOSE;

Using a PROC PRINT to look at the output from dataset _Q0 we get the following:

   proc print data=_Q0 width=minimum;
   run;


   Obs    VarName     Quantile      Estimate

     1    distance    100% Max        2540
     2    distance    99%             2540
     3    distance    95%             2540
     4    distance    90%             2540
     5    distance    75% Q3          2120
     6    distance    50% Median      1675
     7    distance    25% Q1          1295
     8    distance    10%             1240
     9    distance    5%              1240
    10    distance    1%              1240
    11    distance    0% Min          1240

The actual dataset, _Q0, contains more than the first and third quartile but this information is now ready to be used elsewhere.

I hope this is useful. See you in November.

________________________________
Updated October 3, 2011