Goto the Tip of the Month Archive Other interesting pages ... |
Sometimes we want a SAS statistical procedure to capture the results as a SAS dataset. Often we can use an OUT option, or similar, that many statistical procedures use but occasionally we need more statistics produced that are not available using this format. Welcome the ODS OUTPUT feature in SAS version 9.1 and WPS version 2.4 and beyond. Before I go too far, lets load some data where we are going to capture the Quantiles using the PROC UNVARIATE procedure: data River_US (label='Rivers in US over 1000 miles in Length'); attrib name length=$40 label='River Name' distance length=8 label='Distance (miles)'; input Name $ & distance @@; cards; Mississippi 2340 Columbia 1240 Colorado 1450 St. Lawrence 1900 Missouri 2540 Ohio 1310 Rio Grande 1900 Brazos 1280 Snake 1040 Atchafalaya 1420 Yukon 1980 Red 1290 Arkansas 1460 ; run; Almost all the statistical procedures produce internal outputs that can be captured by ODS, it is just a case of finding out which internal output to capture. To find these internal outputs use the "ODS TRACE sandwich" as seen below: ODS TRACE ON; proc univariate data=River_US; var distance; run; ODS TRACE OFF; Running the code above we get the following information in the SAS LOG: Output Added: ------------- Name: Moments Label: Moments Template: base.univariate.Moments Path: Univariate.distance.Moments ------------- Output Added: ------------- Name: BasicMeasures Label: Basic Measures of Location and Variability Template: base.univariate.Measures Path: Univariate.distance.BasicMeasures ------------- Output Added: ------------- Name: TestsForLocation Label: Tests For Location Template: base.univariate.Location Path: Univariate.distance.TestsForLocation ------------- Output Added: ------------- Name: Quantiles Label: Quantiles Template: base.univariate.Quantiles Path: Univariate.distance.Quantiles ------------- Output Added: ------------- Name: ExtremeObs Label: Extreme Observations Template: base.univariate.ExtObs Path: Univariate.distance.ExtremeObs ------------- Output Added: ------------- Name: MissingValues Label: Missing Values Template: base.univariate.Missings Path: Univariate.distance.MissingValues ------------- with the following output: The UNIVARIATE Procedure Variable: distance (Distance (miles)) Moments N 8 Sum Weights 8 Mean 1745 Sum Observations 13960 Std Deviation 504.494089 Variance 254514.286 Skewness 0.56850936 Kurtosis -1.2949709 Uncorrected SS 26141800 Corrected SS 1781600 Coeff Variation 28.910836 Std Error Mean 178.365596 Basic Statistical Measures Location Variability Mean 1745.000 Std Deviation 504.49409 Median 1675.000 Variance 254514 Mode 1900.000 Range 1300 Interquartile Range 825.00000 Tests for Location: Mu0=0 Test -Statistic- -----p Value------ Student's t t 9.783277 Pr > |t| <.0001 Sign M 4 Pr >= |M| 0.0078 Signed Rank S 18 Pr >= |S| 0.0078 Quantiles (Definition 5) Quantile Estimate 100% Max 2540 99% 2540 95% 2540 90% 2540 75% Q3 2120 50% Median 1675 25% Q1 1295 10% 1240 5% 1240 1% 1240 0% Min 1240 The UNIVARIATE Procedure Variable: distance (Distance (miles)) Extreme Observations ----Lowest---- ----Highest--- Value Obs Value Obs 1240 2 1450 3 1280 8 1900 4 1310 6 1900 7 1450 3 2340 1 1900 7 2540 5 Missing Values -----Percent Of----- Missing Missing Value Count All Obs Obs . 4 33.33 100.00 If we look at the output in the OUTPUT window you will notice that the section labels refer to the internal output sections in the LOG file. For this example I want to collect the first and third quartile which match to the labels '25% Q1' and '75% Q3'. The ODS TRACE "sandwich" code above is replaced with an ODS OUTPUT "sandwich" with the section name, in our case "Quantiles" and the output dataset name we want to use -- see below for our example: ODS OUTPUT Quantiles=_Q0; proc univariate data=River_US; var distance; run; ODS OUTPUT CLOSE; Using a PROC PRINT to look at the output from dataset _Q0 we get the following: proc print data=_Q0 width=minimum; run; Obs VarName Quantile Estimate 1 distance 100% Max 2540 2 distance 99% 2540 3 distance 95% 2540 4 distance 90% 2540 5 distance 75% Q3 2120 6 distance 50% Median 1675 7 distance 25% Q1 1295 8 distance 10% 1240 9 distance 5% 1240 10 distance 1% 1240 11 distance 0% Min 1240 The actual dataset, _Q0, contains more than the first and third quartile but this information is now ready to be used elsewhere. I hope this is useful. See you in November. |

________________________________ Updated October 3, 2011 |