Return to Homepage

Goto the Tip of the Month Archive

Other interesting pages ...
LinkedIn Profile
SAS Cheat Sheet
Useful SAS Code
Full SAS Example
Basic Statistics
Contact Information

SAS Tip of the Month
December 2010
(for SAS and WPS)

Continuing our series on the PRINT procedure, this month I am looking at the use of the SUM statement, the effect of using the BYLINE option, and creating a basic HTML file. The data that I will be using is automobile numbers by state where the request is for an output of the numbers, but we also want to sum them by a defined region. First I am going to set up a some formats using the FORMAT procedure:

   *----------------------------------------------------------*;
   * Define formats;
   *----------------------------------------------------------*;

   proc format;

     ** Define State abreviation decodes;
     value $statew
       'AL'='Alabama' 'AK'='Alaska' 'AZ'='Arizona' 'AR'='Arkansas'
       'CA'='California' 'CO'='Colorado' 'CT'='Connecticut' 'DE'='Delaware'
       'DC'='District of Columbia' 'FL'='Florida' 'GA'='Georgia' 'HI'='Hawaii'
       'ID'='Idaho' 'IL'='Illinois' 'IN'='Indiana' 'IA'='Iowa' 'KS'='Kansas'
       'KY'='Kentucky' 'LA'='Louisiana' 'ME'='Maine' 'MD'='Maryland'
       'MA'='Massachusetts' 'MI'='Michigan' 'MN'='Minnesota' 'MS'='Mississippi'
       'MO'='Missouri' 'MT'='Montana' 'NE'='Nebraska' 'NV'='Nevada'
       'NH'='New Hampshire' 'NJ'='New Jersey' 'NM'='New Mexico' 'NY'='New York'
       'NC'='North Carolina' 'ND'='North Dakota' 'OH'='Ohio' 'OK'='Oklahoma'
       'OR'='Oregon' 'PA'='Pennsylvania' 'RI'='Rhode Island' 'SC'='South Carolina'
       'SD'='South Dakota' 'TN'='Tennessee' 'TX'='Texas' 'UT'='Utah' 'VT'='Vermont'
       'VA'='Virginia' 'WA'='Washington' 'WV'='West Virginia' 'WI'='Wisconsin'
       'WY'='Wyoming';

     ** Define Regions for Report;
     value $region
       'AK','WA','OR','ID'='Pacific Alaska Region'
       'CA','NV','AZ'='Pacific Region'
       'MT','ND','SD','WY','UT','CO','NM'='Rocky Mountain Region'
       'NE','IA','KS','MO'='Central Plains Region'
       'TX','OK','AR','LA'='Southwest Region'
       'MN','WI','IL','IN','MI','OH'='Great Lakes Region'
       'KY','TN','MS','AL','GA','FL','SC','NC'='South East Region'
       'PA','WV','VA','MD','DC','DE','NJ'='Mid Atlantic Region'
       'NY','VT','NH','CT','RI','MA','ME'='Northeast Region'
       'HI'='Hawaii';
   run;

I have defined a format to decode the state abreviation to the state name, although it is possible to use formats that SAS supplies. Now for some data, shown as a complete data step:

   *----------------------------------------------------------*;
   * Lets bring in our data, Automobile registrations for 2008;
   *----------------------------------------------------------*;

   data cars0;

      ** Lets define the three variables we are going to use;
      attrib state  length=$2 format=$statew. label='State'
             cars   length=8 informat=comma8. format=comma8.
                    label="Number of Automobiles (000's)"
             region length=$25 label='Region';
      infile cards;
      ** First two variables come directly from the data in the CARDS section;
      input state $ cars @@;
      ** Now define REGION from the STATE variable using the format $STATEW
         given above;
      region=put(state,$region.);
   cards;
   AL 2,203 AK 240 AZ 2,217 AR 948 CA 19,706 CO 724 CT 2,007 DE 454 DC 167
   FL 8,064 GA 4,229 HI 480 ID 529 IL 5,704 IN 3,110 IA 1,786 KS 876 KY 1,946
   LA 1,900 ME 555 MD 2,636 MA 3,215 MI 4,337 MN 2,530 MS 1,142 MO 2,613 MT 369
   NE 802 NV 689 NH 646 NJ 3,784 NM 676 NY 8,494 NC 3,539 ND 342 OH 6,345 OK 1,641
   OR 1,433 PA 6,035 RI 486 SC 1,985 SD 343 TN 2,809 TX 8,711 UT 1,180 VT 304
   VA 3,847 WA 3,277 WV 687 WI 2,643 WY 255
   ;
   run;

Our data came in the form of a stream, i.e. not one row per record -- using the '@@' symbol at the end of the INPUT statement tells the input pointer to remain on the current line until all the data for that row is read.

Before we can produce any output we need to set some things up. Along with some options and a footnote that we are going to use on all tables, the data will be sorted by REGION and STATE as the PRINT procedure does not support sorting and we are going to use a BY statement inside the PRINT procedure.

   ** First need to sort the data as PROC PRINT does not sort the data;
   proc sort data=cars0;
      by region state;
   run;

   ** Set some options -- don't put date in the report, and set first page number
      to 1, and set linesize to 64 characters;
   options nodate pageno=1 ls=64;

   ** Set a standard footnote for every output;
   footnote1 "Source: U.S. Department of Transportation, Federal Highway";
   footnote2 "Administration, Highway Statistics 2008 (Washington,";
   footnote3 "DC: 2009), tables MV-1 and MV-9";

Now that we have done the setup, the next job is to produce some output. The first output will be printing the data out by region and state, and summing the number of automobiles in each region, using the BYLINE option:

   === PROGRAM ===
   ** Lets produce the report in classic text form, using the BYLINE option;
   options byline;
   ** LABEL sets option so that dataset variable labels are
      used, UNIFORM sets the column width on each output
      group to be the same, NOOBS switches off printing of
      observation numbers, N sets first the label to use to
      output the number of observations in the group and the
      second label to show the number of observations in the
      entire dataset.;
   proc print data=cars0 label uniform noobs
              n="Number of States in Region: "
                "Total Number of States Analysed: ";
      var state cars;
      by region;
      ** For each region, sum the number of cars as a total at the bottom;
      sum cars;
      ** Now set the title of the report;
      title "Registered Automobiles, 2008";
   run;

   === OUTPUT (PARTIAL) ===
                     Registered Automobiles, 2008                 1

   ----------------- Region=Central Plains Region -----------------

                                            Number of
                                          Automobiles
                 State                        (000's)

                 Iowa                           1,786
                 Kansas                           876
                 Missouri                       2,613
                 Nebraska                         802
                 --------------------    ------------
                 REGION                         6,077

                    Number of States in Region: 4


   ------------------ Region=Great Lakes Region -------------------

                                            Number of
                                          Automobiles
                 State                        (000's)

                 Illinois                       5,704
                 Indiana                        3,110
                 Michigan                       4,337
                 Minnesota                      2,530
                 Ohio                           6,345
                 Wisconsin                      2,643
                 --------------------    ------------
                 REGION                        24,669

                    Number of States in Region: 6


   ------------------------ Region=Hawaii -------------------------

                                            Number of
                                          Automobiles
                 State                        (000's)

                 Hawaii                           480

                    Number of States in Region: 1


   ------------------ Region=Mid Atlantic Region ------------------

                                            Number of
                                          Automobiles
                 State                        (000's)

                 District of Columbia             167



      Source: U.S. Department of Transportation, Federal Highway
         Administration, Highway Statistics 2008 (Washington,
                   DC: 2009), tables MV-1 and MV-9

Using this approach the title is the same but has multiple regions per page.

Another approach is to replace the BYLINE option with a #BYVAL(variable) in the TITLE statement, as the following code will show:

   === PROGRAM ===
   ** Lets produce the report in classic text form without using a BYLINE;
   options nobyline;
   proc print data=cars0 label uniform noobs
              n="Number of States in Region: "
                "Total Number of States Analysed: ";
      var state cars;
      by region;
      sum cars;
      format cars comma8.;
      ** Using a #BYVAL option here so on each page the title will include
         the region name.;
      title "Registered Automobiles for #byval(region), 2008";
   run;

   === OUTPUT (PARTIAL) ===

     Registered Automobiles for Central Plains Region, 2008    5

                                       Number of
                                      Automobiles
              State                     (000's)

              Iowa                        1,786
              Kansas                        876
              Missouri                    2,613
              Nebraska                      802
              --------------------    -----------
              REGION                      6,077

                 Number of States in Region: 4

   Source: U.S. Department of Transportation, Federal Highway
      Administration, Highway Statistics 2008 (Washington,
                DC: 2009), tables MV-1 and MV-9


      Registered Automobiles for Great Lakes Region, 2008      6

                                       Number of
                                      Automobiles
              State                     (000's)

              Illinois                    5,704
              Indiana                     3,110
              Michigan                    4,337
              Minnesota                   2,530
              Ohio                        6,345
              Wisconsin                   2,643
              --------------------    -----------
              REGION                     24,669

                 Number of States in Region: 6

   Source: U.S. Department of Transportation, Federal Highway
      Administration, Highway Statistics 2008 (Washington,
                DC: 2009), tables MV-1 and MV-9


            Registered Automobiles for Hawaii, 2008            7

                                       Number of
                                      Automobiles
              State                     (000's)

              Hawaii                        480

                 Number of States in Region: 1

   Source: U.S. Department of Transportation, Federal Highway
      Administration, Highway Statistics 2008 (Washington,
                DC: 2009), tables MV-1 and MV-9

The output is one region per page, but the title is different because we have included the region name within the title using the #BYVAL statement.

Finally, lets look at what our table looks like on the Web if we produced the output using ODS HTML (default template):

   === PROGRAM ===
   ** Now lets produce the last report in HTML using ODS -- use the same
      code as above but just wrap it with a set of ODS HTML statements.;
   options nobyline;
   ODS html file="C:\TEMP\CARS2008.HTML";
     ** Important, define a location for the output in the first ODS HTML statement;
   proc print data=cars0 label uniform noobs
              n="Number of States in Region: "
                "Total Number of States Analysed: ";
      var state cars;
      by region;
      sum cars;
      format cars comma8.;
      title "Registered Automobiles for #byval(region), 2008";
   run;
   ODS html close;
   run;

   === OUTPUT (PARTIAL) ===

The last three months have been on the PRINT procedure and different ways you can use it to produce output. As you have briefly seen, PRINT can produce nice looking output and do some basic sums and counts of variables when programmed correctly.

Next month it is a new year and a new topic. Have a happy and safe December and see you in 2011!

________________________________
Updated December 2, 2010