SAS Tip of the Month
March 2015
(for SAS)

This month I am going to talk about dataset variable labels, and why they are useful.

A variable has many attributes, and key among them are its name (NAME), type (TYPE), and length (LENGTH). In SAS version 8 and above, a variable name can be up to 32 characters long, its type is either character or numeric, and its length can be set from 1 to 32,767 bytes for a character variable or from 3 to 8 bytes for a numeric variable (yes, some operating systems allow a numeric length of 2, but not all).
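To make these attributes concrete, here is a minimal sketch that sets the type and length of three variables explicitly with a LENGTH statement and then checks them with the CONTENTS procedure. The dataset DEMO and the variables SUBJID, VISIT and WEIGHT are hypothetical, chosen only for illustration:

   data demo; *** Hypothetical dataset, for illustration only;
      length subjid $ 32   /* character variable, 32 bytes           */
             visit  3      /* numeric, 3 bytes: fine for small integers */
             weight 8;     /* numeric, full 8-byte length            */
      subjid = 'A-001';
      visit  = 1;
      weight = 112.5;
   run;

   proc contents data=demo; *** Type and Len columns show the attributes;
   run;

Keep short numeric lengths for small whole numbers only: fewer than 8 bytes cannot store every value an 8-byte numeric can, which is why 8 is the default.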

A variable has other attributes as well; the three most common are the informat, format, and label. This month we shall look at the last of these.

What exactly is a label, and why is it used? Put simply, a label is a short piece of text that gives a more user-friendly description of the variable. Take a very simple example: the variable WEIGHT in a dataset. Sure, the name tells me what the variable is (it is weight), but it does not tell me other things I may need to know, such as the unit it is measured in or when the measurement was taken, and all of this is very important information. Since a variable name can be up to 32 characters, I could try to pack this information into the name, but in most cases it will not fit, and even when it does, such a name is impractical to work with. This is where the LABEL statement comes in.

A label can be up to 256 characters long, including blanks (40 characters if you are still using SAS version 6), which gives us plenty of room to write a good description of the variable. A label is most commonly set either with the LABEL statement inside a data step or with the LABEL statement under the MODIFY statement in the DATASETS procedure. Both are shown below:

   data class; *** Re-create the dataset with the label added;
      set class;
      label weight='Weight (kg) at Start of Study';
   run;

   proc datasets library=work; *** DATASETS procedure;
      modify class;
         label weight='Weight (kg) at Start of Study';
   quit;

I personally use the data step method if I am creating or modifying a variable inside that data step; otherwise I use the DATASETS procedure, which updates the label in the dataset's descriptor without having to read and rewrite all of the data. That is only my convention, though; there is no set rule. Before going on to a real-world example, let's first see how to check which labels, if any, are set in a dataset.

The two easiest ways are to run the CONTENTS procedure on the dataset or, if you have the SAS Viewer installed, to open the dataset and look at its attributes window.
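Both of those are interactive. If you would rather check labels in code, one option beyond those two is to query the DICTIONARY.COLUMNS table with PROC SQL, which returns one row per variable. The sketch below also follows the convention just described, labeling a newly derived variable in the same data step that creates it; AGE_MONTHS is a hypothetical variable, not part of SASHELP.CLASS:

   data class; *** Copy SASHELP.CLASS and derive a new variable;
      set sashelp.class;
      age_months = age * 12; /* hypothetical derived variable */
      label age_months='Age (months) at Start of Study';
   run;

   proc sql; /* list name, type, length and label for each variable */
      select name, type, length, label
         from dictionary.columns
         where libname='WORK' and memname='CLASS'; /* stored in uppercase */
   quit;

Only AGE_MONTHS should show a label here, since the other variables come from SASHELP.CLASS unlabeled.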

Now let's look at a real-world example, where it will all become clearer. For this example I will use the dataset SASHELP.CLASS. First let's look at the structure of the dataset using the CONTENTS procedure (I will actually make a copy first so I don't overwrite the original data):

   data class; *** Make a copy of the dataset;
      set sashelp.class;
   run;
   proc contents data=class;
     *** Get structure of dataset;
   run;

Running this code we get the following output (abridged):

   Alphabetic List of Variables and Attributes

   #  Variable  Type  Len
 
   3  Age       Num     8
   4  Height    Num     8
   1  Name      Char    8 
   2  Sex       Char    1
   5  Weight    Num     8

If any variable had a label, we would see a column headed "Label", but in this case no labels have been applied to the dataset. So now let's set one for WEIGHT as described above (this time I will display the dataset structure not with the CONTENTS procedure but with the CONTENTS statement inside the DATASETS procedure):

   proc datasets library=work;
      modify class;
         label weight='Weight (kg) at Start of Study';
      contents data=class;
   quit;

Running this code we get the following output (abridged):

   Alphabetic List of Variables and Attributes
 
   #  Variable  Type  Len  Label
   3  Age       Num     8
   4  Height    Num     8
   1  Name      Char    8
   2  Sex       Char    1
   5  Weight    Num     8  Weight (kg) at Start of Study

As you can see, the variable WEIGHT now has a label; the reason this is useful will become clear shortly. For the purposes of this example I will also add labels to the variables AGE and HEIGHT, using the DATASETS procedure as above:

   proc datasets library=work;
      modify class;
         label height='Height (in) at Start of Study'
               age='Age (years) at Start of Study';
      contents data=class;
   quit;

Note that I did not redo the label for WEIGHT in the code above, as it had already been set, although I could have included it in this step and even replaced it with new text. So why is the label useful? As you can already see, it gives a helpful description of what the variable is. Now let's extend that to a small report using the PRINT procedure:

   proc print data=class noobs;
   run;

This produces the following output (abridged):

   Name     Sex  Age  Height  Weight
   Alfred   M     14   69.0    112.5
   Alice    F     13   56.5     84.0
   Barbara  F     13   65.3     98.0
   Carol    F     14   62.8    102.5
   Henry    M     14   63.5    102.5

The column headers are simply the variable names, so we have the same problem of not knowing the units for Age, Height and Weight, even though we put them in labels earlier. Adding the LABEL option tells the PRINT procedure to use the labels as column headers:

   proc print data=class noobs LABEL;
   run;

This produces the following output (abridged):

                 Age (years)  Height (in)  Weight (kg)
                  at Start     at Start     at Start
   Name     Sex   of Study     of Study     of Study

   Alfred    M       14          69.0         112.5
   Alice     F       13          56.5          84.0
   Barbara   F       13          65.3          98.0
   Carol     F       14          62.8         102.5
   Henry     M       14          63.5         102.5

Now, as you can see, we have a report that gives a clear description of the variables through the use of dataset variable labels.

Some readers will point out that I could have put the labels in the PRINT procedure call instead (yes, the LABEL statement can be used in almost any procedure), which would be done with the following code:

   proc print data=class noobs LABEL;
      label weight='Weight (kg) at Start of Study'
            height='Height (in) at Start of Study'
            age='Age (years) at Start of Study';
   run;

The reason I don't normally use this method is that a label set inside a procedure applies only to that step; it is not stored in the dataset, so it does not carry forward. Here is a complete example in which the CLASS data is copied from the SASHELP library, useful labels are put on the dataset, and the PRINT, MEANS and TABULATE procedures are then run on the same data, all showing the same labels:

   proc datasets library=work;
      copy in=sashelp out=work;
         select class;
      modify class;
         label weight='Weight (kg) at Start of Study'
               height='Height (in) at Start of Study'
               age='Age (years) at Start of Study';
   quit;

   proc print data=class noobs LABEL;
   run;

   proc means data=class;
      var age height weight;
   run;

   proc tabulate data=class;
      class sex;
      var age height weight;
      tables age*(n*f=8.0 mean*f=8.3 std*f=8.4
                  median*f=8.2 min*f=8.0 max*f=8.0)
             (height weight)*(n*f=8.0 mean*f=8.2 std*f=8.3
                              median*f=8.3 min*f=8.1 max*f=8.1), 
             sex all='Total';
   run;

This produces the following output (abridged):

   The PRINT Procedure

                  Age (years)  Height (in)  Weight (kg)
                   at Start     at Start     at Start
   Name     Sex    of Study     of Study     of Study

   Alfred    M        14          69.0         112.5
   Alice     F        13          56.5          84.0
   Barbara   F        13          65.3          98.0


   The MEANS Procedure

   Var     Label                           N  Mean  STD  Min  Max

   Age     Age (years) at Start of Study  19    13    1   11   16
   Height  Height (in) at Start of Study  19    62    5   51   72
   Weight  Weight (kg) at Start of Study  19   100   23   51  150


   The TABULATE Procedure
 
   ----------------------------------------------------
   |                       |       Sex       |        |
   |                       |-----------------|        |
   |                       |   F    |   M    | Total  |
   |-----------------------+--------+--------+--------|
   |Age (years)|N          |       9|      10|      19|
   |at Start of|-----------+--------+--------+--------|
   |Study      |Mean       |   13.22|   13.40|   13.32|
   |           |-----------+--------+--------+--------|
   |           |Std        |   1.394|   1.647|   1.493|
   |           |-----------+--------+--------+--------|
   |           |Median     |   13.00|   13.50|   13.00|
   |           |-----------+--------+--------+--------|
   |           |Min        |      11|      11|      11|
   |           |-----------+--------+--------+--------|
   |           |Max        |      15|      16|      16|
   |-----------+-----------+--------+--------+--------|
   |Height (in)|N          |       9|      10|      19|
   |at Start of|-----------+--------+--------+--------|
   |Study      |Mean       |  60.589|  63.910|  62.337|

Note that the labels we defined in the DATASETS procedure carried forward to the PRINT, MEANS and TABULATE procedures.
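The same holds for any new dataset built from a labeled one: a label stored on a variable travels with it when the data is read with a SET statement, while a LABEL statement inside a procedure only changes the heading for that one step. A minimal sketch, with CLASS2 as a hypothetical name for the new dataset:

   data class; *** Copy the data and store a label on WEIGHT;
      set sashelp.class;
      label weight='Weight (kg) at Start of Study';
   run;

   data class2; *** New dataset: the stored label comes along via SET;
      set class;
   run;

   proc print data=class2(obs=3) noobs label;
      label weight='Weight (report only)'; *** Applies to this step only;
   run;

   proc contents data=class2; *** Still shows the stored label;
   run;

The PRINT report heads the column with the temporary text, but CONTENTS on CLASS2 afterwards still lists the label stored in the dataset.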

Hope this was useful.

________________________________
Updated March 2, 2015