Return to Archive

SAS Tip of the Month
May 2009

This month I am going to talk about dataset variable labels, and why they are useful.

There are many parts to a variable, key among them are the name of the variable (NAME), type (TYPE), and length (LENGTH). Under SAS version 8 and above the name of a variable can be 32 characters long, its type can be either Character or Numeric, and the length for a character variable can be set between 1 and 32767 characters while a numeric variable can be set between 3 and 8 (yes, some operating systems allow for 2 but not all).

There are other parts to a variable, but the three most common are the informat, format and label -- we shall look at this last part this month.

What exactly is a label and why is it used? In simple terms the label is a short descriptive text giving a more user friendly description of the variable. Take for example a very simple example, the variable WEIGHT in a dataset -- sure it tells me what the variable is (it is weight) but it does not tell me other things I may need to know like what unit it is in or when the weight data point was collected, all of this being very important information. Given that I am allowed 32 characters for a variable name I could change the variable name to reflect this information, but in most cases it is not possible to do this and it indeed impractical. So here comes the use of the LABEL statement.

The label itself can be 256 characters, including blanks, in length (40 characters if you are still using SAS version 6.xx) -- this gives us plenty of room to write a good description of the variable. Setting the label is commonly achieved by using the LABEL statement inside a datastep or the LABEL option under the MODIFY statement in the DATASETS procedure, the syntax of which is given below:

   *** Inside a datastep;
   data class;
      set class;
      label weight='Weight (kg) at Start of Study';
   run;
   
   *** Inside the DATASETS procedure;
   proc datasets library=work;
      modify class;
         label weight='Weight (kg) at Start of Study';
      quit;
   run;    

I personally use the datastep method if I am creating or modifying a variable inside that datastep, otherwise I use the DATASETS procedure, but is only my convention -- there is no set rule with this.

Before going on to a real world example, lets first see how we would look at what labels are set, if any, inside a dataset.

The two easiest ways to look at the data is to either run a CONTENTS procedure call on the dataset, or if you have the SAS Viewer installed, look at the attributes window for that dataset.

Now lets look at a real world example where it will all come clearer. For this example I will use the dataset SASHELP.CLASS. Lets look first at the structure of the dataset using the CONTENTS procedure (will actually make a copy first so I don't overwrite the original data):

   data class;  *** Make a copy of the dataset;
      set sashelp.class;
   run;
   proc contents data=class;  *** Get structure of the dataset;
   run;    

Running this code we get the following output (abridged):

   Alphabetic List of Variables and Attributes
   
   #    Variable    Type    Len
   
   3    Age         Num       8
   4    Height      Num       8
   1    Name        Char      8
   2    Sex         Char      1
   5    Weight      Num       8    

If a label existed for a variable we would see a column headed "Label", but in this case there is no labels applied to the dataset. So now lets set one for WEIGHT as we indicated above (this time I shall do the content structure not from the CONTENTS procedure, but the CONTENTS statement inside the DATASETS procedure):

   proc datasets library=work;
      modify class;
         label weight='Weight (kg) at Start of Study';
      contents data=class;
      quit;
   run;    

Running this code we get the following output (abridged):

         Alphabetic List of Variables and Attributes

#    Variable    Type    Len    Label

3    Age         Num       8
4    Height      Num       8
1    Name        Char      8
2    Sex         Char      1
5    Weight      Num       8    Weight (kg) at Start of Study  

As you can see the variable WEIGHT now has a label -- the reason why it is useful will become clear shortly. For the purposes of this example I will also now add a label to the variable AGE and HEIGHT using the DATASETS procedure as above:

   proc datasets library=work;
      modify class;
         label height='Height (in) at Start of Study'
               age='Age (years) at Start of Study';
      contents data=class;
      quit;
   run;  

Note that I did not redo the label for the variable WEIGHT in the above code as it was already done previously, although I could have put in the step and even replaced it with new text.

Now why is the label useful. As you can see already it gives a useful description of what the variable is. Now lets extend that to a small report using the PRINT procedure:

   proc print data=class noobs;
   run;    

that produces the following output (abridged):

    Name      Sex    Age    Height    Weight
   
   Alfred      M      14     69.0      112.5
   Alice       F      13     56.5       84.0
   Barbara     F      13     65.3       98.0
   Carol       F      14     62.8      102.5
   Henry       M      14     63.5      102.5    

The column headers are the same as the variable names, and we have the same problem of not knowing what the units are for Age, Height and Weight (but did we not put that in a label earlier) but lets add another option to get the labels:

   proc print data=class noobs LABEL;
   run;    

that produces the following output (abridged):

                     Age (years)    Height (in)    Weight (kg)
                       at Start       at Start       at Start
    Name      Sex      of Study       of Study       of Study
   
   Alfred      M          14            69.0          112.5
   Alice       F          13            56.5           84.0
   Barbara     F          13            65.3           98.0
   Carol       F          14            62.8          102.5
   Henry       M          14            63.5          102.5    

Now, as you can see, we have a report that gives a clear description of the variables though the use of dataset variable labels.

There will be those reading this who will say that I could have put the label in the PRINT procedure call (yes, the LABEL statement is a global statement that can be used in almost any procedure) and would be done this way using the following code:

   proc print data=class noobs LABEL;
         label weight='Weight (kg) at Start of Study'
               height='Height (in) at Start of Study'
               age='Age (years) at Start of Study';
   run;    

But the one reason I don't normally use this method is that the label does not carry forward in the dataset. Now see a complete example where the CLASS data is copied from the SASHELP directory, useful labels put on the dataset, then doing a PRINT, MEANS and TABULATE call on the same data, all with the same labels:

   proc datasets library=work;
      copy in=sashelp out=work;
         select class;
      modify class;
         label weight='Weight (kg) at Start of Study'
               height='Height (in) at Start of Study'
               age='Age (years) at Start of Study';
      quit;
   run;
   proc print data=class noobs LABEL;
   run;
   proc means data=class;
      var age height weight;
   run;
   proc tabulate data=class;
      class sex;
      var age height weight;
      tables age*
               (n*f=8.0 mean*f=8.2 std*f=8.3
                median*f=8.2 min*f=8.0 max*f=8.0)
             (height weight)*
               (n*f=8.0 mean*f=8.3 std*f=8.4
                median*f=8.3 min*f=8.1 max*f=8.1),
             sex all='Total';
   run;    

produces the following output:

The PRINT Procedure

                  Age (years)    Height (in)    Weight (kg)
                    at Start       at Start       at Start
Name       Sex      of Study       of Study       of Study

Alfred      M          14            69.0          112.5
Alice       F          13            56.5           84.0
Barbara     F          13            65.3           98.0
Carol       F          14            62.8          102.5
Henry       M          14            63.5          102.5
James       M          12            57.3           83.0
Jane        F          12            59.8           84.5
Janet       F          15            62.5          112.5
Jeffrey     M          13            62.5           84.0
John        M          12            59.0           99.5
Joyce       F          11            51.3           50.5
Judy        F          14            64.3           90.0
Louise      F          12            56.3           77.0
Mary        F          15            66.5          112.0
Philip      M          16            72.0          150.0
Robert      M          12            64.8          128.0
Ronald      M          15            67.0          133.0
Thomas      M          11            57.5           85.0
William     M          15            66.5          112.0


The MEANS Procedure

Variable  Label                           N          Mean       Std Dev       Minimum       Maximum
---------------------------------------------------------------------------------------------------
Age       Age (years) at Start of Study  19    13.3157895     1.4926722    11.0000000    16.0000000
Height    Height (in) at Start of Study  19    62.3368421     5.1270752    51.3000000    72.0000000
Weight    Weight (kg) at Start of Study  19   100.0263158    22.7739335    50.5000000   150.0000000
---------------------------------------------------------------------------------------------------


The TABULATE Procedure

----------------------------------------------------
|                       |       Sex       |        |
|                       |-----------------|        |
|                       |   F    |   M    | Total  |
|-----------------------+--------+--------+--------|
|Age (years)|N          |       9|      10|      19|
|at Start of|-----------+--------+--------+--------|
|Study      |Mean       |   13.22|   13.40|   13.32|
|           |-----------+--------+--------+--------|
|           |Std        |   1.394|   1.647|   1.493|
|           |-----------+--------+--------+--------|
|           |Median     |   13.00|   13.50|   13.00|
|           |-----------+--------+--------+--------|
|           |Min        |      11|      11|      11|
|           |-----------+--------+--------+--------|
|           |Max        |      15|      16|      16|
|-----------+-----------+--------+--------+--------|
|Height (in)|N          |       9|      10|      19|
|at Start of|-----------+--------+--------+--------|
|Study      |Mean       |  60.589|  63.910|  62.337|
|           |-----------+--------+--------+--------|
|           |Std        |  5.0183|  4.9379|  5.1271|
|           |-----------+--------+--------+--------|
|           |Median     |  62.500|  64.150|  62.800|
|           |-----------+--------+--------+--------|
|           |Min        |    51.3|    57.3|    51.3|
|           |-----------+--------+--------+--------|
|           |Max        |    66.5|    72.0|    72.0|
|-----------+-----------+--------+--------+--------|
|Weight (kg)|N          |       9|      10|      19|
|at Start of|-----------+--------+--------+--------|
|Study      |Mean       |  90.111| 108.950| 100.026|
|           |-----------+--------+--------+--------|
|           |Std        | 19.3839| 22.7272| 22.7739|
|           |-----------+--------+--------+--------|
|           |Median     |  90.000| 107.250|  99.500|
|           |-----------+--------+--------+--------|
|           |Min        |    50.5|    83.0|    50.5|
|           |-----------+--------+--------+--------|
|           |Max        |   112.5|   150.0|   150.0|
----------------------------------------------------    

Note that the label we defined in the DATASETS procedure carried forward to the PRINT, MEANS and TABULATE procedures.

I hope you find this months tip useful.

________________________________
Updated May 8, 2009