SAS Tip of the Month
October 2006

SAS provides several ways for a user to input data using a CARDS or DATALINES statement. The examples below use the CARDS statement, but for the most part the two statements can be used interchangeably. (The CARDS statement dates from the time when punch cards were used to store data.)

My favorite is List Input, valued for its simplicity. With this method the INPUT statement simply lists the variable names in the order in which the values appear in the data - a character variable is marked with a '$' after its name. The following example uses List Input to read the approximate distance from London to certain cities, in kilometers:

    data London2;
        length city $15 distance 8;
        input city $ distance;
        cards;
    Amsterdam 370
    Montreal 5200
    Auckland 22720
    ;
    run;

Note that in all the examples the length of each variable is defined using a LENGTH statement, as this overrides any default that SAS would otherwise apply - with List Input, a character variable that has no declared length gets a default length of 8 bytes, so a longer value such as "Amsterdam" would be truncated. (It is good programming practice to define each new variable in a data step with at least a LENGTH statement!)

There is one major problem with the INPUT statement as it stands - List Input treats one or more blanks as the delimiter between values, so how could this method be adapted to cater for a name such as New York? Two methods exist, the first being the DLM= (DELIMITER=) option in the INFILE statement, as the following example shows:

    data London2;
        length city $15 distance 8;
        infile cards dlm='~';
        input city $ distance;
        cards;
    Paris~330
    New York~5530
    Sydney~215660
    ;
    run;

In the example the character '~' was used, but just about any character can serve as a delimiter, popular choices being "," and ";". (With in-stream data a ";" delimiter needs the CARDS4 or DATALINES4 statement, because a semicolon otherwise marks the end of the data.) Note that the DLM= option only works in the INFILE statement. (The INFILE statement appears in this example only because the DLM= option is being set; otherwise SAS assumes "INFILE CARDS" by default when it sees the CARDS statement.)
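
As a further illustration, here is a minimal sketch of the same step with a comma as the delimiter. The data set name London_csv is only a placeholder chosen for this sketch; the cities and distances are reused from the example above:

    data London_csv;
        length city $15 distance 8;
        /* DLM= belongs on the INFILE statement, not on INPUT */
        infile cards dlm=',';
        input city $ distance;
        cards;
    Paris,330
    New York,5530
    ;
    run;

Because the blank is no longer a delimiter here, "New York" should be read as a single value.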

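The second method, which I personally rarely use, is the '&' modifier in the INPUT statement. It lets a character value contain single embedded blanks, so that in effect a "double space" (two or more consecutive blanks) becomes the delimiter, as the following example shows: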

    data London2;
        length city $15 distance 8;
        input city $ & distance;
        cards;
    Frankfurt  640
    Rio de Janeiro  11060
    Singapore  10810
    ;
    run;

The next method that I find useful is Named Input, where each value in the data is labelled with its variable name, so the values need not appear in the same order as the variables in the INPUT statement. The following example shows its use:

    data London2;
        length city $15 distance 8;
        input city= $ distance=;
        cards;
    city=Copenhagen distance=953
    distance=6800 city=Nairobi
    city=Tokyo distance=15260
    ;
    run;

Note that in this method the variable name is followed by the '=' sign both in the INPUT statement and in the data lines.

The third method is Column Input, where the INPUT statement specifies the columns in which SAS will find each value, as the following example shows:

    data London2;
        length city $15 distance 8;
        input city $ 1-11 distance 13-20;
        cards;
    Rome        1430
    Mexico City 10640
    Hong Kong   13200
    ;
    run;

In the example columns 1 through 11 are reserved for the city name and columns 13-20 are reserved for the distance from London to that city in kilometers.

The last way to input data using a CARDS or DATALINES statement is Formatted Input. Here each variable name in the INPUT statement is followed by an informat, as the following example shows:

    data London2;
        length city $15 distance 8;
        input city $11. @13 distance 5.;
        cards;
    Stockholm   1450
    Los Angeles 8780
    Bangkok     12860
    ;
    run;

The last example also used a column "pointer" control (the "@" symbol followed by a number), which instructs SAS to move to a particular column and start reading from that column.
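
Since the "@" pointer can move to any column on the line, it can also be used to read the fields in a different order from the one in which they appear. The following is a small sketch of that idea, reusing the data from the example above; the data set name London_ptr is a placeholder invented for this sketch:

    data London_ptr;
        length city $15 distance 8;
        /* jump to column 13 for the distance, then back to column 1 for the city */
        input @13 distance 5. @1 city $11.;
        cards;
    Stockholm   1450
    Los Angeles 8780
    Bangkok     12860
    ;
    run;

The resulting variables should hold the same values as before; only the order in which SAS reads them from each line changes.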

It is possible to mix the methods inside an INPUT statement but use caution as unexpected results can easily occur.
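
As one hedged illustration of mixing, the sketch below reads the city with Column Input and the distance with List Input. The data set name London_mix is a placeholder for this sketch, and the data lines are borrowed from the Column example:

    data London_mix;
        length city $15 distance 8;
        /* city: Column Input (columns 1-11); distance: List Input */
        input city $ 1-11 distance;
        cards;
    Rome        1430
    Mexico City 10640
    Hong Kong   13200
    ;
    run;

After the column range has been read, List Input scans forward from the current position to the next non-blank value, so here the distance does not need to start in a fixed column. This is also where the caution applies: if the city name ran past column 11, the leftover characters would be taken as the start of the distance.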

I do hope you find this month's tip useful.

________________________________
Updated October 3, 2006