Return to Archive

SAS Tip of the Month
January 2006

Merging variables from one dataset into another is one of the basic data manipulation tasks that a SAS programmer has to do. The most common way to merge on data is using the MERGE statement in the DATA step but there are three other ways that can help. First though, some data:

    Dataset: PATDATA
    SUBJECT  TRT_CODE
    124263      A
    124264      A
    124266      B

    Dataset: ADVERSE
    SUBJECT  EVENT
    124263   HEADACHE
    124266   FEVER
    124266   NAUSEA

Each of the examples below will merge the TRT_CODE from the PATDATA dataset onto the ADVERSE dataset.

Using the DATA step the code for merging would typically be:

    DATA alldata0;
        MERGE adverse (in=a) patdata (in=b);
        BY subject;
        IF a;
    RUN;

Another way would be to use SQL, so the code would look something like:

    PROC SQL;
        CREATE TABLE alldata0 AS
            SELECT a.*, b.trt_code
            FROM adverse a left join patdata b
            ON a.subject=b.subject;
        QUIT;
    RUN;

A third way is using the KEY= option in the SET statement as shown in the following example:

    DATA alldata0;
        SET adverse;
        SET patdata KEY=subject /UNIQUE;
        IF _IORC_ THEN DO;
            _ERROR_=0;
            trt_code='';
        END;
    RUN;

Before the third example is run the dataset PATDATA must have an index created inside it, using either the INDEX statement inside a DATASETS or SQL procedure, or INDEX option inside a DATA step.

The forth example creates a format from the data and sets the treatment from the created format:

    DATA fmt;
        RETAIN fmtname 'TRT_FMT' type 'C';
        SET patdata;
        RENAME subject=start trt_code=label;
    RUN;
    PROC FORMAT CNTLIN=fmt;
    RUN;
    DATA alldata0;
        SET adverse;
        trt_code=put(subject,$trt_fmt.);
    RUN;

There are other ways of merging data but the above examples show the most common. No one method is better than the other - it depends on the size of the data being merged. The following table gives a guide on the ratio of time spent when compared with a number of observations in the ADVERSE dataset (PATDATA has 1000 patients/records):

     Number of   |   MERGE   |          |             |  PROC
    Observations | Statement | PROC SQL | KEY= Option | FORMAT
    -------------+-----------+----------+-------------+--------
         1K      |    1.00   |    2.19  |     0.25    |   1.25
         1M      |   86.25   |   90.56  |    51.25    |  23.06
         5M      |  787.94   |  402.19  |   302.31    | 264.13

Again, the above table should only be given as a guide. Happy New Year.

________________________________
Updated January 5, 2006