SAS Tip of the Month
Getting data into SAS from an Excel speadsheet is a task that is often carried out by the SAS Programmer. One reason is that Excel is a good package for entering and editing data and does have some audit capabilites. But how do we get the data on the spreadsheet to a SAS dataset?
Let first look at some data on a sheet called LongRivers in a file called Rivers.xls located in the directory C:\TEMP:
NAME LENGTH_KMS Nile River 6,695 Amazon River 6,516 Yangtze River 6,380 Mississippi-Missouri River System 5,969 Ob-Irtysh Rivers 5,568 Yenisey-Angara-Selenga Rivers 5,550 Huang He (Yellow River) 5,464 Congo River 4,667 Rio de la Plata-Parana 4,500 Mekong River 4,425
As can be seen the structure for this sheet is two columns, no merged cells, and the column names are in row 1. This is as basic as they come but is the general structure for most sheets that hold data.
We can certainly save the sheet as a CSV file and import that file, but that is an extra step -- where possible it is best to try and import the data directly using PROC IMPORT and the EXCEL option that is available inside the procedure. The basic syntax for the use of the procedure is:
PROC IMPORT DATAFILE="filename" OUT=dataset_name DBMS=EXCEL REPLACE; SHEET="sheet_name$"; GETNAMES=YES|NO; RUN;
The DBMS=EXCEL option tells SAS that the data coming in is from an Excel spreadsheet, while the REPLACE option is necessary if you want to overwrite an existing dataset (too many imports of data have failed because the programmer assumed any dataset of the same name would be automatically overwrittern when it is not!). The GETNAMES option is a YES or NO value and gives the choice for the user to get the names of the variables on the first row -- if YES, any blanks are converted to '_', and if NO then the variable names become F1, F2, etc.
There are other statements that can be used including MIXED, RANGE, SCANTEXT, USEDATA and SCANTIME but these are beyond the example used here -- consult the SAS User Documentation for the use of these statements.
With this knowledge, lets import the data from the Excel file:
PROC IMPORT DATAFILE="C:\TEMP\Rivers.xls" OUT=longrivers DBMS=EXCEL REPLACE; SHEET="LongRivers$"; GETNAMES=YES; RUN;
The data we imported is ready for use.
It is possible to do a quick check on the data by running the following:
title1 "Listing of Ten Longest Rivers in the World"; PROC PRINT DATA=longrivers; RUN;
and we get the following output:
Listing of Ten Longest Rivers in the World LENGTH_ Obs NAME KMS 1 Nile River 6695 2 Amazon River 6516 3 Yangtze River 6380 4 Mississippi-Missouri River System 5969 5 Ob-Irtysh Rivers 5568 6 Yenisey-Angara-Selenga Rivers 5550 7 Huang He (Yellow River) 5464 8 Congo River 4667 9 Rio de la Plata-Parana 4500 10 Mekong River 4425
I hope this has been helpful. See you in September!
Updated August 2, 2013