Stat 530 - Fall 2003 - SAS Templates

The Basics of SAS (in class 8/25/03)
Computer Trouble?
SAS Won't Start?
Graphs Printing Small in Word?


The Basics of SAS:

SAS's strong points are that it is perhaps the most widely used statistical package and that it also serves as a database management program. Its biggest weakness is that it is fairly hard to program or customize.

When you start SAS, three windows are used: the Log window, the Program Editor window, and the Output window. If you lose one of these windows, there is usually a button for it along the bottom of the SAS window. You can also find them under the View menu.

The Program Editor is where you tell SAS what you want done. The Output window is where it puts the results, and the Log window is where it tells you what it did and whether there were any errors. It is important to note that the Output window often gets very long! You usually want to copy the parts you want to print into MS-Word and print from there. It is also important to check the Log window every time you run anything. (The error "SAS Syntax Editor control is not installed" is OK, though.) Errors appear in maroon; successful runs appear in blue.

Hitting the [F3] key will run the program currently in the Program Editor window.

This will, however, erase whatever was written in the Program Editor window. To recall it, make sure you are in that window and hit the [F4] key.

If you happen to lose a window, check under the View menu at the top.

If you keep running programs, SAS keeps adding the results to the Output window. To clear the Output window, make sure you are in that window and choose Clear All under the Edit menu.
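As an alternative to the Edit menu, the Log and Output windows can also be cleared from a program with the DM (display manager) statement. This is a small sketch assuming you are working in the windowing environment described above; putting it at the top of a program clears both windows each time you submit it.

```sas
/* Clear the Log window, then the Output window, before the rest of the program runs */
DM 'log; clear; output; clear;';
```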

In what follows, we will replicate as much of what we did with R as we can easily do.

We would enter the vector (5, 4, 3, 2) using code something like the following:


OPTIONS pagesize=60 linesize=60;

DATA sampvect;
INPUT values @@;
LABEL values = "Just some numbers";
CARDS;
5 4 3
2
;

Note that _most_ lines end with a semicolon, but not all (the data lines after CARDS do not, for example). SAS will give errors if you leave one out, but the Log window will usually tell you where the problem is.

The OPTIONS line only needs to be used once during a session. PAGESIZE sets the number of lines on a page and LINESIZE sets the number of characters on a line, both for viewing on the screen and for printing. The font can be set using the Options submenu of the Tools menu. When you cut and paste from SAS into a word processor, the Courier New font works well.

The DATA line defines the name of the data set. The name should be eight characters or less, with no spaces, and contain only letters, numbers, and underscores. It must start with a letter. The INPUT line gives the names of the variables, and they must be in the order that the data will be entered. The @@ at the end of the INPUT line means that several observations can be entered one right after another on the same line, instead of needing one line for each observation.
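For comparison, here is a sketch of the same four values entered one observation per line, so the @@ is not needed. The data set name sampvect2 is just a hypothetical name chosen so that it does not overwrite sampvect.

```sas
/* One observation per line, so INPUT does not need the @@ */
DATA sampvect2;
INPUT values;
LABEL values = "Just some numbers";
CARDS;
5
4
3
2
;
```

If you dropped the @@ but kept the original layout (5 4 3 on one line, 2 on the next), SAS would read only the first value on each line, giving a data set with just the values 5 and 2.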

If we hit F3 at this point to submit what we typed above, nothing new will appear in the Output window. This is no big surprise, however, once we realize that we haven't told SAS to produce any output! The code below simply tells SAS to print back out the data we entered.


PROC PRINT DATA=sampvect;
TITLE "Just the values";
RUN;
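By default PROC PRINT uses the variable names as column headings, so the LABEL line in the DATA step does not show up above. A small variation, assuming you want the label as the heading and do not want the Obs column:

```sas
/* LABEL: use variable labels ("Just some numbers") as column headings */
/* NOOBS: suppress the Obs (observation number) column                 */
PROC PRINT DATA=sampvect LABEL NOOBS;
VAR values;
TITLE "Just the values, with labels";
RUN;
```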

The most basic procedure for producing some actual graphs and statistics is PROC UNIVARIATE:


PROC UNIVARIATE DATA=sampvect PLOT FREQ ;
VAR values;
TITLE 'Summary of the Values';
RUN;

The VAR line says which of the variables you want a summary of. Also note that the graphs here are pretty awful. The INSIGHT procedure will do most of the things that the UNIVARIATE procedure will, and a lot more. INSIGHT, however, cannot be programmed to perform new tasks that are not already built in. Later in the semester we'll see how some of the other procedures in SAS can be used to do things that aren't already programmed in.


PROC INSIGHT;
OPEN sampvect;
DIST values;
RUN;

Another way to open PROC INSIGHT is to go to the Solutions menu, then to the Analysis menu, and then finally to the Interactive Data Analysis option. Once there you will need to go to the WORK library and choose the sampvect data set. If you go this route instead, you will also need to make a selection to get the information about the distribution of the values. Go to the Analyze menu, and choose Distribution (Y). Select values, click the Y button, and then click OK.

Once PROC INSIGHT opens, you can cut and paste the graphs from it right into Microsoft Word. Simply click on the border of the box you want to copy with the right mouse button to select it; you can then cut and paste as usual. Clicking on the arrow in the bottom corner of each box gives you options for adjusting the graph's format. The various menus along the top also give other choices, such as adding Q-Q plots or conducting tests of hypotheses. To quit PROC INSIGHT, click on the X in the upper right corner of the spreadsheet.
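Some of the same tests can also be requested from code through PROC UNIVARIATE. A sketch, assuming we want to test whether the mean of values is 3 (an arbitrary value chosen only for illustration): the NORMAL option adds tests of normality, and MU0= sets the hypothesized mean for the tests of location.

```sas
/* NORMAL: tests of normality                        */
/* MU0=3 : hypothesized mean (illustrative value only) */
PROC UNIVARIATE DATA=sampvect NORMAL MU0=3;
VAR values;
TITLE 'Tests for the Values';
RUN;
```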

There are various ways to enter "more exciting" data sets. The Import Data... option in the File menu will allow you to read in text files and spreadsheets. It is also possible to simply cut and paste data into SAS. Open up the web page http://www.stat.sc.edu/~habing/courses/data/rivers.txt, select the entire page, and paste it into the Program Editor window.

We must now add the lines around it: DATA, INPUT, CARDS, and the final ;. We will use the first line of headings as the INPUT line. We will not need the @@ at the end, but we will need to put a $ after the River, Country, and Continent. This is to indicate that these are character strings and not numeric values. The first three lines will thus need to look something like:

DATA nitro;
INPUT River $ Country $ Cont $ Discharge Runoff Area Density NO3 Export Dep Nprec Prec;
CARDS;
(You might need to remove some blank spaces when you add the $ signs, because the line will be too long otherwise.)
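If the text file has already been saved on your computer, another option is to have the DATA step read it directly with an INFILE statement instead of pasting it after CARDS. This is a sketch only: the path C:\temp\rivers.txt and the fileref rivfile are hypothetical, and it assumes the file's first line holds the headings and the values are separated by blanks, just as in the pasted version.

```sas
/* Hypothetical location: change to wherever you saved rivers.txt */
FILENAME rivfile 'C:\temp\rivers.txt';

DATA nitro;
INFILE rivfile FIRSTOBS=2;   /* start at line 2, skipping the headings */
INPUT River $ Country $ Cont $ Discharge Runoff Area
      Density NO3 Export Dep Nprec Prec;
RUN;
```

With this kind of input, character values get a default length of 8 characters, so longer river or country names are cut off; a LENGTH statement before the INPUT line (for example LENGTH River $ 20;) is one way around that.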

You can now use PROC PRINT to make sure the data was accepted correctly. Note that it truncated the names of the various variables that were extremely long before.

If we wanted to look at the various continents separately, and only at some of the variables, we could form new data sets with just the portions we want. The following would keep Discharge, Density, and NO3 for Europe only.


DATA Eunitro;
SET nitro;
KEEP Discharge Density N03;
WHERE Cont='Eu';
RUN;

PROC PRINT DATA=EuNitro;
TITLE "So, did it work?";
RUN;

Whenever you have a DATA line, you are creating a new data set with that name. The SET line says that we are making this new data set from an old one. The KEEP line says the only variables we want in the new data set are the ones on that line. The lines after that give any special commands that go into making the new data set. In this case the WHERE statement makes sure we keep only the observations that have specific values of some variables. We could also make data sets that contain mathematical functions of the variables already there. In any case, it should be pretty straightforward when you just stop and read through what the lines say.
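As a sketch of that last point, the following makes a new data set with two variables computed from ones already in nitro. The names nitro2, logNO3, and NO3dens are hypothetical, chosen only for illustration. LOG is the natural logarithm; where a function or a division is not defined (a zero or negative NO3, a zero Density), SAS sets the result to missing and writes a note to the Log.

```sas
DATA nitro2;
SET nitro;
logNO3  = LOG(NO3);        /* natural log of NO3           */
NO3dens = NO3 / Density;   /* ratio of two existing values */
RUN;

PROC PRINT DATA=nitro2;
VAR River NO3 logNO3 NO3dens;
TITLE 'New variables';
RUN;
```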

Two things to notice about the above. First, it was not case sensitive (EuNitro vs. Eunitro). Second, what happened to the NO3??
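If you want to check your answer, here is the same step with the variable on the KEEP line spelled exactly as on the INPUT line: NO3 with the letter O, rather than N03 with the digit zero. Because nitro has no variable called N03, the original KEEP line never picked up NO3, and the Log should show a warning about it.

```sas
/* NO3 here uses the letter O, matching the INPUT statement */
DATA Eunitro;
SET nitro;
KEEP Discharge Density NO3;
WHERE Cont='Eu';
RUN;
```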

If we plan on using PROC INSIGHT, there was really no need to do the subsetting here. We can choose to ignore certain values once we are in INSIGHT. So, now, start INSIGHT up with the entire nitro data set.

Under the Analyze menu, choose Scatter Plot and select NO3 for Y and Density for X. Also, choose Fit (YX) in that same menu.
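The same plot and fit can apparently also be requested from code rather than the menus. This sketch assumes the batch SCATTER and FIT statements of PROC INSIGHT are available in your version; check the SAS/INSIGHT help before relying on it.

```sas
PROC INSIGHT;
OPEN nitro;
SCATTER NO3 * Density;   /* scatter plot, NO3 on the Y axis   */
FIT NO3 = Density;       /* fit NO3 as a function of Density  */
RUN;
```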

Now, choosing Observations from the Edit menu will allow you to include or exclude observations from the calculations and plots. Select Exclude from Calculations for Cont ^= Eu. Notice how all of the numbers have been recalculated, and the non-European rivers are shown with X's. We could also remove those observations from the plots entirely. Also notice the change on the spreadsheet that occurred from each of these. By right-clicking on an observation number, you can make the choice for that observation individually.
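Outside of INSIGHT, a similar effect can be had by putting a WHERE statement inside a procedure, so that only some observations are used in the calculations. A sketch using PROC REG (a regression procedure not otherwise covered on this page) to fit NO3 on Density for the European rivers only, assuming Cont='Eu' picks them out as in the subsetting example above:

```sas
PROC REG DATA=nitro;
WHERE Cont='Eu';          /* use only the European rivers */
MODEL NO3 = Density;      /* straight-line fit of NO3 on Density */
TITLE 'European rivers only';
RUN;
QUIT;                     /* PROC REG stays active until QUIT */
```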

Now, take a moment and reinclude all of the observations.

If you construct a scatterplot for X=Density, Y=NO3, and Group=Cont, you will get a scatter plot for each continent separately.

Try Box Plot/Mosaic Plot (Y) for Y=NO3, with Group=Cont. Now try it with Y=NO3 and X=Density.
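For side-by-side box plots from code, one option is PROC BOXPLOT. A sketch, assuming your installation includes it (it is part of SAS/STAT in recent versions). It needs the data sorted by the grouping variable, so PROC SORT comes first; the output data set name nitsort is hypothetical.

```sas
/* Sort a copy of nitro by continent */
PROC SORT DATA=nitro OUT=nitsort;
BY Cont;
RUN;

/* One box plot of NO3 for each continent */
PROC BOXPLOT DATA=nitsort;
PLOT NO3*Cont;
TITLE 'NO3 by Continent';
RUN;
```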

Remember, the arrows in the lower left give you various options with the graphs you have constructed. You can also change a value in the spreadsheet to see how that affects the display. Clicking on an observation in the spreadsheet will highlight that observation in the graphs, and vice-versa, as well.

Try a Rotating Plot (ZYX) using three quantitative variables. Similarly for Contour Plot (ZYX). Finally, choose Scatter Plot but select ALL of the quantitative variables for X and also for Y.

As you can see, PROC INSIGHT has lots of nice built in graphing procedures. Unfortunately it cannot be customized beyond what it has programmed in to start. The other graphical routines in SAS are often not as easy to use. PROC GPLOT and GCHART do allow for some of the same control as in S-Plus however. A list of these, and other graphical functions, can be found in the SAS help for SAS/GRAPH. The basic statistical procedures are listed under SAS/STAT, and the help for INSIGHT is listed under SAS/INSIGHT. When you call up the help, it will generally take over whatever web-browser window was on top, and use that to display the help files.
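As a starting point with SAS/GRAPH, here is a sketch of a scatter plot similar to the INSIGHT one with Group=Cont, followed by a bar chart of how many rivers come from each continent. The QUIT; statements end each procedure, which otherwise stays active waiting for more statements.

```sas
/* Scatter plot of NO3 against Density; =Cont gives each continent its own symbol/color */
SYMBOL1 VALUE=dot;
PROC GPLOT DATA=nitro;
PLOT NO3*Density=Cont;
TITLE 'NO3 versus Density by Continent';
RUN;
QUIT;

/* Vertical bar chart of the number of rivers on each continent */
PROC GCHART DATA=nitro;
VBAR Cont;
TITLE 'Rivers per Continent';
RUN;
QUIT;
```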

There are also several other graphical procedures that are added in each new version of SAS... and they have progressively become more user-friendly over time.

PROC G3D DATA=nitro;
SCATTER Density*Discharge=NO3;
TITLE1 'Better Still!';
RUN;


Computer Trouble?

In most cases, help with the computers (NOT the programming) can be obtained by e-mailing help@stat.sc.edu.

For the printers on the first and second floor, printer paper is available in the Stat Department office. For printers on the third floor, paper is available in the Math Department office.

If you are using a PC, restarting the machine will fix many problems, but obviously don't try that if you have a file that won't save or the like.

SAS Won't Start?

If SAS won't start, one of the things to check is that your computer has loaded the X drive correctly (whatever that means). Go to My Computer and see if the apps on 'lc-nt' (X:) is listed as one of the drives. If it isn't, go to the Tools menu and select Map Network Drive.... Select X for the drive, and enter \\lc-nt\apps for the Folder. Then click Finish. This should connect your computer to the X drive and allow SAS to run. If you already had the X-drive connected, then you will need to e-mail help@stat.sc.edu.

Graphs Printing Small in Word?

If your graphs print out extremely small after you copy them to Word, you might be able to fix the problem by "opening and closing" the image. In Word, left-click once on the image, and select Edit Picture or Open Picture Object under the Edit menu. A separate window will open with the image in it. Simply choose Close Picture. It should now print out OK. This will also fix the spacing between the characters in the labels if it was somewhat off.

If the problem is an emergency requiring immediate attention, see Jason Dew in room 209D.
If Jason is not available and it is an emergency, see Minna Moore in room 417.
Flagrantly non-emergency cases may result in suspension of computer privileges.