The Program Editor is where you tell SAS what you want done. The Output window is where SAS puts the results, and the Log window is where it tells you what it did and whether there were any errors. It is important to note that the Output window often gets very long! You will usually want to copy the parts you want to print into MS Word and print from there. It is also important to check the Log window every time you run anything. (The error "SAS Syntax Editor control is not installed" is OK, though.) Errors will appear in maroon; successful runs appear in blue.
Hitting the [F3] key will run the program currently in the Program Editor window. This will, however, erase whatever was written in the Program Editor window. To recall what was there, make sure you are in that window and hit the [F4] key.
If you happen to lose a window, check under the View menu at the top.
If you keep running more programs, SAS keeps appending the results to the Output window. To clear the Output window, make sure you are in that window and choose Clear All under the Edit menu.
The code below uses the Texas employment data we saw in class on Tuesday.
OPTIONS pagesize=60 linesize=80; DATA salary; INPUT BSAL SAL77 SEX $ SENIOR AGE EDUC EXPER @@; CARDS; 5100 8940 1 95 640 15 165 6300 10860 1 84 662 15 231 4800 8580 1 98 774 12 381 6000 9720 1 69 488 12 121 5280 8760 1 98 557 8 190 5100 9600 1 85 406 12 59 5280 8040 1 88 745 8 90 4800 11100 1 87 349 12 11 4800 9000 1 77 505 12 63 5100 10020 1 87 508 16 123 4800 8820 1 76 482 12 6 5700 9780 1 74 542 12 116.5 5400 13320 1 86 329 15 24 5400 10440 1 72 604 12 169 5520 9600 1 82 558 12 97 5100 10560 1 84 458 12 36 5400 8940 1 88 338 12 26 4800 9240 1 84 571 16 214 5700 9000 1 76 667 12 90 6000 11940 1 86 486 15 78.5 3900 8760 1 98 327 12 0 4380 10020 1 93 313 8 7.5 4800 9780 1 75 619 12 144 5580 7860 1 69 600 12 132.5 6120 9360 1 78 624 12 208.5 4620 9420 1 96 385 12 52 5220 7860 1 70 671 8 102 5220 8340 1 70 468 12 127 5100 9660 1 66 554 8 96 5040 12420 0 96 329 15 14 4380 9600 1 92 305 8 6.25 6300 12060 0 82 357 15 72 4290 9180 1 69 280 12 5 6000 15120 0 67 315 15 35.5 5400 9540 1 66 534 15 122 6000 16320 0 97 354 12 24 4380 10380 1 92 305 12 0 6000 12300 0 66 351 12 56 5400 8640 1 65 603 8 173 6840 10380 0 92 374 15 41.5 5400 11880 1 66 302 12 26 8100 13979 0 66 369 16 54.5 4500 12540 1 96 366 8 52 6000 10140 0 82 363 12 32 5400 8400 1 70 628 12 82 6000 12360 0 88 555 12 252 5520 8880 1 67 694 12 196 6900 10920 0 75 416 15 132 5640 10080 1 90 368 12 55 6900 10920 0 89 481 12 175 4800 9240 1 73 590 12 228 5400 12660 0 91 331 15 17.5 5400 8640 1 66 771 8 228 6000 12960 0 66 355 15 64 4500 7980 1 80 298 12 8 6000 12360 0 86 348 15 25 5400 11940 1 77 325 12 38 5400 10680 0 88 359 12 38 5400 9420 1 72 589 15 49 5400 11640 0 96 474 12 113 6300 9780 1 66 394 12 86.5 5100 7860 0 84 535 12 180 5160 10680 1 87 320 12 18 6600 11220 0 66 369 15 84 5100 11160 1 98 571 15 115 5100 8700 0 97 637 12 315 4800 8340 1 79 602 8 70 6600 12240 0 83 536 15 215.5 5400 9600 1 98 568 12 244 5700 11220 0 94 392 15 36 4020 9840 1 92 528 10 44 6000 12180 0 91 364 12 49 4980 8700 1 74 718 8 318 
6000 11580 0 83 521 15 108 5280 9780 1 88 653 12 107 6000 8940 0 80 686 12 272 5700 8280 1 65 714 15 241 6000 10680 0 87 364 15 56 4800 8340 1 87 647 12 163 4620 11100 0 77 293 12 11.5 4800 13560 1 82 338 12 11 5220 10080 0 85 344 12 29 5700 10260 1 82 362 15 51 6600 15360 0 83 340 15 64 4380 9720 1 93 303 12 4.5 5400 12600 0 78 305 12 7 4380 10500 1 89 310 12 0 6000 8940 0 78 659 8 320 5700 10620 1 88 410 15 61 5400 9480 0 88 690 15 359 5400 10320 1 78 584 15 51 6000 14400 0 96 402 16 45.5 4440 9600 1 97 341 15 75 ;
Note that _most_ lines end with a semicolon, but not all. Missing one will usually cause an error rather than the results you wanted, but the Log window will generally tell you where the problem is.
The OPTIONS line only needs to be used once during a session. It sets the page length and line width used for viewing on the screen and for printing. The font can be set using the Options choice under the Tools menu along the top of the screen. When you cut and paste from SAS to a word processor, the font Courier New works well.
The DATA line gives the name of the data set. The name may not contain any spaces; only letters, numbers, and underscores are allowed, and it must start with a letter. (In older versions of SAS the name must also be eight characters or fewer.) The INPUT line gives the names of the variables, which must be in the order the data will be entered. The @@ at the end of the line says that there may be more than one observation per line. If we had left it out, SAS would have skipped from the observation with a 5100 base salary right to the one with a 4800 base salary. The $ after SEX says that that variable is a category/name and not a number.
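As a small illustration of these rules (using made-up values, not data from the text), the following DATA step reads six observations of two variables from just two lines of data; the $ marks GROUP as a character variable, and the @@ lets SAS read several observations per line:

```sas
DATA tiny;
INPUT group $ score @@;
CARDS;
a 1 a 2 b 3
b 4 b 5 a 6
;
```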
If we hit F3 at this point to run what we put above, nothing new will appear on the output screen. This should be no big surprise, once we realize that we haven't told SAS to return any output! The code below simply tells SAS to print back out the data we entered.
PROC PRINT DATA=salary;
TITLE "The Salary Data";
RUN;
The most basic method for getting a summary of the data is to use PROC UNIVARIATE.
PROC UNIVARIATE DATA=salary PLOT FREQ ;
VAR sal77;
TITLE 'Summary of the 1977 Salaries';
RUN;
The VAR line says which of the variables you want a summary of. Also note that the graphs here are pretty awful. We'll see in a few minutes how to make the graphics look better.
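If you wanted a separate summary for each sex, one way (a sketch, not something we ran in class) is to sort the data by sex first and then add a BY line to PROC UNIVARIATE:

```sas
PROC SORT DATA=salary;
BY sex;
RUN;
PROC UNIVARIATE DATA=salary PLOT FREQ;
VAR sal77;
BY sex;
RUN;
```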
The code used to generate the PROC TTEST output on the handout is:
PROC TTEST DATA=salary;
CLASS sex;
VAR sal77;
RUN;
One way to get a nicer q-q plot (normal probability plot) than the one that PROC UNIVARIATE makes (and to get a separate one for each sex) is to use PROC INSIGHT.
PROC INSIGHT;
OPEN salary;
RUN;

Another way to open PROC INSIGHT is to go to the Solutions menu, then to the Analysis menu, and then finally to the Interactive Data Analysis option. Once there you will need to go to the WORK library and choose the salary data set.
Once you start PROC INSIGHT a spreadsheet containing the data should appear on the screen. To analyze this data, go to the Analyze menu, and choose Distribution (Y). In the white box under SALARY select sal77 and click the Y button. Then select sex and click the Group button. Finally, click OK.
This causes an output window with two sets of graphs side by side to appear. You can cut and paste the graphs from these windows right into Microsoft Word. Simply click on the border of the box you want to copy with the left mouse button to select it; you can then cut and paste as normal. Later, to quit PROC INSIGHT, simply click on the X in the upper right portion of the spreadsheet.
The plot to check for normality is found under the Curves menu: choose QQ Ref Line... and then just click OK in the box that pops up. If the data is approximately normally distributed, it should fall near the line. With so few data points, however, it is hard to tell whether the data looks close or not. We can conduct a hypothesis test of normality by selecting Tests for Normality under the Tables menu. The Anderson-Darling test is perhaps the best of these tests to use... but I still prefer the q-q plot for small samples.
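If you prefer typed code to the menus, PROC UNIVARIATE can produce normality tests as well (in recent versions these include Anderson-Darling); treat this as a sketch of the idea rather than output we generated in class:

```sas
PROC SORT DATA=salary;
BY sex;
RUN;
PROC UNIVARIATE DATA=salary NORMAL PLOT;
VAR sal77;
BY sex;
RUN;
```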
Throughout PROC INSIGHT are little boxes that give you additional options. The arrow box at the bottom of the histogram window contains a Ticks... option that lets you control which boxes and endpoints are used in the histogram. Try changing the "Tick Increment". In the spreadsheet, if you click on the black box at the end of any row, you get the option to take that point out of the calculations or graphs; if you do, the output window updates automatically. Similarly, if you click on any number, that point is highlighted in all the graphs.
Besides Distribution (Y) in the Analyze menu, we could also have chosen Fit(YX). Choose sex for X and sal77 for Y. If the X variable is numerical, this will conduct a linear regression. If it is a categorical (name) variable, it will conduct an Analysis of Variance. Now, click OK and we will see what the output looks like. We could repeat this again and add educ as another X variable.
We will see a number of other things in SAS throughout the semester, and "instructions" will be posted here for most of the homework assignments.
One sample t-test and confidence interval: Say we were using the data in Table 1-1 (pg. 5) and wanted to test the null hypothesis that the mean weight was 10 vs. the alternate hypothesis that the mean weight was greater than 10. The first step is to enter the data:
DATA weight1p1; INPUT weight @@; CARDS; 7.0 8.0 8.6 9.4 10.2 10.5 11.0 11.6 12.3 12.4 13.6 14.2 ;
We could then start PROC INSIGHT like we did in class on the 17th. In order to get to the menus that will make the confidence interval and perform the t-test, you have to use Distribution (Y) under the Analyze Menu. Once you have done that, you can find Basic Confidence Intervals and Tests for Location... in the Tables Menu.
When asking it to perform the tests for location, you need to make sure to give it the value of the mean that you want to use for the null hypothesis. The output will have three separate tests: the t-test (the one we want), and the signed rank and sign tests (which you would learn about in STAT 518). It is also important to note that SAS always gives the p-value for testing "not equals".
In this case, we get t=1.14 and a p-value of 0.2774 for testing "not equals". If we were testing the alternate hypothesis "greater than", we would have to halve the p-value to get 0.1387.
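The same test can be run without PROC INSIGHT; in versions of SAS that support the H0= option, something like the following should reproduce the t=1.14 above (the "not equals" p-value would still need to be halved by hand for the one-sided test):

```sas
PROC TTEST DATA=weight1p1 H0=10;
VAR weight;
RUN;
```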
Linear Regression: Linear regression can be performed using Fit(YX) under the Analyze menu in PROC INSIGHT just like the salary examples in class on the 17th... except that both Y and X are numerical. The residual-versus-predicted plot is formed automatically, and the residual q-q plot can be made by choosing Residual Normal QQ under the Graphs menu.
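The same kind of regression can also be run with typed code using PROC REG. As a sketch (here using the salary data with sal77 as Y and bsal as X, purely for illustration), the PLOT lines request a residual-versus-predicted plot and a normal quantile plot of the residuals:

```sas
PROC REG DATA=salary;
MODEL sal77 = bsal;
PLOT r.*p.;
PLOT nqq.*r.;
RUN;
```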
NOTE: You do not need to check the assumptions on the regression question (#3).
Prediction Interval for Individual and Confidence Interval for Mean: The following code and instructions will let you reproduce the output on pages 36 to 39 of the text.
DATA martians; INPUT height_x weight_y; CARDS; 31 7.8 32 8.3 33 7.6 34 9.1 35 9.6 35 9.8 40 11.8 41 12.1 42 14.7 46 13.0 40 . ; PROC GLM DATA=martians; MODEL weight_y = height_x / ALPHA=0.05 CLI; RUN; PROC GLM DATA=martians; MODEL weight_y = height_x / ALPHA=0.05 CLM; RUN;
Notice that there are two places where we can find the intervals for a height of 40. It was one of the data points (#7), and we also added that X value again as the last observation to force SAS to give us the estimate. Notice it has just a period for the Y value so that it won't change any of the calculations. The intervals will be slightly different from the 11.0 to 12.5 reported on page 38 and the 9.5 to 14.0 reported on page 39 because of rounding.
It is also possible to get the graphs like those on page 36 by using PROC INSIGHT. Start up PROC INSIGHT and perform the regression as usual using Fit(YX). The options to add the curves to the scatter plot can be found by choosing the Confidence Curves option under the Curves menu.
The Data for Question 2: Here is the raw data for this problem, with the first line being the names of the variables. Remember in the INPUT line to put a $ after state since it contains the names of the states and not numerical values.
state sat takers income years public expend rank Iowa 1088 3 326 16.79 87.8 25.60 89.7 SouthDakota 1075 2 264 16.07 86.2 19.95 90.6 NorthDakota 1068 3 317 16.57 88.3 20.62 89.8 Kansas 1045 5 338 16.30 83.9 27.14 86.3 Nebraska 1045 5 293 17.25 83.6 21.05 88.5 Montana 1033 8 263 15.91 93.7 29.48 86.4 Minnesota 1028 7 343 17.41 78.3 24.84 83.4 Utah 1022 4 333 16.57 75.2 17.42 85.9 Wyoming 1017 5 328 16.01 97.0 25.96 87.5 Wisconsin 1011 10 304 16.85 77.3 27.69 84.2 Oklahoma 1001 5 358 15.95 74.2 20.07 85.6 Arkansas 999 4 295 15.49 86.4 15.71 89.2 Tennessee 999 9 330 15.72 61.2 14.58 83.4 NewMexico 997 8 316 15.92 79.5 22.19 83.7 Idaho 995 7 285 16.18 92.1 17.80 85.9 Mississippi 988 3 315 16.76 67.9 15.36 90.1 Kentucky 985 6 330 16.61 71.4 15.69 86.4 Colorado 983 16 333 16.83 88.3 26.56 81.8 Washington 982 19 309 16.23 87.5 26.53 83.2 Arizona 981 11 314 15.98 80.9 19.14 84.3 Illinois 977 14 347 15.80 74.6 24.41 78.7 Louisiana 975 5 394 16.85 44.8 19.72 82.9 Missouri 975 10 322 16.42 67.7 20.79 80.6 Michigan 973 10 335 16.50 80.7 24.61 81.8 WestVirginia 968 7 292 17.08 90.6 18.16 86.2 Alabama 964 6 313 16.37 69.6 13.84 83.9 Ohio 958 16 306 16.52 71.5 21.43 79.5 NewHampshire 925 56 248 16.35 78.1 20.33 73.6 Alaska 923 31 401 15.32 96.5 50.10 79.6 Nevada 917 18 288 14.73 89.1 21.79 81.1 Oregon 908 40 261 14.48 92.1 30.49 79.3 Vermont 904 54 225 16.50 84.2 20.17 75.8 California 899 36 293 15.52 83.0 25.94 77.5 Delaware 897 42 277 16.95 67.9 27.81 71.4 Connecticut 896 69 287 16.75 76.8 26.97 69.8 NewYork 896 59 236 16.86 80.4 33.58 70.5 Maine 890 46 208 16.05 85.7 20.55 74.6 Florida 889 39 255 15.91 80.5 22.62 74.6 Maryland 889 50 312 16.90 80.4 25.41 71.5 Virginia 888 52 295 16.08 88.8 22.23 72.4 Massachusetts 888 65 246 16.79 80.7 31.74 69.9 Pennsylvania 885 50 241 17.27 78.6 27.98 73.4 RhodeIsland 877 59 228 16.67 79.7 25.59 71.4 NewJersey 869 64 269 16.37 80.6 27.91 69.8 Texas 868 32 303 14.95 91.7 19.55 76.4 Indiana 860 48 258 14.39 90.2 17.93 74.1 Hawaii 857 47 277 
16.40 67.6 21.21 69.9 NorthCarolina 827 47 224 15.31 92.8 19.92 75.3 Georgia 823 51 250 15.55 86.5 16.52 74.0 SouthCarolina 790 48 214 15.42 88.1 15.60 74.0
Multiple Regression: PROC INSIGHT performs multiple regression exactly the same way that it performs simple linear regression. Simply choose as many different X variables as you want all at once.
Residual Diagnostics: You can find the various residual diagnostics in PROC INSIGHT. After you use Fit(YX) to perform the regression, go to the Vars menu and choose the statistic you want. It will be added as a column to the spreadsheet.
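Outside of PROC INSIGHT, PROC REG can save the same kinds of diagnostics to a data set with an OUTPUT statement; a sketch (the data set name diag and the variable names resid, studres, and cooksd are just made up here):

```sas
PROC REG DATA=salary;
MODEL sal77 = bsal senior age educ exper;
OUTPUT OUT=diag R=resid STUDENT=studres COOKD=cooksd;
RUN;
PROC PRINT DATA=diag;
RUN;
```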
Variable Selection: To perform model selection, you can use the procedure PROC REG. The following code would perform the model selection for the continuous variables in the data set salary that we saw in the lab on January 17th.
PROC REG DATA=salary; MODEL sal77 = bsal senior age educ exper / SELECTION = RSQUARE ADJRSQ CP; RUN;
Data for Problem 2: The following gives the populations of the states in 1990 and 1999, and the ranks. (We don't need the ranks for this problem. The data is from www.fedstats.gov, with the District of Columbia removed.)
State pop1990 rank1990 pop1999 rank1999 Alabama 4040 22 4370 23 Alaska 550 49 620 48 Arizona 3665 24 4778 20 Arkansas 2351 33 2551 33 California 29811 1 33145 1 Colorado 3294 26 4056 24 Connecticut 3287 27 3282 29 Delaware 666 46 754 45 Florida 12938 4 15111 4 Georgia 6478 11 7788 10 Hawaii 1108 41 1185 42 Idaho 1007 42 1252 40 Illinois 11431 6 12128 5 Indiana 5544 14 5943 14 Iowa 2777 30 2869 30 Kansas 2478 32 2654 32 Kentucky 3687 23 3961 25 Louisiana 4222 21 4372 22 Maine 1228 38 1253 39 Maryland 4781 19 5172 19 Massachusetts 6016 13 6175 13 Michigan 9295 8 9864 8 Minnesota 4376 20 4776 21 Mississippi 2575 31 2769 31 Missouri 5117 15 5468 17 Montana 799 44 883 44 Nebraska 1578 36 1666 38 Nevada 1202 39 1809 35 NewHampshire 1109 40 1201 41 NewJersey 7748 9 8143 9 NewMexico 1515 37 1740 37 NewYork 17991 2 18197 3 NorthCarolina 6632 10 7651 11 NorthDakota 639 47 634 47 Ohio 10847 7 11257 7 Oklahoma 3146 28 3358 27 Oregon 2842 29 3316 28 Pennsylvania 11883 5 11994 6 RhodeIsland 1003 43 991 43 SouthCarolina 3486 25 3886 26 SouthDakota 696 45 733 46 Tennessee 4877 17 5484 16 Texas 16986 3 20044 2 Utah 1723 35 2130 34 Vermont 563 48 594 49 Virginia 6189 12 6873 12 Washington 4867 18 5756 15 WestVirginia 1793 34 1807 36 Wisconsin 4892 16 5250 18 Wyoming 454 50 480 50

The tricky part on this problem is how you transform the variables. While the spreadsheet is on top in PROC INSIGHT, go to the Variables option under the Edit menu. If you just wanted to (for example) take the exponent of one of your variables, you would choose exp(Y), select which variable Y was, and hit OK. This adds the new variable to the spreadsheet. You can do more complicated transformations by choosing Other... under the Variables option menu.
Once the new variable has been added to the sheet (make sure you checked what it would be called before you hit OK!) you can then use it in either Distribution (Y) or Fit (YX) just like you would any other variable.
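Transformations can also be done in a DATA step instead of through the menus. A sketch, assuming the populations were read into a data set called pops (the names pops2 and logpop90 are made up for this example):

```sas
DATA pops2;
SET pops;
logpop90 = log(pop1990);
RUN;
```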
Data for Problem 4:
Hair_Color Pain_Thresh LightBlond 62 LightBlond 60 LightBlond 71 LightBlond 55 LightBlond 48 DarkBlond 63 DarkBlond 57 DarkBlond 52 DarkBlond 41 DarkBlond 43 LightBrunette 42 LightBrunette 50 LightBrunette 41 LightBrunette 37 DarkBrunette 32 DarkBrunette 39 DarkBrunette 51 DarkBrunette 30 DarkBrunette 35
ANOVA Example (with Levene's Test and Holm Procedure):
The following discusses the analysis of the data in Table 7-4 on page 297. The data consists of the cortisol levels in three groups of people: healthy individuals and two levels of depressed individuals. This data could be analyzed using PROC INSIGHT to get the ANOVA table and residual plots. Using PROC GLM it is also possible to test contrasts and perform Levene's test in addition to getting the ANOVA table.
The code below uses PROC GLM to produce the ANOVA table 7-5 on page 298.
DATA tab7p4; INPUT group $ cort @@; CARDS; h 2.5 n 5.4 m 8.1 h 7.2 n 7.8 m 9.5 h 8.9 n 8.0 m 9.8 h 9.3 n 9.3 m 12.2 h 9.9 n 9.7 m 12.3 h 10.3 n 11.1 m 12.5 h 11.6 n 11.6 m 13.3 h 14.8 n 12.0 m 17.5 h 4.5 n 12.8 m 24.3 h 7.0 n 13.1 m 10.1 h 8.5 n 15.8 m 11.8 h 9.3 n 7.5 m 9.8 h 9.8 n 7.9 m 12.1 h 10.3 n 7.6 m 12.5 h 11.6 n 9.4 m 12.5 h 11.7 n 9.6 m 13.4 n 11.3 m 16.1 n 11.6 m 25.2 n 11.8 n 12.6 n 13.2 n 16.3 ; PROC GLM DATA=tab7p4 ORDER=DATA; CLASS group; MODEL cort=group; RUN;
To conduct a test of hypotheses that the variances of the three groups are equal, we could add an extra line after the MODEL statement above. The following would perform the Levene median test (see page 325).
MEANS group / HOVTEST=BF;

The BF stands for Brown and Forsythe, the name that SAS uses for this particular test. For this example, the p-value for the test of the null hypothesis that the variances for the three groups are equal is 0.6234. We would thus fail to reject the null hypothesis. (Remember that we still need to use PROC INSIGHT to check the normality assumption!)
Multiple Comparisons: (Continuing the above example...) To generate the output for the Holm test (as on page 315), we need to use PROC MULTTEST. The following code would perform the Holm test on all of the different pairs of groups. (SAS calls this the "Stepdown Bonferroni" method.) Notice that each CONTRAST line has three coefficients of 1, -1, and 0... these correspond to the three levels of group. The first line being 1 -1 0 means we are comparing the first level/healthy (1) to the second level/nonm (-1) and ignoring the third/m (0).
PROC MULTTEST DATA=tab7p4 ORDER=DATA HOLM; CLASS group; CONTRAST 'healthy vs. nonm' 1 -1 0; CONTRAST 'healthy vs. m' 1 0 -1; CONTRAST 'nonm vs. m' 0 1 -1; TEST mean(cort); RUN;

The p-values in the column labeled Stepdown Bonferroni have already been adjusted so that you simply need to compare them to the family-wise alpha level. The logic here is that the smallest raw p-value was 0.0008, so you could either compare 0.0008 to alpha/3 or compare 0.0008*3 to alpha. You could then either compare 0.0157 to alpha/2 or compare 0.0157*2 to alpha. Finally, you would compare 0.2014 to alpha either way. For this example we could write up the output for alpha = 0.001, 0.02, 0.05 and 0.25 as follows.
Group Mean a=0.001 a=0.02 a=0.05 a=0.25
----------------------------------------------------------------
m 13.500 A B B C A B
nonm 10.700 A A B A B A A A
healthy 9.200 A A A A
----------------------------------------------------------------

For high alpha levels, we reject the null hypothesis more often (we are more likely to make Type I errors) and are thus more likely to reject that the groups are the same. For a small alpha level it is hard to have enough evidence to reject that the groups are the same (we are more likely to make Type II errors), and so we are likely to conclude they are all the same.
Contrasts: (Still continuing the above example...) The code below uses PROC GLM to produce the ANOVA Table 7-5 on page 298. It also uses two contrasts (one comparing non-melancholic to healthy, and one comparing melancholic to healthy) to get the same results as the text does in figure 7-5 on page 301. The book used dummy variables, however, and the code below uses contrasts; thus the variable Dn in the regression is the same as the contrast 'nonm minus healthy' and Dm is the same as the contrast 'm minus healthy'. The lines using ESTIMATE do the same thing as the lines with CONTRAST, but they also return the estimated value (the same as the slopes in figure 7-5) and the t instead of F (recall F = t^2).
PROC GLM DATA=tab7p4 ORDER=DATA; CLASS group; MODEL cort=group; CONTRAST 'nonm minus healthy' group -1 1 0; CONTRAST 'm minus healthy' group -1 0 1; ESTIMATE 'nonm minus healthy' group -1 1 0; ESTIMATE 'm minus healthy' group -1 0 1; RUN;

SAS won't automatically construct the confidence intervals for the contrasts, but we can do it by hand from the output (or program SAS to do it), because the output from the ESTIMATE line gives both the estimate and the standard error of the estimate. (The degrees of freedom are the same as the degrees of freedom for the SSwit.) Using the t critical value with those degrees of freedom (about 2.01 here, rather than the normal value 1.96), the 95% CI for the difference of the means for the non-melancholic and healthy groups would thus be: 1.5 +/- 2.01 (1.15947) = 1.5 +/- 2.335 = (-0.835, 3.835).
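Depending on your version of SAS, there may also be a shortcut: adding the CLPARM option to the MODEL statement should make PROC GLM print confidence limits for each ESTIMATE line itself. Treat this as a sketch to try, not guaranteed output:

```sas
PROC GLM DATA=tab7p4 ORDER=DATA;
CLASS group;
MODEL cort=group / CLPARM;
ESTIMATE 'nonm minus healthy' group -1 1 0;
ESTIMATE 'm minus healthy' group -1 0 1;
RUN;
```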
Data for Problem 3:
IQ ADOPTIVE BIOLOGIC 136.00 High High 99.00 High High 121.00 High High 133.00 High High 125.00 High High 131.00 High High 103.00 High High 115.00 High High 94.00 High Low 103.00 High Low 99.00 High Low 125.00 High Low 111.00 High Low 93.00 High Low 101.00 High Low 94.00 High Low 98.00 Low High 99.00 Low High 91.00 Low High 124.00 Low High 100.00 Low High 116.00 Low High 113.00 Low High 119.00 Low High 92.00 Low Low 91.00 Low Low 98.00 Low Low 83.00 Low Low 99.00 Low Low 68.00 Low Low 76.00 Low Low 115.00 Low Low
SAS for Two-Way ANOVAs: The following discusses the analysis of the kidney data on pages 364-373 by performing the standard two-way ANOVA (including using Holm's test on the main effects and the interaction).
DATA ksh; INPUT strain $ site $ activity @@; CARDS; norm dct 62 norm dct 73 norm dct 58 norm dct 66 hyp dct 44 hyp dct 49 hyp dct 46 hyp dct 37 norm ccd 15 norm ccd 31 norm ccd 19 norm ccd 35 hyp ccd 8 hyp ccd 36 hyp ccd 11 hyp ccd 18 norm omcd 7 norm omcd 7 norm omcd 9 norm omcd 17 hyp omcd 19 hyp omcd 7 hyp omcd 15 hyp omcd 4 ; PROC GLM DATA=ksh ORDER=DATA; CLASS strain site; MODEL activity = strain site strain*site; RUN;

After running this portion, note that the ANOVA table in the output is identical to the one given at the top of page 369, even though that one was obtained using dummy variable coding. Also note that if you add up the various Type I SS in the page 369 output, you will get the same values as the Type III SS in the SAS output from the above code.
Now, to see if the main effects (strain and site) and the interaction (strain*site) are significant, we actually need to do three tests. Because of this we should probably use the Holm procedure to control the family-wise error rate. To do this we use PROC MULTTEST, but instead of giving it contrasts we give it the p-values from the Type III tests. The data set pvals below contains the p-values from above, and the variable holding them must be called raw_p. Be careful to note the order in which you entered the p-values!
DATA pvals; INPUT whichone $ raw_p; CARDS; strain 0.0156 site 0.0001 strain*site 0.0404 ; PROC MULTTEST PDATA=pvals HOLM; RUN;
In this case, all three adjusted p-values are still less than 0.05, and so we would still say that both the main effects and interaction were significant with a family-wise alpha of 0.05.
To check the assumptions for this analysis, we need to get the residual plots and use Levene's test. The residual plots can be gotten from PROC INSIGHT simply by using Fit(YX). Choose activity as the Y variable, and both strain and site as the X variables. This is not enough, however. We still need to put the interaction term in. Highlight both strain and site at the same time, and then hit Cross. This will add strain*site to the list of independent variables. Now, hit OK and you can proceed as usual.
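The residual plot can also be made without PROC INSIGHT by saving the residuals and predicted values from PROC GLM and then plotting them; a sketch (the names kshres, resid, and pred are made up for this example):

```sas
PROC GLM DATA=ksh ORDER=DATA;
CLASS strain site;
MODEL activity = strain site strain*site;
OUTPUT OUT=kshres R=resid P=pred;
RUN;
PROC PLOT DATA=kshres;
PLOT resid*pred;
RUN;
```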
Unfortunately SAS will only run Levene's test for a one-way ANOVA. Because of this, we need to trick SAS into thinking this is a one-way ANOVA by actually making it into one! What we will do is change it into a one-way ANOVA with six different treatments: normdct, hypdct, normccd, hypccd, normomcd, and hypomcd. The following code does this using the || operator, which concatenates the names of the factor levels together after trimming off any extra spaces. The code makes the new data set ksh2 from the old data set ksh, prints it out so that you can see what it has done, and then runs the modified Levene's test. (Recall that SAS calls this test the Brown and Forsythe test.)
DATA ksh2; SET ksh; KEEP block activity; block = trim(strain)||trim(site); PROC PRINT data=ksh2; RUN; PROC GLM DATA=ksh2 ORDER=DATA; CLASS block; MODEL activity = block; MEANS block / HOVTEST=BF; RUN;
Below is the data for Homework Assignment 8. The variables are temperature in Fahrenheit and whether or not there was an O-ring failure.
53 1 56 1 57 1 63 0 66 0 67 0 67 0 67 0 68 0 69 0 70 0 70 1 70 1 70 1 72 0 73 0 75 0 75 1 76 0 76 0 78 0 79 0 80 0 81 0

All of this problem (except the Hosmer-Lemeshow statistic) can be done using PROC INSIGHT. In the Fit (YX) menu you have to choose several options after you select Y and X: select Method using the button at the bottom, and in the menu that pops up choose the response distribution Binomial and the link function Logit.
PROC LOGISTIC will also calculate the Hosmer-Lemeshow Statistic. The following code would analyze the data in Table 12-1 on page 609, and it reproduces the output in figure 12-2 on page 611.
DATA grad; INPUT int success failure; total=success+failure; CARDS; 1 0 2 2 0 2 3 0 5 4 0 3 5 1 6 6 1 3 7 1 2 8 4 7 9 5 4 10 7 4 11 7 2 12 7 1 13 11 1 14 7 0 15 11 0 16 5 0 17 7 0 18 2 0 19 2 0 ; PROC LOGISTIC DATA=grad; MODEL success/total = int / LACKFIT; RUN;

The data here was input in a slightly different format than the way the data for the homework assignment would be entered. For this data we had several observations recorded at the same intelligence level, all reported on a single line of input (e.g. there were two observations at z=1: 0 successes and 2 failures). Your input would have these on separate lines. For example, the data set grad would start like:
DATA grad; INPUT int grad @@; CARDS; 1 0 1 0 2 0 2 0 3 0 3 0 3 0 3 0 3 0 4 0 4 0 4 0 5 1 5 0 5 0 5 0 5 0 5 0 5 0 etc... ;

In this case the code used to run PROC LOGISTIC would be:
PROC LOGISTIC DATA=grad DESCENDING; MODEL grad = int /LACKFIT; RUN;

The log window will tell you that the DESCENDING means you are predicting the probability of getting a 1. Without it, you would be predicting the probability of getting a zero.
You can always ask me a question about SAS programming issues.
In most cases, help with the computers (NOT the programming) can be gained by e-mailing help@stat.sc.edu
For the printers on the first and second floor, printer paper is available in the Stat Department office. For printers on the third floor, paper is available in the Math Department office.
If you are using a PC, restarting the machine will fix many problems, but obviously don't try that if you have a file that won't save or the like. (It is always a good idea to save your work periodically on your Z drive.)
If SAS won't start, one of the things to check is that your computer has loaded the X drive correctly (whatever that means). Go to My Computer and see if the apps on 'lc-nt' (X:) is listed as one of the drives. If it isn't, go to the Tools menu and select Map Network Drive.... Select X for the drive, and enter \\lc-nt\apps for the Folder. Then click Finish. This should connect your computer to the X drive and allow SAS to run. If you already had the X-drive connected, then you will need to e-mail help@stat.sc.edu.
If your graphs print out extremely small after you copy them to Word, you might be able to fix the problem by "opening and closing" the image. In Word, left-click once on the image, and select Edit Picture or Open Picture Object under the Edit menu. A separate window will open with the image in it. Simply choose Close Picture. It should now print out OK. This will also make the spacing between the characters in the labels look right if they were somewhat off. You can also modify the picture so that you can move it around the screen. Right-click on the image twice and choose Format Object... in the menu that pops up. Choose the Layout tab, select In front of text, and click OK. (This might move the image a bit when you first do it, but you can just click on it to move it back.)
If the problem is an emergency requiring immediate attention, see Jason Dew in room 415. If Jason is not available and it is an emergency, see Minna Moore in room 417. Flagrantly non-emergency cases may result in suspension of computer privileges.