The Program Editor is where you tell SAS what you want done. The Output window is where it puts the results, and the Log window is where it tells you what it did and whether there were any errors. It is important to note that the Output window often gets very long! You usually want to copy the parts you want to print into MS Word and print from there. It is also important to note that you should check the Log window every time you run anything. (The error "SAS Syntax Editor control is not installed" is OK though.) Errors appear in maroon; successful runs appear in blue.
Hitting the [F3] key will run the program currently in the Program Editor window.
This will, however, erase whatever was written in the Program Editor window. To recall whatever was there, make sure you are in that window and hit the [F4] key.
If you happen to lose a window, check under the View menu at the top.
If you keep running programs, SAS keeps appending the results to the Output window. To clear the Output window, make sure you are in that window and choose Clear All under the Edit menu.
The code below uses the Texas employment data we saw in class.
OPTIONS pagesize=60 linesize=80; DATA salary; INPUT BSAL SAL77 SEX $ SENIOR AGE EDUC EXPER @@; CARDS; 5100 8940 1 95 640 15 165 6300 10860 1 84 662 15 231 4800 8580 1 98 774 12 381 6000 9720 1 69 488 12 121 5280 8760 1 98 557 8 190 5100 9600 1 85 406 12 59 5280 8040 1 88 745 8 90 4800 11100 1 87 349 12 11 4800 9000 1 77 505 12 63 5100 10020 1 87 508 16 123 4800 8820 1 76 482 12 6 5700 9780 1 74 542 12 116.5 5400 13320 1 86 329 15 24 5400 10440 1 72 604 12 169 5520 9600 1 82 558 12 97 5100 10560 1 84 458 12 36 5400 8940 1 88 338 12 26 4800 9240 1 84 571 16 214 5700 9000 1 76 667 12 90 6000 11940 1 86 486 15 78.5 3900 8760 1 98 327 12 0 4380 10020 1 93 313 8 7.5 4800 9780 1 75 619 12 144 5580 7860 1 69 600 12 132.5 6120 9360 1 78 624 12 208.5 4620 9420 1 96 385 12 52 5220 7860 1 70 671 8 102 5220 8340 1 70 468 12 127 5100 9660 1 66 554 8 96 5040 12420 0 96 329 15 14 4380 9600 1 92 305 8 6.25 6300 12060 0 82 357 15 72 4290 9180 1 69 280 12 5 6000 15120 0 67 315 15 35.5 5400 9540 1 66 534 15 122 6000 16320 0 97 354 12 24 4380 10380 1 92 305 12 0 6000 12300 0 66 351 12 56 5400 8640 1 65 603 8 173 6840 10380 0 92 374 15 41.5 5400 11880 1 66 302 12 26 8100 13979 0 66 369 16 54.5 4500 12540 1 96 366 8 52 6000 10140 0 82 363 12 32 5400 8400 1 70 628 12 82 6000 12360 0 88 555 12 252 5520 8880 1 67 694 12 196 6900 10920 0 75 416 15 132 5640 10080 1 90 368 12 55 6900 10920 0 89 481 12 175 4800 9240 1 73 590 12 228 5400 12660 0 91 331 15 17.5 5400 8640 1 66 771 8 228 6000 12960 0 66 355 15 64 4500 7980 1 80 298 12 8 6000 12360 0 86 348 15 25 5400 11940 1 77 325 12 38 5400 10680 0 88 359 12 38 5400 9420 1 72 589 15 49 5400 11640 0 96 474 12 113 6300 9780 1 66 394 12 86.5 5100 7860 0 84 535 12 180 5160 10680 1 87 320 12 18 6600 11220 0 66 369 15 84 5100 11160 1 98 571 15 115 5100 8700 0 97 637 12 315 4800 8340 1 79 602 8 70 6600 12240 0 83 536 15 215.5 5400 9600 1 98 568 12 244 5700 11220 0 94 392 15 36 4020 9840 1 92 528 10 44 6000 12180 0 91 364 12 49 4980 8700 1 74 718 8 318 
6000 11580 0 83 521 15 108 5280 9780 1 88 653 12 107 6000 8940 0 80 686 12 272 5700 8280 1 65 714 15 241 6000 10680 0 87 364 15 56 4800 8340 1 87 647 12 163 4620 11100 0 77 293 12 11.5 4800 13560 1 82 338 12 11 5220 10080 0 85 344 12 29 5700 10260 1 82 362 15 51 6600 15360 0 83 340 15 64 4380 9720 1 93 303 12 4.5 5400 12600 0 78 305 12 7 4380 10500 1 89 310 12 0 6000 8940 0 78 659 8 320 5700 10620 1 88 410 15 61 5400 9480 0 88 690 15 359 5400 10320 1 78 584 15 51 6000 14400 0 96 402 16 45.5 4440 9600 1 97 341 15 75 ;
Note that _most_ lines end with a semicolon, but not all. Missing one is the most common source of errors, but usually the Log window will tell you where the problem is.
The OPTIONS line only needs to be used once during a session. It sets the number of lines per page and the width of the lines for viewing on the screen and printing. The font can be set by using the Options choice under the Tools menu along the top of the screen. When you cut and paste from SAS to a word processor, the font Courier New works well.
The DATA line defines the name of the data set. The name may not have any spaces; only letters, numbers, and underscores. It must start with a letter, and in older versions of SAS it must be eight characters or less. The INPUT line gives the names of the variables, and they must be in the order that the data will be entered. The @@ at the end of the line says that there may be more than one observation per line. If we had left it out, SAS would have skipped from the observation with a 5100 base salary right to the one with a 4800 base salary. The $ after SEX says that that variable is a category/name and not a number.
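As a small illustration of those two pieces of syntax (using made-up numbers, not the salary data, and the hypothetical data set name tiny), the following reads three observations of two variables from a single line of data; without the @@, only the first pair on the line would be kept:

```sas
DATA tiny;
   INPUT group $ score @@;  /* $ = character variable, @@ = keep reading the same line */
   CARDS;
a 10 b 12 a 9
;
PROC PRINT DATA=tiny;
RUN;
```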
If we hit F3 at this point to run what we put above, nothing new will appear on the output screen. This should be no big surprise, once we realize that we haven't told SAS to return any output! The code below simply tells SAS to print back out the data we entered.
PROC PRINT DATA=salary;
TITLE "The Salary Data";
RUN;
The most basic method for getting a summary of the data is to use PROC UNIVARIATE.
PROC UNIVARIATE DATA=salary PLOT FREQ ;
VAR sal77;
TITLE 'Summary of the 1977 Salaries';
RUN;
The VAR line says which of the variables you want a summary of. Also note that the graphs here are pretty awful. We'll see in a few minutes how to make the graphics look better.
The code used to generate the PROC TTEST output on the handout is:
PROC TTEST DATA=salary;
CLASS sex;
VAR sal77;
RUN;
One way to get a nicer q-q plot (normal probability plot) than the one that PROC UNIVARIATE makes (and to get a separate one for each sex) is to use PROC INSIGHT.
PROC INSIGHT;
OPEN salary;
RUN;

Another way to open PROC INSIGHT is to go to the Solutions menu, then to the Analysis menu, and then finally to the Interactive Data Analysis option. Once there you will need to go to the WORK library and choose the salary data set.
Once you start PROC INSIGHT a spreadsheet containing the data should appear on the screen. To analyze this data, go to the Analyze menu, and choose Distribution (Y). In the white box under SALARY select sal77 and click the Y button. Then select sex and click the Group button. Finally, click OK.
This causes an output window with two sets of graphs side-by-side to appear. You can cut and paste the graphs from these windows right into Microsoft Word. Simply click on the border of the box you want to copy with the left mouse button to select it. You can then cut and paste like normal. Later, to quit PROC INSIGHT, you simply click on the X in the upper right portion of the spreadsheet.
The plot to check for normality is found under the Curves menu: choose QQ Ref Line... and then just click OK in the box that pops up. If the data is approximately normally distributed then it should fall near the line. With so few data points, however, it is hard to tell whether the data looks close or not. We can conduct a hypothesis test of normality by selecting Tests for Normality under the Tables menu. The Anderson-Darling test is perhaps the best of these tests to use... but I still prefer the q-q plot for small samples.
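If you would rather get the normality tests without PROC INSIGHT, the NORMAL option on PROC UNIVARIATE prints a battery of tests (including Anderson-Darling). A sketch using the salary data from above:

```sas
PROC UNIVARIATE DATA=salary NORMAL;  /* NORMAL requests the tests for normality */
   VAR sal77;
RUN;
```

This does not split the output by sex, though; for separate results by group you would need to sort the data and add a BY statement.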
Throughout PROC INSIGHT are little boxes that give you additional options. The arrow box at the bottom of the histogram window contains a Ticks... option that lets you control what boxes and endpoints you use in the histogram. Try changing the "Tick Increment". In the spreadsheet, if you click on the black box at the end of any row, it gives you the option to take any given point out of the calculations or graphs. If you do that, the output window automatically updates. Similarly, if you click on any number, that point is highlighted in all the graphs.
Besides Distribution (Y) in the Analyze menu, we could also have chosen Fit(YX). Choose sex for X and sal77 for Y. If the X variable is numerical, this will conduct a linear regression. If it is a categorical (name) variable, it will conduct an Analysis of Variance. Now, click OK and we will see what the output looks like. We could repeat this again and add educ as another X variable.
We will see a number of other things in SAS throughout the semester, and "instructions" will be posted here for most of the homework assignments.
Two-sample t-test: The following code would analyze the data in Example 5.3 on page 193.
DATA mesquite; INPUT loc $ height @@; CARDS; A 1.70 A 2.00 M 1.30 M 0.90 M 1.50 A 3.00 A 1.30 M 1.35 M 1.35 M 1.50 A 1.70 A 1.45 M 2.16 M 1.40 M 1.20 A 1.60 A 2.20 M 1.80 M 1.00 M 0.70 A 1.40 A 0.70 M 1.55 M 1.70 M 1.20 A 1.90 A 1.90 M 1.20 M 1.50 M 0.80 A 1.10 A 1.80 M 1.00 M 0.65 A 1.60 A 2.00 M 1.70 M 1.50 A 2.00 A 2.20 M 0.80 M 1.70 A 1.25 A 0.92 M 1.20 M 1.70 ; PROC TTEST DATA=mesquite; CLASS loc; VAR height; RUN;
The assumption of equal variances can be checked by using the F-test on the PROC TTEST output (the p-value is 0.1364 in this case, so we can accept that the variances are equal if the two populations appear normal). PROC INSIGHT can be used to check the assumption of normality:
PROC INSIGHT; OPEN mesquite; RUN;
Choose Distribution (Y) under the Analyze menu, select height for Y and loc for Group, and click OK. Then select QQ Ref Line... under the Curves menu. For this example both look very close to a straight line, so we would accept the assumption of normality. With much smaller data sets this can be hard to tell, and there is nothing wrong with saying that it is hard to tell because the data set is too small.
Linear Regression: The following code would analyze the data in Example 7.2 on page 293.
DATA housing; INPUT size price @@; CARDS; 0.951 30 1.532 93.5 2.336 129.9 1.036 39.9 1.647 94.9 1.98 132.9 0.676 46.5 1.344 95.8 2.483 134.9 1.456 48.6 1.550 98.5 2.809 135.9 1.186 51.5 1.752 99.5 2.036 139.5 1.456 56.99 1.450 99.9 2.298 139.99 1.368 59.9 1.312 102 2.038 144.9 0.994 62.5 1.636 106 2.370 147.6 1.176 65.5 1.5 108.9 2.921 149.99 1.216 69 1.8 109.9 2.262 152.55 1.410 76.9 1.972 110.0 2.456 156.9 1.344 79 1.387 112.29 2.436 164 1.064 79.9 2.082 114.9 1.920 167.5 1.770 79.95 . 119.5 2.949 169.9 1.524 82.9 2.463 119.9 3.310 175 1.750 84.9 2.572 119.9 2.805 179 1.152 85 2.113 122.9 2.553 179.9 1.770 87.9 2.016 123.938 2.510 189.5 1.624 89.9 1.852 124.9 3.627 199 1.540 89.9 2.670 126.9 ; PROC INSIGHT; OPEN housing; FIT price=size; RUN;
Prediction Interval for Individual and Confidence Interval for Mean: The following code and instructions will let you produce output like that shown on pages 314 and 315 of the text. It uses the data set housing that can be found in the Homework 1 Notes above. (Note that the housing prices as entered above are in a different order than the ones provided on the textbook's web site.)
PROC GLM DATA=housing; MODEL price=size / ALPHA=0.05 CLM; RUN; PROC GLM DATA=housing; MODEL price=size / ALPHA=0.05 CLI; RUN;
The line with CLM produces the confidence interval for the mean predicted value (for the regression line); that is the one found on pages 314-315. The line with CLI produces the prediction interval for a new observation. So, for example, the first observation is one that had size=0.951 and price=30. The predicted value is 58.7668. We can be 95% confident that the mean predicted value (the true regression line) is between 49.4828 and 68.0507 for a house of size 0.951. We would expect 95% of all houses with an area of 0.951 to have prices between 18.2563 and 99.2772. Say we wanted to estimate the price range for a house with an area of 0.800. We could do this by adding an extra observation to the data set, one with an area of 0.800 and only a period put in for the price (it is missing; you don't have one).
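A sketch of that trick (newrow and housing2 are made-up names; the period is SAS's missing-value marker, so the new row is skipped when fitting but still gets a predicted value and interval):

```sas
DATA newrow;              /* one extra observation with a missing price */
   INPUT size price @@;
   CARDS;
0.800 .
;
DATA housing2;            /* stack the new row under the original housing data */
   SET housing newrow;
RUN;
PROC GLM DATA=housing2;
   MODEL price=size / ALPHA=0.05 CLI;
RUN;
```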
It is also possible to get the graphs like those on page 306 by using PROC INSIGHT. Start up PROC INSIGHT and perform the regression as usual using Fit(YX). The options to add the curves to the scatter plot can be found by choosing the Confidence Curves option under the Curves menu.
The Data for Question 2: Here is the raw data for this problem, with the first line being the names of the variables. Remember in the INPUT line to put a $ after state since it contains the names of the states and not numerical values.
state sat takers income years public expend rank Iowa 1088 3 326 16.79 87.8 25.60 89.7 SouthDakota 1075 2 264 16.07 86.2 19.95 90.6 NorthDakota 1068 3 317 16.57 88.3 20.62 89.8 Kansas 1045 5 338 16.30 83.9 27.14 86.3 Nebraska 1045 5 293 17.25 83.6 21.05 88.5 Montana 1033 8 263 15.91 93.7 29.48 86.4 Minnesota 1028 7 343 17.41 78.3 24.84 83.4 Utah 1022 4 333 16.57 75.2 17.42 85.9 Wyoming 1017 5 328 16.01 97.0 25.96 87.5 Wisconsin 1011 10 304 16.85 77.3 27.69 84.2 Oklahoma 1001 5 358 15.95 74.2 20.07 85.6 Arkansas 999 4 295 15.49 86.4 15.71 89.2 Tennessee 999 9 330 15.72 61.2 14.58 83.4 NewMexico 997 8 316 15.92 79.5 22.19 83.7 Idaho 995 7 285 16.18 92.1 17.80 85.9 Mississippi 988 3 315 16.76 67.9 15.36 90.1 Kentucky 985 6 330 16.61 71.4 15.69 86.4 Colorado 983 16 333 16.83 88.3 26.56 81.8 Washington 982 19 309 16.23 87.5 26.53 83.2 Arizona 981 11 314 15.98 80.9 19.14 84.3 Illinois 977 14 347 15.80 74.6 24.41 78.7 Louisiana 975 5 394 16.85 44.8 19.72 82.9 Missouri 975 10 322 16.42 67.7 20.79 80.6 Michigan 973 10 335 16.50 80.7 24.61 81.8 WestVirginia 968 7 292 17.08 90.6 18.16 86.2 Alabama 964 6 313 16.37 69.6 13.84 83.9 Ohio 958 16 306 16.52 71.5 21.43 79.5 NewHampshire 925 56 248 16.35 78.1 20.33 73.6 Alaska 923 31 401 15.32 96.5 50.10 79.6 Nevada 917 18 288 14.73 89.1 21.79 81.1 Oregon 908 40 261 14.48 92.1 30.49 79.3 Vermont 904 54 225 16.50 84.2 20.17 75.8 California 899 36 293 15.52 83.0 25.94 77.5 Delaware 897 42 277 16.95 67.9 27.81 71.4 Connecticut 896 69 287 16.75 76.8 26.97 69.8 NewYork 896 59 236 16.86 80.4 33.58 70.5 Maine 890 46 208 16.05 85.7 20.55 74.6 Florida 889 39 255 15.91 80.5 22.62 74.6 Maryland 889 50 312 16.90 80.4 25.41 71.5 Virginia 888 52 295 16.08 88.8 22.23 72.4 Massachusetts 888 65 246 16.79 80.7 31.74 69.9 Pennsylvania 885 50 241 17.27 78.6 27.98 73.4 RhodeIsland 877 59 228 16.67 79.7 25.59 71.4 NewJersey 869 64 269 16.37 80.6 27.91 69.8 Texas 868 32 303 14.95 91.7 19.55 76.4 Indiana 860 48 258 14.39 90.2 17.93 74.1 Hawaii 857 47 277 
16.40 67.6 21.21 69.9 NorthCarolina 827 47 224 15.31 92.8 19.92 75.3 Georgia 823 51 250 15.55 86.5 16.52 74.0 SouthCarolina 790 48 214 15.42 88.1 15.60 74.0
Multiple Regression: PROC INSIGHT performs multiple regression exactly the same way that it performs simple linear regression. Simply choose as many different X variables as you want all at once.
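If you prefer code to the menus, PROC REG fits the same multiple regression. A sketch using the salary data from the first lab (any subset of X variables works):

```sas
PROC REG DATA=salary;
   MODEL sal77 = bsal senior age educ exper;  /* several X variables at once */
RUN;
```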
Residual Diagnostics: You can find the various residual diagnostics in PROC INSIGHT. After you use Fit(YX) to perform the regression, go to the Vars menu and choose the statistic you want. It will be added as a column to the spreadsheet.
Variable Selection: To perform model selection, you can use the procedure PROC REG. The following code would perform the model selection for the continuous variables in the data set salary that we saw in the lab on January 17th.
PROC REG DATA=salary; MODEL sal77 = bsal senior age educ exper / SELECTION = RSQUARE ADJRSQ CP; RUN;
The Data for Question 2: Here is the raw data for this problem, with the first line being the names of the variables. Remember in the INPUT line to put a $ after Area since it contains the names of the states and not numerical values.
Rank Area April_1_2000 April_1_1990 1 California 33871648 29760021 2 Texas 20851820 16986510 3 New_York 18976457 17990455 4 Florida 15982378 12937926 5 Illinois 12419293 11430602 6 Penn 12281054 11881643 7 Ohio 11353140 10847115 8 Michigan 9938444 9295297 9 NJersey 8414350 7730188 10 Georgia 8186453 6478216 11 NCarolina 8049313 6628637 12 Virginia 7078515 6187358 13 Mass 6349097 6016425 14 Indiana 6080485 5544159 15 Washington 5894121 4866692 16 Tennessee 5689283 4877185 17 Missouri 5595211 5117073 18 Wisconsin 5363675 4891769 19 Maryland 5296486 4781468 20 Arizona 5130632 3665228 21 Minnesota 4919479 4375099 22 Louisiana 4468976 4219973 23 Alabama 4447100 4040587 24 Colorado 4301261 3294394 25 Kentucky 4041769 3685296 26 SCarolina 4012012 3486703 27 Oklahoma 3450654 3145585 28 Oregon 3421399 2842321 29 Connecticut 3405565 3287116 30 Iowa 2926324 2776755 31 Mississippi 2844658 2573216 32 Kansas 2688418 2477574 33 Arkansas 2673400 2350725 34 Utah 2233169 1722850 35 Nevada 1998257 1201833 36 NMexico 1819046 1515069 37 WVirginia 1808344 1793477 38 Nebraska 1711263 1578385 39 Idaho 1293953 1006749 40 Maine 1274923 1227928 41 NHampshire 1235786 1109252 42 Hawaii 1211537 1108229 43 RIsland 1048319 1003464 44 Montana 902195 799065 45 Delaware 783600 666168 46 SDakota 754844 696004 47 NDakota 642200 638800 48 Alaska 626932 550043 49 Vermont 608827 562758 50 Wyoming 493782 453588
The tricky part on this problem is how you transform the variables. While the spreadsheet is on top in PROC INSIGHT, go to the Variables option under the Edit menu. If you just wanted to (for example) exponentiate one of your variables, you would choose exp(Y), select which variable Y was, and click OK. This then adds the new variable to the spreadsheet. You can do some more complicated transformations by choosing Other... under the Variables option menu.
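The same transformations can also be made in a DATA step before you ever start PROC INSIGHT. For example, to add the natural log of price to the housing data from above (housing2 and logprice are made-up names):

```sas
DATA housing2;
   SET housing;
   logprice = log(price);   /* log() is the natural log in SAS */
RUN;
```

You would then OPEN housing2 in PROC INSIGHT and use logprice like any other variable.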
Once the new variable has been added to the sheet (make sure you checked what it would be called before you hit OK!) you can then use it in either Distribution (Y) or Fit (YX) just like you would any other variable.
All of the code needed for performing a one-way ANOVA is contained in the supplement to 6.5. Here is the code from the supplement that you can cut and paste in if you want.
Entering the data: Here is the data from Table 6.21 on page 264.
DATA shrimp_weights; INPUT diet $ weight @@; CARDS; cafo_1 47.0 cafo_1 50.9 cafo_1 45.2 cafo_1 48.9 cafo_1 48.2 calo_2 38.1 calo_2 39.6 calo_2 39.1 calo_2 33.1 calo_2 40.3 faso_3 57.4 faso_3 55.1 faso_3 54.2 faso_3 56.8 faso_3 52.5 falo_4 54.2 falo_4 57.7 falo_4 57.1 falo_4 47.9 falo_4 53.4 bc_5 38.5 bc_5 42.0 bc_5 38.7 bc_5 38.9 bc_5 44.6 lma_6 48.9 lma_6 47.0 lma_6 47.0 lma_6 44.4 lma_6 46.9 lmaa_7 87.8 lmaa_7 81.7 lmaa_7 73.3 lmaa_7 82.7 lmaa_7 74.8 ;

Checking the assumptions and getting the ANOVA table: PROC INSIGHT can be used to get the two residual plots, while PROC GLM will conduct the modified Levene test. Both will return the ANOVA table.
PROC INSIGHT; OPEN shrimp_weights; FIT weight=diet; RUN; PROC GLM DATA=shrimp_weights ORDER=DATA; CLASS diet; MODEL weight=diet; MEANS diet / HOVTEST=BF; RUN;

Estimating contrasts: The following code will produce the output in Table 6.33. Notice that it is NOT adjusted for the Holm procedure, BUT it does return both the estimate (L-hat) and the estimated standard deviation of L-hat that could be used to make a confidence interval. It is important to note that if you make more than one confidence interval you will have multiple comparison problems and will want to use the Scheffé adjustment (or Tukey if you are doing only pairwise comparisons).
PROC GLM DATA=shrimp_weights ORDER=DATA; CLASS diet; MODEL weight=diet; ESTIMATE 'newold' diet 3 3 3 3 -4 -4 -4 / divisor=12; ESTIMATE 'corn' diet 5 5 -2 -2 -2 -2 -2 / divisor=10; ESTIMATE 'fish' diet 4 -3 4 4 -3 -3 -3 / divisor=12; ESTIMATE 'lin' diet -2 5 -2 5 -2 -2 -2 / divisor=10; ESTIMATE 'sun' diet -1 -1 6 -1 -1 -1 -1 / divisor=6; ESTIMATE 'mic' diet -2 -2 -2 -2 -2 5 5 / divisor=10; ESTIMATE 'art' diet -1 -1 -1 -1 -1 -1 6 / divisor=6; RUN;

All pairwise comparisons: PROC MULTTEST can be used for all pairwise comparisons... and it automatically adjusts the p-values. The format is very similar to that of PROC GLM, but it is not exactly the same.
PROC MULTTEST DATA=shrimp_weights ORDER=DATA HOLM; CLASS diet; CONTRAST '1 vs. 2' 1 -1 0 0 0 0 0; CONTRAST '1 vs. 3' 1 0 -1 0 0 0 0; CONTRAST '1 vs. 4' 1 0 0 -1 0 0 0; CONTRAST '1 vs. 5' 1 0 0 0 -1 0 0; CONTRAST '1 vs. 6' 1 0 0 0 0 -1 0; CONTRAST '1 vs. 7' 1 0 0 0 0 0 -1; CONTRAST '2 vs. 3' 0 1 -1 0 0 0 0; CONTRAST '2 vs. 4' 0 1 0 -1 0 0 0; CONTRAST '2 vs. 5' 0 1 0 0 -1 0 0; CONTRAST '2 vs. 6' 0 1 0 0 0 -1 0; CONTRAST '2 vs. 7' 0 1 0 0 0 0 -1; CONTRAST '3 vs. 4' 0 0 1 -1 0 0 0; CONTRAST '3 vs. 5' 0 0 1 0 -1 0 0; CONTRAST '3 vs. 6' 0 0 1 0 0 -1 0; CONTRAST '3 vs. 7' 0 0 1 0 0 0 -1; CONTRAST '4 vs. 5' 0 0 0 1 -1 0 0; CONTRAST '4 vs. 6' 0 0 0 1 0 -1 0; CONTRAST '4 vs. 7' 0 0 0 1 0 0 -1; CONTRAST '5 vs. 6' 0 0 0 0 1 -1 0; CONTRAST '5 vs. 7' 0 0 0 0 1 0 -1; CONTRAST '6 vs. 7' 0 0 0 0 0 1 -1; TEST mean(weight); RUN;
The Data for Question 2:
Angry 2.10 Angry 0.64 Angry 0.47 Angry 0.37 Angry 1.62 Angry -0.08 Disg 0.40 Disg 0.73 Disg -0.07 Disg -0.25 Disg 0.89 Disg 1.93 Fear 0.82 Fear -2.93 Fear -0.74 Fear 0.79 Fear -0.77 Fear -1.60 Happy 1.71 Happy -0.04 Happy 1.04 Happy 1.44 Happy 1.37 Happy 0.59 Sad 0.74 Sad -1.26 Sad -2.27 Sad -0.39 Sad -2.65 Sad -0.44 Neut 1.69 Neut -0.60 Neut -0.55 Neut 0.27 Neut -0.57 Neut -2.16
The Data for Question 3:
IQ ADOPTIVE BIOLOGIC 136.00 High High 99.00 High High 121.00 High High 133.00 High High 125.00 High High 131.00 High High 103.00 High High 115.00 High High 94.00 High Low 103.00 High Low 99.00 High Low 125.00 High Low 111.00 High Low 93.00 High Low 101.00 High Low 94.00 High Low 98.00 Low High 99.00 Low High 91.00 Low High 124.00 Low High 100.00 Low High 116.00 Low High 113.00 Low High 119.00 Low High 92.00 Low Low 91.00 Low Low 98.00 Low Low 83.00 Low Low 99.00 Low Low 68.00 Low Low 76.00 Low Low 115.00 Low Low
Code for the Kidney Example from In Class:
DATA ksh; INPUT strain $ site $ activity @@; CARDS; norm dct 62 norm dct 73 norm dct 58 norm dct 66 hyp dct 44 hyp dct 49 hyp dct 46 hyp dct 37 norm ccd 15 norm ccd 31 norm ccd 19 norm ccd 35 hyp ccd 8 hyp ccd 36 hyp ccd 11 hyp ccd 18 norm omcd 7 norm omcd 7 norm omcd 9 norm omcd 17 hyp omcd 19 hyp omcd 7 hyp omcd 15 hyp omcd 4 ; PROC INSIGHT; OPEN ksh; FIT activity=strain site strain*site; RUN; DATA ksh2; SET ksh; KEEP block activity; block = trim(strain)||trim(site); PROC PRINT data=ksh2; RUN; PROC GLM DATA=ksh2 ORDER=DATA; CLASS block; MODEL activity = block; MEANS block / HOVTEST=BF; RUN; DATA pvals; INPUT whichone $ raw_p; CARDS; strain 0.0156 site 0.0001 strain*site 0.0404 ; PROC MULTTEST PDATA=pvals HOLM; RUN;
The following code would analyze the data in Example 10.3 on pages 473-476, using the data downloaded from the web page. (You want the link for FW10x03, even though the page that comes up calls the data set fw10x02.)
DATA FW10x03; INPUT OBS LAB MATERIAL $ STRESS; CARDS; < input data set here > ; PROC GLM DATA=FW10x03 ORDER=DATA; CLASS LAB MATERIAL; MODEL STRESS = LAB MATERIAL LAB*MATERIAL; RANDOM LAB LAB*MATERIAL / TEST; MEANS MATERIAL / DUNCAN E=LAB*MATERIAL; OUTPUT OUT=fwresids P=pred R=resid; RUN;

Note that the E=LAB*MATERIAL on the MEANS line tells SAS that you want to use a different denominator for conducting the test (see for example the end of the output on page 475). You can tell it should be E=LAB*MATERIAL by checking the EMS for testing MATERIAL.
The DUNCAN command is something we didn't cover in section 6.5. It makes a display showing which values are different from each other, much as we did with the Holm procedure (see the display on page 476). However, it doesn't keep the family-wise error rate under control, and it doesn't have as much power as just ignoring the family-wise error rate. I can't really recommend using it.
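If you do want a pairwise-comparison display that controls the family-wise error rate, the TUKEY option can be substituted for DUNCAN on the MEANS line (a sketch, keeping the same E= denominator as above):

```sas
PROC GLM DATA=FW10x03 ORDER=DATA;
   CLASS LAB MATERIAL;
   MODEL STRESS = LAB MATERIAL LAB*MATERIAL;
   MEANS MATERIAL / TUKEY E=LAB*MATERIAL;  /* Tukey HSD instead of Duncan */
RUN;
```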
The analysis of latin squares works just like the analysis of a factorial design (using PROC GLM), just remember that you cannot include the interaction terms.
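A sketch of what that looks like, where mysquare, row, col, treat, and y are all hypothetical names for your data set, the two blocking factors, the treatment, and the response:

```sas
PROC GLM DATA=mysquare;
   CLASS row col treat;
   MODEL y = row col treat;  /* main effects only: no interactions in a latin square */
RUN;
```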
Logistic Regression: The following code will analyze the data in Table 11.8 on page 532 using logistic regression. (It is probably a better analysis than the one the book has, so don't pay attention to what the output on page 533 looks like.) This analysis is shown on page 541.
DATA fw11x04; INPUT y $ income @@; CARDS; 0 9.2 0 12.9 0 9.2 1 9.6 0 9.3 1 10.1 0 9.4 1 10.3 0 9.5 1 10.9 0 9.5 1 10.9 0 9.5 1 11.1 0 9.6 1 11.1 0 9.7 1 11.1 0 9.7 1 11.5 0 9.8 1 11.8 0 9.8 1 11.9 0 9.9 1 12.1 0 10.5 1 12.2 0 10.5 1 12.5 0 10.9 1 12.6 0 11 1 12.6 0 11.2 1 12.6 0 11.2 1 12.9 0 11.5 1 12.9 0 11.7 1 12.9 0 11.8 1 12.9 0 12.1 1 13.1 0 12.3 1 13.2 0 12.5 1 13.5 ; PROC LOGISTIC DATA=fw11x04 DESCENDING; MODEL y=income /LACKFIT; RUN;

The important part of the output is:
Analysis of Maximum Likelihood Estimates

                            Standard
Parameter   DF   Estimate      Error   Chi-Square   Pr > ChiSq
Intercept    1   -11.3472     3.3511      11.4660       0.0007
income       1     1.0018     0.2954      11.5013       0.0007

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio      15.5687    1      < .0001
Score                 14.1810    1       0.0002
Wald                  11.5013    1       0.0007

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square   DF   Pr > ChiSq
    6.7043    7       0.4603

Looking at the equation on page 536, -11.3472 is the estimate of beta0 and 1.0018 is the estimate of beta1. The likelihood ratio test p-value of < 0.0001 tests the null hypothesis that beta1 is zero, and the p-value of 0.4603 from the Hosmer and Lemeshow test tests the null hypothesis that the logistic form is appropriate.
Data Sets: The data sets from the book can be found at the Statistical Methods Companion Website.
In most cases, help with the computers (NOT the programming) can be gained by e-mailing help@stat.sc.edu
For the printers on the first and second floor, printer paper is available in the Stat Department office. For printers on the third floor, paper is available in the Math Department office.
If you are using a PC, restarting the machine will fix many problems, but obviously don't try that if you have a file that won't save or the like.
If SAS won't start, one of the things to check is that your computer has loaded the X drive correctly (whatever that means). Go to My Computer and see if the apps on 'lc-nt' (X:) is listed as one of the drives. If it isn't, go to the Tools menu and select Map Network Drive.... Select X for the drive, and enter \\lc-nt\apps for the Folder. Then click Finish. This should connect your computer to the X drive and allow SAS to run. If you already had the X-drive connected, then you will need to e-mail help@stat.sc.edu.
If your graphs print out extremely small after you copy them to Word, you might be able to fix the problem by "opening and closing" the image. In Word, left click once on the image, and select Edit Picture or Open Picture Object under the Edit menu. A separate window will open with the image in it. Simply choose Close Picture. It should now print out OK. This will also make the spacing between the characters in the labels look right if they were somewhat off.
If the problem is an emergency requiring immediate attention, see Jason Dew in room 415. If Jason is not available and it is an emergency, see Minna Moore in room 417. Flagrantly non-emergency cases may result in suspension of computer privileges.