The Program Editor is where you tell SAS what you want done. The Output window is where SAS puts the results, and the Log window is where it tells you what it did and whether there were any errors. It is important to note that the Output window often gets very long! You will usually want to copy the parts you want to print into MS Word and print from there. It is also important to check the Log window every time you run anything. (The error "SAS Syntax Editor control is not installed" is OK, though.) Errors will appear in maroon; successful runs appear in blue.
Hitting the [F3] key will run the program currently in the Program Editor window. This will, however, erase whatever was written in the Program Editor window. To recall what was there, make sure you are in that window and hit the [F4] key.
If you happen to lose a window, check under the View menu at the top.
If you keep running more programs, SAS keeps appending the results to the Output window. To clear the Output window, make sure you are in that window and choose Clear All under the Edit menu.
The code below uses the Texas employment data we saw in class on Tuesday.
OPTIONS pagesize=60 linesize=80; DATA salary; INPUT BSAL SAL77 SEX $ SENIOR AGE EDUC EXPER @@; CARDS; 5100 8940 1 95 640 15 165 6300 10860 1 84 662 15 231 4800 8580 1 98 774 12 381 6000 9720 1 69 488 12 121 5280 8760 1 98 557 8 190 5100 9600 1 85 406 12 59 5280 8040 1 88 745 8 90 4800 11100 1 87 349 12 11 4800 9000 1 77 505 12 63 5100 10020 1 87 508 16 123 4800 8820 1 76 482 12 6 5700 9780 1 74 542 12 116.5 5400 13320 1 86 329 15 24 5400 10440 1 72 604 12 169 5520 9600 1 82 558 12 97 5100 10560 1 84 458 12 36 5400 8940 1 88 338 12 26 4800 9240 1 84 571 16 214 5700 9000 1 76 667 12 90 6000 11940 1 86 486 15 78.5 3900 8760 1 98 327 12 0 4380 10020 1 93 313 8 7.5 4800 9780 1 75 619 12 144 5580 7860 1 69 600 12 132.5 6120 9360 1 78 624 12 208.5 4620 9420 1 96 385 12 52 5220 7860 1 70 671 8 102 5220 8340 1 70 468 12 127 5100 9660 1 66 554 8 96 5040 12420 0 96 329 15 14 4380 9600 1 92 305 8 6.25 6300 12060 0 82 357 15 72 4290 9180 1 69 280 12 5 6000 15120 0 67 315 15 35.5 5400 9540 1 66 534 15 122 6000 16320 0 97 354 12 24 4380 10380 1 92 305 12 0 6000 12300 0 66 351 12 56 5400 8640 1 65 603 8 173 6840 10380 0 92 374 15 41.5 5400 11880 1 66 302 12 26 8100 13979 0 66 369 16 54.5 4500 12540 1 96 366 8 52 6000 10140 0 82 363 12 32 5400 8400 1 70 628 12 82 6000 12360 0 88 555 12 252 5520 8880 1 67 694 12 196 6900 10920 0 75 416 15 132 5640 10080 1 90 368 12 55 6900 10920 0 89 481 12 175 4800 9240 1 73 590 12 228 5400 12660 0 91 331 15 17.5 5400 8640 1 66 771 8 228 6000 12960 0 66 355 15 64 4500 7980 1 80 298 12 8 6000 12360 0 86 348 15 25 5400 11940 1 77 325 12 38 5400 10680 0 88 359 12 38 5400 9420 1 72 589 15 49 5400 11640 0 96 474 12 113 6300 9780 1 66 394 12 86.5 5100 7860 0 84 535 12 180 5160 10680 1 87 320 12 18 6600 11220 0 66 369 15 84 5100 11160 1 98 571 15 115 5100 8700 0 97 637 12 315 4800 8340 1 79 602 8 70 6600 12240 0 83 536 15 215.5 5400 9600 1 98 568 12 244 5700 11220 0 94 392 15 36 4020 9840 1 92 528 10 44 6000 12180 0 91 364 12 49 4980 8700 1 74 718 8 318 
6000 11580 0 83 521 15 108 5280 9780 1 88 653 12 107 6000 8940 0 80 686 12 272 5700 8280 1 65 714 15 241 6000 10680 0 87 364 15 56 4800 8340 1 87 647 12 163 4620 11100 0 77 293 12 11.5 4800 13560 1 82 338 12 11 5220 10080 0 85 344 12 29 5700 10260 1 82 362 15 51 6600 15360 0 83 340 15 64 4380 9720 1 93 303 12 4.5 5400 12600 0 78 305 12 7 4380 10500 1 89 310 12 0 6000 8940 0 78 659 8 320 5700 10620 1 88 410 15 61 5400 9480 0 88 690 15 359 5400 10320 1 78 584 15 51 6000 14400 0 96 402 16 45.5 4440 9600 1 97 341 15 75 ;
Note that _most_ lines end with a semicolon, but not all. Missing one will usually cause an error rather than the results you wanted, but the Log window will generally tell you where the problem is.
The OPTIONS line only needs to be used once during a session. It sets the page length and line width used for viewing on the screen and for printing. The font can be set using the Options choice under the Tools menu along the top of the screen. When you cut and paste from SAS to a word processor, the font Courier New works well.
The DATA line gives the name of the data set. The name may not contain any spaces; only letters, numbers, and underscores are allowed, and it must start with a letter. (In older versions of SAS the name must also be eight characters or fewer.) The INPUT line gives the names of the variables, which must be in the order the data will be entered. The @@ at the end of the line says that there may be more than one observation per line. If we had left it out, SAS would have skipped from the observation with a 5100 base salary right to the one with a 4800 base salary. The $ after SEX says that that variable is a category/name and not a number.
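As a small illustration of these rules (using made-up values, not data from the text), the following DATA step reads six observations of two variables from just two lines of data; the $ marks GROUP as a character variable, and the @@ lets SAS read several observations per line:

```sas
DATA tiny;
INPUT group $ score @@;
CARDS;
a 1 a 2 b 3
b 4 b 5 a 6
;
```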
If we hit F3 at this point to run what we put above, nothing new will appear on the output screen. This should be no big surprise, once we realize that we haven't told SAS to return any output! The code below simply tells SAS to print back out the data we entered.
PROC PRINT DATA=salary;
TITLE "The Salary Data";
RUN;
The most basic method for getting a summary of the data is to use PROC UNIVARIATE.
PROC UNIVARIATE DATA=salary PLOT FREQ ;
VAR sal77;
TITLE 'Summary of the 1977 Salaries';
RUN;
The VAR line says which of the variables you want a summary of. Also note that the graphs here are pretty awful. We'll see in a few minutes how to make the graphics look better.
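If you wanted a separate summary for each sex, one way (a sketch, not something we ran in class) is to sort the data by sex first and then add a BY line to PROC UNIVARIATE:

```sas
PROC SORT DATA=salary;
BY sex;
RUN;
PROC UNIVARIATE DATA=salary PLOT FREQ;
VAR sal77;
BY sex;
RUN;
```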
The code used to generate the PROC TTEST output on the handout is:
PROC TTEST DATA=salary;
CLASS sex;
VAR sal77;
RUN;
One way to get a nicer q-q plot (normal probability plot) than the one that PROC UNIVARIATE makes (and to get a separate one for each sex) is to use PROC INSIGHT.
PROC INSIGHT;
OPEN salary;
RUN;

Another way to open PROC INSIGHT is to go to the Solutions menu, then to the Analysis menu, and then finally to the Interactive Data Analysis option. Once there you will need to go to the WORK library and choose the salary data set.
Once you start PROC INSIGHT a spreadsheet containing the data should appear on the screen. To analyze this data, go to the Analyze menu, and choose Distribution (Y). In the white box under SALARY select sal77 and click the Y button. Then select sex and click the Group button. Finally, click OK.
This causes an output window with two sets of graphs side by side to appear. You can cut and paste the graphs from these windows right into Microsoft Word. Simply click on the border of the box you want to copy with the left mouse button to select it; you can then cut and paste as normal. Later, to quit PROC INSIGHT, simply click on the X in the upper right portion of the spreadsheet.
The plot to check for normality is found under the Curves menu: choose QQ Ref Line... and then just click OK in the box that pops up. If the data is approximately normally distributed, it should fall near the line. With so few data points, however, it is hard to tell whether the data looks close or not. We can conduct a hypothesis test of normality by selecting Tests for Normality under the Tables menu. The Anderson-Darling test is perhaps the best of these tests to use... but I still prefer the q-q plot for small samples.
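If you prefer typed code to the menus, PROC UNIVARIATE can produce normality tests as well (in recent versions these include Anderson-Darling); treat this as a sketch of the idea rather than output we generated in class:

```sas
PROC SORT DATA=salary;
BY sex;
RUN;
PROC UNIVARIATE DATA=salary NORMAL PLOT;
VAR sal77;
BY sex;
RUN;
```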
Throughout PROC INSIGHT are little boxes that give you additional options. The arrow box at the bottom of the histogram window contains a Ticks... option that lets you control which boxes and endpoints are used in the histogram. Try changing the "Tick Increment". In the spreadsheet, if you click on the black box at the end of any row, you get the option to take that point out of the calculations or graphs; if you do, the output window updates automatically. Similarly, if you click on any number, that point is highlighted in all the graphs.
Besides Distribution (Y) in the Analyze menu, we could also have chosen Fit(YX). Choose sex for X and sal77 for Y. If the X variable is numerical, this will conduct a linear regression. If it is a categorical (name) variable, it will conduct an Analysis of Variance. Now, click OK and we will see what the output looks like. We could repeat this again and add educ as another X variable.
We will see a number of other things in SAS throughout the semester, and "instructions" will be posted here for most of the homework assignments.
One sample t-test and confidence interval: Say we were using the data in Table 1-1 (pg. 5) and wanted to test the null hypothesis that the mean weight was 10 vs. the alternate hypothesis that the mean weight was greater than 10. The first step is to enter the data:
DATA weight1p1; INPUT weight @@; CARDS; 7.0 8.0 8.6 9.4 10.2 10.5 11.0 11.6 12.3 12.4 13.6 14.2 ;
We could then start PROC INSIGHT like we did in class on the 17th. In order to get to the menus that will make the confidence interval and perform the t-test, you have to use Distribution (Y) under the Analyze Menu. Once you have done that, you can find Basic Confidence Intervals and Tests for Location... in the Tables Menu.
When asking it to perform the tests for location, you need to make sure to give it the value of the mean that you want to use for the null hypothesis. The output will have three separate tests: the t-test (the one we want), and the signed rank and sign tests (which you would learn about in STAT 518). It is also important to note that SAS always gives the p-value for testing "not equals".
In this case, we get t=1.14 and a p-value of 0.2774 for testing "not equals". If we were testing the alternate hypothesis "greater than", we would have to halve the p-value to get 0.1387.
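The same test can be run without PROC INSIGHT; in versions of SAS that support the H0= option, something like the following should reproduce the t=1.14 above (the "not equals" p-value would still need to be halved by hand for the one-sided test):

```sas
PROC TTEST DATA=weight1p1 H0=10;
VAR weight;
RUN;
```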
Linear Regression: Linear regression can be performed using Fit(YX) under the Analyze menu in PROC INSIGHT just like the salary examples in class on the 17th... except that both Y and X are numerical. The residual-versus-predicted plot is formed automatically, and the residual q-q plot can be made by choosing Residual Normal QQ under the Graphs menu.
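The same kind of regression can also be run with typed code using PROC REG. As a sketch (here using the salary data with sal77 as Y and bsal as X, purely for illustration), the PLOT lines request a residual-versus-predicted plot and a normal quantile plot of the residuals:

```sas
PROC REG DATA=salary;
MODEL sal77 = bsal;
PLOT r.*p.;
PLOT nqq.*r.;
RUN;
```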
NOTE: You do not need to check the assumptions on the regression question (#3).
Prediction Interval for Individual and Confidence Interval for Mean: The following code and instructions will let you reproduce the output on pages 36 to 39 of the text.
DATA martians; INPUT height_x weight_y; CARDS; 31 7.8 32 8.3 33 7.6 34 9.1 35 9.6 35 9.8 40 11.8 41 12.1 42 14.7 46 13.0 40 . ; PROC GLM DATA=martians; MODEL weight_y = height_x / ALPHA=0.05 CLI; RUN; PROC GLM DATA=martians; MODEL weight_y = height_x / ALPHA=0.05 CLM; RUN;
Notice that there are two places where we can find the intervals for a height of 40. It was one of the data points (#7), and we also added that X value again as the last observation to force SAS to give us the estimate. Notice it has just a period for the Y value so that it won't change any of the calculations. The intervals will be slightly different from the 11.0 to 12.5 reported on page 38 and the 9.5 to 14.0 reported on page 39 because of rounding.
It is also possible to get the graphs like those on page 36 by using PROC INSIGHT. Start up PROC INSIGHT and perform the regression as usual using Fit(YX). The options to add the curves to the scatter plot can be found by choosing the Confidence Curves option under the Curves menu.
The Data for Question 2: Here is the raw data for this problem, with the first line being the names of the variables. Remember in the INPUT line to put a $ after state since it contains the names of the states and not numerical values.
state sat takers income years public expend rank Iowa 1088 3 326 16.79 87.8 25.60 89.7 SouthDakota 1075 2 264 16.07 86.2 19.95 90.6 NorthDakota 1068 3 317 16.57 88.3 20.62 89.8 Kansas 1045 5 338 16.30 83.9 27.14 86.3 Nebraska 1045 5 293 17.25 83.6 21.05 88.5 Montana 1033 8 263 15.91 93.7 29.48 86.4 Minnesota 1028 7 343 17.41 78.3 24.84 83.4 Utah 1022 4 333 16.57 75.2 17.42 85.9 Wyoming 1017 5 328 16.01 97.0 25.96 87.5 Wisconsin 1011 10 304 16.85 77.3 27.69 84.2 Oklahoma 1001 5 358 15.95 74.2 20.07 85.6 Arkansas 999 4 295 15.49 86.4 15.71 89.2 Tennessee 999 9 330 15.72 61.2 14.58 83.4 NewMexico 997 8 316 15.92 79.5 22.19 83.7 Idaho 995 7 285 16.18 92.1 17.80 85.9 Mississippi 988 3 315 16.76 67.9 15.36 90.1 Kentucky 985 6 330 16.61 71.4 15.69 86.4 Colorado 983 16 333 16.83 88.3 26.56 81.8 Washington 982 19 309 16.23 87.5 26.53 83.2 Arizona 981 11 314 15.98 80.9 19.14 84.3 Illinois 977 14 347 15.80 74.6 24.41 78.7 Louisiana 975 5 394 16.85 44.8 19.72 82.9 Missouri 975 10 322 16.42 67.7 20.79 80.6 Michigan 973 10 335 16.50 80.7 24.61 81.8 WestVirginia 968 7 292 17.08 90.6 18.16 86.2 Alabama 964 6 313 16.37 69.6 13.84 83.9 Ohio 958 16 306 16.52 71.5 21.43 79.5 NewHampshire 925 56 248 16.35 78.1 20.33 73.6 Alaska 923 31 401 15.32 96.5 50.10 79.6 Nevada 917 18 288 14.73 89.1 21.79 81.1 Oregon 908 40 261 14.48 92.1 30.49 79.3 Vermont 904 54 225 16.50 84.2 20.17 75.8 California 899 36 293 15.52 83.0 25.94 77.5 Delaware 897 42 277 16.95 67.9 27.81 71.4 Connecticut 896 69 287 16.75 76.8 26.97 69.8 NewYork 896 59 236 16.86 80.4 33.58 70.5 Maine 890 46 208 16.05 85.7 20.55 74.6 Florida 889 39 255 15.91 80.5 22.62 74.6 Maryland 889 50 312 16.90 80.4 25.41 71.5 Virginia 888 52 295 16.08 88.8 22.23 72.4 Massachusetts 888 65 246 16.79 80.7 31.74 69.9 Pennsylvania 885 50 241 17.27 78.6 27.98 73.4 RhodeIsland 877 59 228 16.67 79.7 25.59 71.4 NewJersey 869 64 269 16.37 80.6 27.91 69.8 Texas 868 32 303 14.95 91.7 19.55 76.4 Indiana 860 48 258 14.39 90.2 17.93 74.1 Hawaii 857 47 277 
16.40 67.6 21.21 69.9 NorthCarolina 827 47 224 15.31 92.8 19.92 75.3 Georgia 823 51 250 15.55 86.5 16.52 74.0 SouthCarolina 790 48 214 15.42 88.1 15.60 74.0
Multiple Regression: PROC INSIGHT performs multiple regression exactly the same way that it performs simple linear regression. Simply choose as many different X variables as you want all at once.
Residual Diagnostics: You can find the various residual diagnostics in PROC INSIGHT. After you use Fit(YX) to perform the regression, go to the Vars menu and choose the statistic you want. It will be added as a column to the spreadsheet.
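Outside of PROC INSIGHT, PROC REG can save the same kinds of diagnostics to a data set with an OUTPUT statement; a sketch (the data set name diag and the variable names resid, studres, and cooksd are just made up here):

```sas
PROC REG DATA=salary;
MODEL sal77 = bsal senior age educ exper;
OUTPUT OUT=diag R=resid STUDENT=studres COOKD=cooksd;
RUN;
PROC PRINT DATA=diag;
RUN;
```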
Variable Selection: To perform model selection, you can use the procedure PROC REG. The following code would perform the model selection for the continuous variables in the data set salary that we saw in the lab on January 17th.
PROC REG DATA=salary; MODEL sal77 = bsal senior age educ exper / SELECTION = RSQUARE ADJRSQ CP; RUN;
Data for Problem 2: The following gives the populations of the states in 1990 and 1999, and the ranks. (We don't need the ranks for this problem. The data is from www.fedstats.gov, with the District of Columbia removed.)
State pop1990 rank1990 pop1999 rank1999 Alabama 4040 22 4370 23 Alaska 550 49 620 48 Arizona 3665 24 4778 20 Arkansas 2351 33 2551 33 California 29811 1 33145 1 Colorado 3294 26 4056 24 Connecticut 3287 27 3282 29 Delaware 666 46 754 45 Florida 12938 4 15111 4 Georgia 6478 11 7788 10 Hawaii 1108 41 1185 42 Idaho 1007 42 1252 40 Illinois 11431 6 12128 5 Indiana 5544 14 5943 14 Iowa 2777 30 2869 30 Kansas 2478 32 2654 32 Kentucky 3687 23 3961 25 Louisiana 4222 21 4372 22 Maine 1228 38 1253 39 Maryland 4781 19 5172 19 Massachusetts 6016 13 6175 13 Michigan 9295 8 9864 8 Minnesota 4376 20 4776 21 Mississippi 2575 31 2769 31 Missouri 5117 15 5468 17 Montana 799 44 883 44 Nebraska 1578 36 1666 38 Nevada 1202 39 1809 35 NewHampshire 1109 40 1201 41 NewJersey 7748 9 8143 9 NewMexico 1515 37 1740 37 NewYork 17991 2 18197 3 NorthCarolina 6632 10 7651 11 NorthDakota 639 47 634 47 Ohio 10847 7 11257 7 Oklahoma 3146 28 3358 27 Oregon 2842 29 3316 28 Pennsylvania 11883 5 11994 6 RhodeIsland 1003 43 991 43 SouthCarolina 3486 25 3886 26 SouthDakota 696 45 733 46 Tennessee 4877 17 5484 16 Texas 16986 3 20044 2 Utah 1723 35 2130 34 Vermont 563 48 594 49 Virginia 6189 12 6873 12 Washington 4867 18 5756 15 WestVirginia 1793 34 1807 36 Wisconsin 4892 16 5250 18 Wyoming 454 50 480 50

The tricky part on this problem is how you transform the variables. While the spreadsheet is on top in PROC INSIGHT, go to the Variables option under the Edit menu. If you just wanted to (for example) take the exponent of one of your variables, you would choose exp(Y), select which variable Y was, and hit OK. This adds the new variable to the spreadsheet. You can do more complicated transformations by choosing Other... under the Variables option menu.
Once the new variable has been added to the sheet (make sure you checked what it would be called before you hit OK!) you can then use it in either Distribution (Y) or Fit (YX) just like you would any other variable.
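Transformations can also be done in a DATA step instead of through the menus. A sketch, assuming the populations were read into a data set called pops (the names pops2 and logpop90 are made up for this example):

```sas
DATA pops2;
SET pops;
logpop90 = log(pop1990);
RUN;
```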
Data for Problem 4:
Hair_Color Pain_Thresh LightBlond 62 LightBlond 60 LightBlond 71 LightBlond 55 LightBlond 48 DarkBlond 63 DarkBlond 57 DarkBlond 52 DarkBlond 41 DarkBlond 43 LightBrunette 42 LightBrunette 50 LightBrunette 41 LightBrunette 37 DarkBrunette 32 DarkBrunette 39 DarkBrunette 51 DarkBrunette 30 DarkBrunette 35
ANOVA Example (with Levene's Test and Holm Procedure):
The following discusses the analysis of the data in Table 7-4 on page 297. The data consists of the cortisol levels in three groups of people: healthy individuals and two levels of depressed individuals. This data could be analyzed using PROC INSIGHT to get the ANOVA table and residual plots. Using PROC GLM it is also possible to test contrasts and perform Levene's test in addition to getting the ANOVA table.
The code below uses PROC GLM to produce the ANOVA table 7-5 on page 298.
DATA tab7p4; INPUT group $ cort @@; CARDS; h 2.5 n 5.4 m 8.1 h 7.2 n 7.8 m 9.5 h 8.9 n 8.0 m 9.8 h 9.3 n 9.3 m 12.2 h 9.9 n 9.7 m 12.3 h 10.3 n 11.1 m 12.5 h 11.6 n 11.6 m 13.3 h 14.8 n 12.0 m 17.5 h 4.5 n 12.8 m 24.3 h 7.0 n 13.1 m 10.1 h 8.5 n 15.8 m 11.8 h 9.3 n 7.5 m 9.8 h 9.8 n 7.9 m 12.1 h 10.3 n 7.6 m 12.5 h 11.6 n 9.4 m 12.5 h 11.7 n 9.6 m 13.4 n 11.3 m 16.1 n 11.6 m 25.2 n 11.8 n 12.6 n 13.2 n 16.3 ; PROC GLM DATA=tab7p4 ORDER=DATA; CLASS group; MODEL cort=group; RUN;
To conduct a test of hypotheses that the variances of the three groups are equal, we could add an extra line after the MODEL statement above. The following would perform the Levene median test (see page 325).
MEANS group / HOVTEST=BF;

The BF stands for Brown and Forsythe, the name that SAS uses for this particular test. For this example, the p-value for the test of the null hypothesis that the variances for the three groups are equal is 0.6234. We would thus fail to reject the null hypothesis. (Remember that we still need to use PROC INSIGHT to check the normality assumption!)
Multiple Comparisons: (Continuing the above example...) To generate the output for the Holm test (as on page 315), we need to use PROC MULTTEST. The following code would perform the Holm test on all of the different pairs of groups. (SAS calls this the "Stepdown Bonferroni" method.) Notice that each CONTRAST line has three coefficients of 1, -1, and 0... these correspond to the three levels of group. The first line being 1 -1 0 means we are comparing the first level/healthy (1) to the second level/nonm (-1) and ignoring the third/m (0).
PROC MULTTEST DATA=tab7p4 ORDER=DATA HOLM; CLASS group; CONTRAST 'healthy vs. nonm' 1 -1 0; CONTRAST 'healthy vs. m' 1 0 -1; CONTRAST 'nonm vs. m' 0 1 -1; TEST mean(cort); RUN;

The p-values in the column labeled Stepdown Bonferroni have already been adjusted so that you simply need to compare them to the family-wise alpha level. The logic here is that the smallest raw p-value was 0.0008, so you could either compare 0.0008 to alpha/3 or compare 0.0008*3 to alpha. You could then either compare 0.0157 to alpha/2 or compare 0.0157*2 to alpha. Finally, you would compare 0.2014 to alpha either way. For this example we could write up the output for alpha = 0.001, 0.02, 0.05 and 0.25 as follows.
Group Mean a=0.001 a=0.02 a=0.05 a=0.25
----------------------------------------------------------------
m 13.500 A B B C A B
nonm 10.700 A A B A B A A A
healthy 9.200 A A A A
----------------------------------------------------------------

For high alpha levels, we reject the null hypothesis more often (we are more likely to make Type I errors) and are thus more likely to reject that the groups are the same. For a small alpha level it is hard to have enough evidence to reject that the groups are the same (we are more likely to make Type II errors), and so we are likely to conclude they are all the same.
Contrasts: (Still continuing the above example...) The code below uses PROC GLM to produce the ANOVA Table 7-5 on page 298. It also uses two contrasts (one comparing non-melancholic to healthy, and one comparing melancholic to healthy) to get the same results as the text does in figure 7-5 on page 301. The book used dummy variables, however, and the code below uses contrasts; thus the variable Dn in the regression is the same as the contrast 'nonm minus healthy' and Dm is the same as the contrast 'm minus healthy'. The lines using ESTIMATE do the same thing as the lines with CONTRAST, but they also return the estimated value (the same as the slopes in figure 7-5) and the t instead of F (recall F = t^2).
PROC GLM DATA=tab7p4 ORDER=DATA; CLASS group; MODEL cort=group; CONTRAST 'nonm minus healthy' group -1 1 0; CONTRAST 'm minus healthy' group -1 0 1; ESTIMATE 'nonm minus healthy' group -1 1 0; ESTIMATE 'm minus healthy' group -1 0 1; RUN;

SAS won't automatically construct the confidence intervals for the contrasts, but we can do it by hand from the output (or program SAS to do it), because the output from the ESTIMATE line gives both the estimate and the standard error of the estimate. (The degrees of freedom are the same as the degrees of freedom for the SSwit.) Using the t critical value with those degrees of freedom (about 2.01 here, rather than the normal value 1.96), the 95% CI for the difference of the means for the non-melancholic and healthy groups would thus be: 1.5 +/- 2.01 (1.15947) = 1.5 +/- 2.335 = (-0.835, 3.835).
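Depending on your version of SAS, there may also be a shortcut: adding the CLPARM option to the MODEL statement should make PROC GLM print confidence limits for each ESTIMATE line itself. Treat this as a sketch to try, not guaranteed output:

```sas
PROC GLM DATA=tab7p4 ORDER=DATA;
CLASS group;
MODEL cort=group / CLPARM;
ESTIMATE 'nonm minus healthy' group -1 1 0;
ESTIMATE 'm minus healthy' group -1 0 1;
RUN;
```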
Data for Problem 3:
IQ ADOPTIVE BIOLOGIC 136.00 High High 99.00 High High 121.00 High High 133.00 High High 125.00 High High 131.00 High High 103.00 High High 115.00 High High 94.00 High Low 103.00 High Low 99.00 High Low 125.00 High Low 111.00 High Low 93.00 High Low 101.00 High Low 94.00 High Low 98.00 Low High 99.00 Low High 91.00 Low High 124.00 Low High 100.00 Low High 116.00 Low High 113.00 Low High 119.00 Low High 92.00 Low Low 91.00 Low Low 98.00 Low Low 83.00 Low Low 99.00 Low Low 68.00 Low Low 76.00 Low Low 115.00 Low Low
SAS for Two-Way ANOVAs: The following discusses the analysis of the kidney data on pages 364-373 by performing the standard two-way ANOVA (including using Holm's test on the main effects and the interaction).
DATA ksh; INPUT strain $ site $ activity @@; CARDS; norm dct 62 norm dct 73 norm dct 58 norm dct 66 hyp dct 44 hyp dct 49 hyp dct 46 hyp dct 37 norm ccd 15 norm ccd 31 norm ccd 19 norm ccd 35 hyp ccd 8 hyp ccd 36 hyp ccd 11 hyp ccd 18 norm omcd 7 norm omcd 7 norm omcd 9 norm omcd 17 hyp omcd 19 hyp omcd 7 hyp omcd 15 hyp omcd 4 ; PROC GLM DATA=ksh ORDER=DATA; CLASS strain site; MODEL activity = strain site strain*site; RUN;

After running this portion, note that the ANOVA table in the output is identical to the one given at the top of page 369, even though that one was obtained using dummy variable coding. Also note that if you add up the various Type I SS in the page 369 output, you will get the same values as the Type III SS in the SAS output from the above code.
Now, to see if the main effects (strain and site) and the interaction (strain*site) are significant, we actually need to do three tests. Because of this we should probably use the Holm procedure to control the family-wise error rate. To do this we use PROC MULTTEST, but instead of giving it contrasts we give it the p-values from the Type III tests. The data set pvals below contains the p-values from above, and the variable holding them must be called raw_p. Be careful to note the order in which you entered the p-values!
DATA pvals; INPUT whichone $ raw_p; CARDS; strain 0.0156 site 0.0001 strain*site 0.0404 ; PROC MULTTEST PDATA=pvals HOLM; RUN;
In this case, all three adjusted p-values are still less than 0.05, and so we would still say that both the main effects and interaction were significant with a family-wise alpha of 0.05.
To check the assumptions for this analysis, we need to get the residual plots and use Levene's test. The residual plots can be gotten from PROC INSIGHT simply by using Fit(YX). Choose activity as the Y variable, and both strain and site as the X variables. This is not enough, however. We still need to put the interaction term in. Highlight both strain and site at the same time, and then hit Cross. This will add strain*site to the list of independent variables. Now, hit OK and you can proceed as usual.
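The residual plot can also be made without PROC INSIGHT by saving the residuals and predicted values from PROC GLM and then plotting them; a sketch (the names kshres, resid, and pred are made up for this example):

```sas
PROC GLM DATA=ksh ORDER=DATA;
CLASS strain site;
MODEL activity = strain site strain*site;
OUTPUT OUT=kshres R=resid P=pred;
RUN;
PROC PLOT DATA=kshres;
PLOT resid*pred;
RUN;
```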
Unfortunately SAS will only run Levene's test for a one-way ANOVA. Because of this, we need to trick SAS into thinking this is a one-way ANOVA by actually making it into one! What we will do is change it into a one-way ANOVA with six different treatments: normdct, hypdct, normccd, hypccd, normomcd, and hypomcd. The following code does this using the || operator, which concatenates the names of the factor levels together after trimming off any extra spaces. The code makes the new data set ksh2 from the old data set ksh, prints it out so that you can see what it has done, and then runs the modified Levene's test. (Recall that SAS calls this test the Brown and Forsythe test.)
DATA ksh2; SET ksh; KEEP block activity; block = trim(strain)||trim(site); PROC PRINT data=ksh2; RUN; PROC GLM DATA=ksh2 ORDER=DATA; CLASS block; MODEL activity = block; MEANS block / HOVTEST=BF; RUN;
Below is the data for Homework Assignment 8. The variables are temperature in Fahrenheit and whether or not there was an O-ring failure.
53 1 56 1 57 1 63 0 66 0 67 0 67 0 67 0 68 0 69 0 70 0 70 1 70 1 70 1 72 0 73 0 75 0 75 1 76 0 76 0 78 0 79 0 80 0 81 0

All of this problem (except the Hosmer-Lemeshow statistic) can be done using PROC INSIGHT. In the Fit (YX) menu you have to choose several options after you select Y and X: select Method using the button at the bottom, and in the menu that pops up choose the response distribution Binomial and the link function Logit.
PROC LOGISTIC will also calculate the Hosmer-Lemeshow Statistic. The following code would analyze the data in Table 12-1 on page 609, and it reproduces the output in figure 12-2 on page 611.
DATA grad; INPUT int success failure; total=success+failure; CARDS; 1 0 2 2 0 2 3 0 5 4 0 3 5 1 6 6 1 3 7 1 2 8 4 7 9 5 4 10 7 4 11 7 2 12 7 1 13 11 1 14 7 0 15 11 0 16 5 0 17 7 0 18 2 0 19 2 0 ; PROC LOGISTIC DATA=grad; MODEL success/total = int / LACKFIT; RUN;

The data here was input in a slightly different format than the way the data for the homework assignment would be entered. For this data we had several observations recorded at the same intelligence level, all reported on a single line of input (e.g. there were two observations at z=1: 0 successes and 2 failures). Your input would have these on separate lines. For example, the data set grad would start like:
DATA grad; INPUT int grad @@; CARDS; 1 0 1 0 2 0 2 0 3 0 3 0 3 0 3 0 3 0 4 0 4 0 4 0 5 1 5 0 5 0 5 0 5 0 5 0 5 0 etc... ;

In this case the code used to run PROC LOGISTIC would be:
PROC LOGISTIC DATA=grad DESCENDING; MODEL grad = int /LACKFIT; RUN;

The log window will tell you that the DESCENDING means you are predicting the probability of getting a 1. Without it, you would be predicting the probability of getting a zero.
You can always ask me a question about SAS programming issues.
In most cases, help with the computers (NOT the programming) can be gained by e-mailing help@stat.sc.edu
For the printers on the first and second floor, printer paper is available in the Stat Department office. For printers on the third floor, paper is available in the Math Department office.
If you are using a PC, restarting the machine will fix many problems, but obviously don't try that if you have a file that won't save or the like. (It is always a good idea to save your work periodically on your Z drive.)
If SAS won't start, one of the things to check is that your computer has loaded the X drive correctly (whatever that means). Go to My Computer and see if the apps on 'lc-nt' (X:) is listed as one of the drives. If it isn't, go to the Tools menu and select Map Network Drive.... Select X for the drive, and enter \\lc-nt\apps for the Folder. Then click Finish. This should connect your computer to the X drive and allow SAS to run. If you already had the X-drive connected, then you will need to e-mail help@stat.sc.edu.
If your graphs print out extremely small after you copy them to Word, you might be able to fix the problem by "opening and closing" the image. In Word, left-click once on the image, and select Edit Picture or Open Picture Object under the Edit menu. A separate window will open with the image in it. Simply choose Close Picture. It should now print out OK. This will also make the spacing between the characters in the labels look right if they were somewhat off. You can also modify the picture so that you can move it around the screen. Right-click on the image twice and choose Format Object... in the menu that pops up. Choose the Layout tab, select In front of text, and click OK. (This might move the image a bit when you first do it, but you can just click on it to move it back.)
If the problem is an emergency requiring immediate attention, see Jason Dew in room 415. If Jason is not available and it is an emergency, see Minna Moore in room 417. Flagrantly non-emergency cases may result in suspension of computer privileges.