Stat 516 - Spring 2003 - SAS Templates

The Basics of SAS (in class 1/17/03)
Homework 1 Notes
Homework 2 Notes
Homework 3 Notes
Homework 4 Notes
Homework 5 Notes
Homework 6 Notes
Homework 7 Notes
Book Data Sets
Computer Trouble?
SAS Won't Start?
Graphs Printing Small in Word?


The Basics of SAS:

When you start SAS there are three windows that are used: the Log window, the Program Editor window, and the Output window. If you happen to lose one of these windows, they usually have a bar at the bottom of the SAS window; you can also find them under the View menu.

The Program Editor is where you tell SAS what you want done, the Output window is where it puts the results, and the Log window is where it tells you what it did and whether there were any errors. It is important to note that the Output window often gets very long, so you usually want to copy the parts you want to print into MS-Word and print from there. It is also important to check the Log window every time you run anything. (The error "SAS Syntax Editor control is not installed" is ok though.) Errors appear in maroon; successful runs appear in blue.

Hitting the [F3] key will run the program currently in the Program Editor window.

This will, however, erase whatever was written in the Program Editor window. To recall whatever was there, make sure you are in that window and hit the [F4] key.

If you keep running more programs, SAS keeps adding the results to the Output window. To clear the Output window, make sure you are in that window and choose Clear All under the Edit menu.

The code below uses the Texas employment data we saw in class.

OPTIONS pagesize=60 linesize=80;

DATA salary;
INPUT BSAL SAL77 SEX $ SENIOR AGE EDUC EXPER @@;
CARDS;
5100	8940	1	95	640	15	165	6300	10860	1	84	662	15	231
4800	8580	1	98	774	12	381	6000	9720	1	69	488	12	121
5280	8760	1	98	557	8	190	5100	9600	1	85	406	12	59
5280	8040	1	88	745	8	90	4800	11100	1	87	349	12	11
4800	9000	1	77	505	12	63	5100	10020	1	87	508	16	123
4800	8820	1	76	482	12	6	5700	9780	1	74	542	12	116.5
5400	13320	1	86	329	15	24	5400	10440	1	72	604	12	169
5520	9600	1	82	558	12	97	5100	10560	1	84	458	12	36
5400	8940	1	88	338	12	26	4800	9240	1	84	571	16	214
5700	9000	1	76	667	12	90	6000	11940	1	86	486	15	78.5
3900	8760	1	98	327	12	0	4380	10020	1	93	313	8	7.5
4800	9780	1	75	619	12	144	5580	7860	1	69	600	12	132.5
6120	9360	1	78	624	12	208.5	4620	9420	1	96	385	12	52
5220	7860	1	70	671	8	102	5220	8340	1	70	468	12	127
5100	9660	1	66	554	8	96	5040	12420	0	96	329	15	14
4380	9600	1	92	305	8	6.25	6300	12060	0	82	357	15	72
4290	9180	1	69	280	12	5	6000	15120	0	67	315	15	35.5
5400	9540	1	66	534	15	122	6000	16320	0	97	354	12	24
4380	10380	1	92	305	12	0	6000	12300	0	66	351	12	56
5400	8640	1	65	603	8	173	6840	10380	0	92	374	15	41.5
5400	11880	1	66	302	12	26	8100	13979	0	66	369	16	54.5
4500	12540	1	96	366	8	52	6000	10140	0	82	363	12	32
5400	8400	1	70	628	12	82	6000	12360	0	88	555	12	252
5520	8880	1	67	694	12	196	6900	10920	0	75	416	15	132
5640	10080	1	90	368	12	55	6900	10920	0	89	481	12	175
4800	9240	1	73	590	12	228	5400	12660	0	91	331	15	17.5
5400	8640	1	66	771	8	228	6000	12960	0	66	355	15	64
4500	7980	1	80	298	12	8	6000	12360	0	86	348	15	25
5400	11940	1	77	325	12	38	5400	10680	0	88	359	12	38
5400	9420	1	72	589	15	49	5400	11640	0	96	474	12	113
6300	9780	1	66	394	12	86.5	5100	7860	0	84	535	12	180
5160	10680	1	87	320	12	18	6600	11220	0	66	369	15	84
5100	11160	1	98	571	15	115	5100	8700	0	97	637	12	315
4800	8340	1	79	602	8	70	6600	12240	0	83	536	15	215.5
5400	9600	1	98	568	12	244	5700	11220	0	94	392	15	36
4020	9840	1	92	528	10	44	6000	12180	0	91	364	12	49
4980	8700	1	74	718	8	318	6000	11580	0	83	521	15	108
5280	9780	1	88	653	12	107	6000	8940	0	80	686	12	272
5700	8280	1	65	714	15	241	6000	10680	0	87	364	15	56
4800	8340	1	87	647	12	163	4620	11100	0	77	293	12	11.5
4800	13560	1	82	338	12	11	5220	10080	0	85	344	12	29
5700	10260	1	82	362	15	51	6600	15360	0	83	340	15	64
4380	9720	1	93	303	12	4.5	5400	12600	0	78	305	12	7
4380	10500	1	89	310	12	0	6000	8940	0	78	659	8	320
5700	10620	1	88	410	15	61	5400	9480	0	88	690	15	359
5400	10320	1	78	584	15	51	6000	14400	0	96	402	16	45.5
4440	9600	1	97	341	15	75							
;

Note that _most_ lines end with a semi-colon, but not all. Missing one will keep the program from running correctly, but the Log window will usually tell you where the problem is.

The OPTIONS line only needs to be used once during a session. It sets the length of the page and the length of the lines for viewing on the screen and printing. The font can be set by using the Options choice under the Tools menu along the top of the screen. When you cut and paste from SAS to a word processor, the font Courier New works well.

The DATA line defines the name of the data set. The name may not have any spaces; only letters, numbers, and underscores, and it must start with a letter. In older versions of SAS the name must also be eight characters long or less. The INPUT line gives the names of the variables, in the order that the data will be entered. The @@ at the end of the line says that there may be more than one observation per line. If we had left it out, SAS would have skipped from the observation with a 5100 base salary right to the one with a 4800 base salary. The $ after SEX says that that variable is a category/name and not a number.

If we hit [F3] at this point to run what we put above, nothing new will appear in the Output window. This should be no big surprise once we realize that we haven't told SAS to return any output! The code below simply tells SAS to print back out the data we entered.


PROC PRINT DATA=salary;
TITLE "The Salary Data";
RUN;

The most basic method for getting a summary of the data is to use PROC UNIVARIATE.

PROC UNIVARIATE DATA=salary PLOT FREQ;
VAR sal77;
TITLE 'Summary of the 1977 Salaries';
RUN;

The VAR line says which of the variables you want a summary of. Also note that the graphs here are pretty awful. We'll see in a few minutes how to make the graphics look better.

The code used to generate the PROC TTEST output on the handout is:

PROC TTEST DATA=salary;
CLASS sex;
VAR sal77;
RUN;

One way to get a nicer q-q plot (normal probability plot) than the one that PROC UNIVARIATE makes (and to get a separate one for each sex) is to use PROC INSIGHT.

PROC INSIGHT;
OPEN salary;
RUN;
Another way to open PROC INSIGHT is to go to the Solutions menu, then to the Analysis menu, and then finally to the Interactive Data Analysis option. Once there you will need to go to the WORK library and choose the salary data set.

Once you start PROC INSIGHT, a spreadsheet containing the data should appear on the screen. To analyze this data, go to the Analyze menu and choose Distribution (Y). In the white box under SALARY, select sal77 and click the Y button. Then select sex and click the Group button. Finally, click OK.

This causes an output window with two sets of graphs side-by-side to appear. You can cut and paste the graphs from these windows right into Microsoft Word. Simply click on the border of the box you want to copy with the left mouse button to select it; you can then cut and paste as normal. Later, to quit PROC INSIGHT, simply click on the X in the upper right portion of the spreadsheet.

The plot to check for normality is found under the Curves menu: choose QQ Ref Line... and then just click OK in the box that pops up. If the data is approximately normally distributed, it should fall near the line. With so few data points, however, it is hard to tell whether the data looks close or not. We can conduct a hypothesis test of normality by selecting Tests for Normality under the Tables menu. The Anderson-Darling test is perhaps the best of these to use... but I still prefer the Q-Q plot for small samples.

Throughout PROC INSIGHT are little boxes that give you additional options. The arrow box at the bottom of the histogram window contains a Ticks... option that lets you control what boxes and endpoints you use in the histogram. Try changing the "Tick Increment". In the spreadsheet, if you click on the black box at the end of any row, it gives you options to take any given point out of the calculations or graphs; if you do that, the output window automatically updates. Similarly, if you click on any number, that point is highlighted in all the graphs.

Besides Distribution (Y) in the Analyze menu, we could also have chosen Fit (YX). Choose sex for X and sal77 for Y. If the X variable is numerical, this will conduct a linear regression; if it is a categorical (name) variable, it will conduct an Analysis of Variance. Now click OK and we will see what the output looks like. We could repeat this again and add educ as another X variable.
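
The same fit can also be requested with code instead of the menus, using the FIT statement we will see again later. A sketch using the salary data (the choice of educ as the second X variable is just the example from above):

PROC INSIGHT;
OPEN salary;
FIT sal77 = sex educ;
RUN;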

We will see a number of other things in SAS throughout the semester, and "instructions" will be posted here for most of the homework assignments.


Homework 1 Notes

Two-sample t-test: The following code would analyze the data in Example 5.3 on page 193.

DATA mesquite;
INPUT loc $ height @@;
CARDS;
A	1.70	A	2.00	M	1.30	M	0.90	M	1.50
A	3.00	A	1.30	M	1.35	M	1.35	M	1.50
A	1.70	A	1.45	M	2.16	M	1.40	M	1.20
A	1.60	A	2.20	M	1.80	M	1.00	M	0.70
A	1.40	A	0.70	M	1.55	M	1.70	M	1.20
A	1.90	A	1.90	M	1.20	M	1.50	M	0.80
A	1.10	A	1.80	M	1.00	M	0.65
A	1.60	A	2.00	M	1.70	M	1.50
A	2.00	A	2.20	M	0.80	M	1.70
A	1.25	A	0.92	M	1.20	M	1.70
; 
PROC TTEST DATA=mesquite;
CLASS loc;
VAR height;
RUN;

The assumption of equal variances can be checked by using the F-test on the PROC TTEST output (the p-value is 0.1364 in this case, so we can accept that the variances are equal if the two populations appear normal). PROC INSIGHT can be used to check the assumption of normality:

PROC INSIGHT;
OPEN mesquite;
RUN;

Choose Distribution (Y) under the Analyze menu, select height for Y and loc for Group, and click OK. Then select QQ Ref Line... under the Curves menu. For this example both plots look very close to a straight line, so we would accept the assumption of normality. With much smaller data sets this can be hard to tell, and there is nothing wrong with saying it is hard to tell because the data set is too small.

Linear Regression: The following code would analyze the data in Example 7.2 on page 293.

DATA housing;
INPUT size price @@;
CARDS;
0.951	30	1.532	93.5	2.336	129.9
1.036	39.9	1.647	94.9	1.98	132.9
0.676	46.5	1.344	95.8	2.483	134.9
1.456	48.6	1.550	98.5	2.809	135.9
1.186	51.5	1.752	99.5	2.036	139.5
1.456	56.99	1.450	99.9	2.298	139.99
1.368	59.9	1.312	102	2.038	144.9
0.994	62.5	1.636	106	2.370	147.6
1.176	65.5	1.5	108.9	2.921	149.99
1.216	69	1.8	109.9	2.262	152.55
1.410	76.9	1.972	110.0	2.456	156.9
1.344	79	1.387	112.29	2.436	164
1.064	79.9	2.082	114.9	1.920	167.5
1.770	79.95	.	119.5	2.949	169.9
1.524	82.9	2.463	119.9	3.310	175
1.750	84.9	2.572	119.9	2.805	179
1.152	85	2.113	122.9	2.553	179.9
1.770	87.9	2.016	123.938 2.510	189.5
1.624	89.9	1.852	124.9	3.627	199
1.540	89.9	2.670	126.9
;
PROC INSIGHT;
OPEN housing;
FIT price=size;
RUN;


Homework 2 Notes

Prediction Interval for Individual and Confidence Interval for Mean: The following code and instructions will let you produce output like that shown on pages 314 and 315 of the text. It uses the data set housing that can be found in the Homework 1 Notes above. (Note that the housing prices as entered above are in a different order than the ones provided on the textbook's web-site.)

 
PROC GLM DATA=housing;
MODEL price=size / ALPHA=0.05 CLM;
RUN;

PROC GLM DATA=housing;
MODEL price=size / ALPHA=0.05 CLI;
RUN;

The line with CLM produces the confidence interval for the mean predicted value (for the regression line); that is the one found on pages 314-315. The line with CLI produces the prediction interval for a new observation. So, for example, the first observation is one that had size=0.951 and price=30, and its predicted value is 58.7668. We can be 95% confident that the mean price (the true regression line) is between 49.4828 and 68.0507 for a house of size 0.951, and we would expect 95% of all houses with an area of 0.951 to have prices between 18.2563 and 99.2772. Say we wanted to estimate the price range for a house with an area of 0.800. We could do this by adding an extra observation to the data set, one with an area of 0.800 and only a period put in for the price (it's missing; you don't have one), as sketched below.
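
A minimal sketch of that trick (the data set names extra and housing2 are just suggestions):

DATA extra;
INPUT size price @@;
CARDS;
0.800 .
;
DATA housing2;
SET housing extra;  /* append the new observation to the housing data */

PROC GLM DATA=housing2;
MODEL price=size / ALPHA=0.05 CLI;
RUN;

The prediction interval reported for the added observation (which is left out of the fitting because its price is missing) is the price range for a house with an area of 0.800.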

It is also possible to get the graphs like those on page 306 by using PROC INSIGHT. Start up PROC INSIGHT and perform the regression as usual using Fit(YX). The options to add the curves to the scatter plot can be found by choosing the Confidence Curves option under the Curves menu.

The Data for Question 2: Here is the raw data for this problem, with the first line being the names of the variables. Remember in the INPUT line to put a $ after state since it contains the names of the states and not numerical values. (A sketch of the DATA step follows the data.)

state 		sat 	takers 	income 	years 	public 	expend 	rank
Iowa		1088	3	326	16.79	87.8	25.60	89.7
SouthDakota	1075	2	264	16.07	86.2	19.95	90.6
NorthDakota	1068	3	317	16.57	88.3	20.62	89.8
Kansas		1045	5	338	16.30	83.9	27.14	86.3
Nebraska 	1045	5	293	17.25	83.6	21.05	88.5
Montana		1033	8	263	15.91	93.7	29.48	86.4
Minnesota	1028	7	343	17.41	78.3	24.84	83.4
Utah		1022	4	333	16.57	75.2	17.42	85.9
Wyoming		1017	5	328	16.01	97.0	25.96	87.5
Wisconsin	1011  	10	304	16.85	77.3	27.69	84.2
Oklahoma 	1001	5	358	15.95	74.2	20.07	85.6
Arkansas 	999	4	295	15.49	86.4	15.71	89.2
Tennessee	999	9	330	15.72	61.2	14.58	83.4
NewMexico	997	8	316	15.92	79.5	22.19	83.7
Idaho		995	7	285	16.18	92.1	17.80	85.9
Mississippi	988	3	315	16.76	67.9	15.36	90.1
Kentucky 	985	6	330	16.61	71.4	15.69	86.4
Colorado 	983  	16	333	16.83	88.3	26.56	81.8
Washington	982  	19	309	16.23	87.5	26.53	83.2
Arizona		981  	11	314	15.98	80.9	19.14	84.3
Illinois 	977  	14	347	15.80	74.6	24.41	78.7
Louisiana	975   	5	394	16.85	44.8	19.72	82.9
Missouri 	975  	10	322	16.42	67.7	20.79	80.6
Michigan 	973  	10	335	16.50	80.7	24.61	81.8
WestVirginia	968   	7	292	17.08	90.6	18.16	86.2
Alabama		964   	6	313	16.37	69.6	13.84	83.9
Ohio		958  	16	306	16.52	71.5	21.43	79.5
NewHampshire	925  	56	248	16.35	78.1	20.33	73.6
Alaska		923  	31	401	15.32	96.5	50.10	79.6
Nevada		917  	18	288	14.73	89.1	21.79	81.1
Oregon		908  	40	261	14.48	92.1	30.49	79.3
Vermont		904  	54	225	16.50	84.2	20.17	75.8
California	899  	36	293	15.52	83.0	25.94	77.5
Delaware 	897  	42	277	16.95	67.9	27.81	71.4
Connecticut	896  	69	287	16.75	76.8	26.97	69.8
NewYork		896  	59	236	16.86	80.4	33.58	70.5
Maine		890  	46	208	16.05	85.7	20.55	74.6
Florida		889  	39	255	15.91	80.5	22.62	74.6
Maryland 	889  	50	312	16.90	80.4	25.41	71.5
Virginia 	888  	52	295	16.08	88.8	22.23	72.4
Massachusetts	888  	65	246	16.79	80.7	31.74	69.9
Pennsylvania	885  	50	241	17.27	78.6	27.98	73.4
RhodeIsland	877  	59	228	16.67	79.7	25.59	71.4
NewJersey	869  	64	269	16.37	80.6	27.91	69.8
Texas		868  	32	303	14.95	91.7	19.55	76.4
Indiana		860  	48	258	14.39	90.2	17.93	74.1
Hawaii		857  	47	277	16.40	67.6	21.21	69.9
NorthCarolina	827  	47	224	15.31	92.8	19.92	75.3
Georgia		823  	51	250	15.55	86.5	16.52	74.0
SouthCarolina	790  	48	214	15.42	88.1	15.60	74.0
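
A minimal sketch of that DATA step (leave out the header line with the variable names; the data set name sat is just a suggestion):

DATA sat;
LENGTH state $ 15;  /* state names run longer than the default 8 characters */
INPUT state $ sat takers income years public expend rank;
CARDS;
< paste the data rows here >
;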

Multiple Regression: PROC INSIGHT performs multiple regression exactly the same way that it performs simple linear regression. Simply choose as many different X variables as you want all at once.
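
For example, to predict sat from takers, income, and years all at once in code (assuming the data was read into the data set sat as sketched above):

PROC INSIGHT;
OPEN sat;
FIT sat = takers income years;
RUN;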


Homework 3 Notes

Residual Diagnostics: You can find the various residual diagnostics in PROC INSIGHT. After you use Fit (YX) to perform the regression, go to the Vars menu and choose the statistic you want. It will be added as a column to the spreadsheet.

Variable Selection: To perform model selection, you can use the procedure PROC REG. The following code would perform the model selection for the continuous variables in the data set salary that we saw in the lab on January 17th.

PROC REG DATA=salary;
MODEL   sal77 = bsal senior age educ exper /
        SELECTION = RSQUARE ADJRSQ CP;
RUN;

The Data for Question 2: Here is the raw data for this problem, with the first line being the names of the variables. Remember in the INPUT line to put a $ after Area since it contains the names of the states and not numerical values. (A sketch of the DATA step follows the data.)

Rank	    Area		April_1_2000	April_1_1990
1	    California	33871648	29760021
2	    Texas	20851820	16986510
3	    New_York	18976457	17990455
4	    Florida	15982378	12937926
5	    Illinois	12419293	11430602
6	    Penn	12281054	11881643
7	    Ohio	11353140	10847115
8	    Michigan	9938444	9295297
9	    NJersey	8414350	7730188
10	    Georgia	8186453	6478216
11	    NCarolina	8049313	6628637
12	    Virginia	7078515	6187358
13	    Mass	6349097	6016425
14	    Indiana	6080485	5544159
15	    Washington	5894121	4866692
16	    Tennessee	5689283	4877185
17	    Missouri	5595211	5117073
18	    Wisconsin	5363675	4891769
19	    Maryland	5296486	4781468
20	    Arizona	5130632	3665228
21	    Minnesota	4919479	4375099
22	    Louisiana	4468976	4219973
23	    Alabama	4447100	4040587
24	    Colorado	4301261	3294394
25	    Kentucky	4041769	3685296
26	    SCarolina	4012012	3486703
27	    Oklahoma	3450654	3145585
28	    Oregon	3421399	2842321
29	    Connecticut	3405565	3287116
30	    Iowa	2926324	2776755
31	    Mississippi	2844658	2573216
32	    Kansas	2688418	2477574
33	    Arkansas	2673400	2350725
34	    Utah	2233169	1722850
35	    Nevada	1998257	1201833
36	    NMexico	1819046	1515069
37	    WVirginia	1808344	1793477
38	    Nebraska	1711263	1578385
39	    Idaho	1293953	1006749
40	    Maine	1274923	1227928
41	    NHampshire	1235786	1109252
42	    Hawaii	1211537	1108229
43	    RIsland	1048319	1003464
44	    Montana	902195	799065
45	    Delaware	783600	666168
46	    SDakota	754844	696004
47	    NDakota	642200	638800
48	    Alaska	626932	550043
49	    Vermont	608827	562758
50	    Wyoming	493782	453588
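
A sketch of the DATA step (again leave out the header line; the data set name pop is just a suggestion):

DATA pop;
LENGTH area $ 12;  /* area names run longer than the default 8 characters */
INPUT rank area $ april_1_2000 april_1_1990;
CARDS;
< paste the data rows here >
;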

The tricky part on this problem is how you transform the variables. While the spreadsheet is on top in PROC INSIGHT, go to the Variables option under the Edit menu. If you just wanted to (for example) take the exponent of one of your variables, you would choose exp(Y), select which variable Y is, and click OK. This adds the new variable to the spreadsheet. You can do some more complicated transformations by choosing Other... under the Variables option menu.

Once the new variable has been added to the sheet (make sure you check what it will be called before you hit OK!), you can use it in either Distribution (Y) or Fit (YX) just like you would any other variable.
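
You can also create a transformed variable in a DATA step before starting PROC INSIGHT. A sketch, assuming the data was read into the data set pop as sketched above (pop2 and log2000 are just suggested names):

DATA pop2;
SET pop;
log2000 = LOG(april_1_2000);  /* natural log of the 2000 population */
RUN;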


Homework 4 Notes

All of the code needed for performing a one-way ANOVA is contained in the supplement to 6.5. Here is the code from the supplement that you can cut and paste in if you want.

Entering the data: Here is the data from Table 6.21 on page 264.

DATA shrimp_weights;
INPUT diet $ weight @@;
CARDS;
cafo_1 47.0	cafo_1 50.9	cafo_1 45.2	cafo_1 48.9	cafo_1 48.2
calo_2 38.1 	calo_2 39.6	calo_2 39.1	calo_2 33.1    	calo_2 40.3
faso_3 57.4	faso_3 55.1	faso_3 54.2 	faso_3 56.8     faso_3 52.5
falo_4 54.2	falo_4 57.7	falo_4 57.1	falo_4 47.9	falo_4 53.4
bc_5   38.5	bc_5   42.0	bc_5   38.7	bc_5   38.9	bc_5   44.6
lma_6  48.9	lma_6  47.0	lma_6	 47.0	lma_6  44.4	lma_6  46.9
lmaa_7 87.8	lmaa_7 81.7	lmaa_7 73.3	lmaa_7 82.7	lmaa_7 74.8 
;
Checking the assumptions and getting the ANOVA table: PROC INSIGHT can be used to get the two residual plots, while PROC GLM will conduct the modified Levene test. Both will return the ANOVA table.
 
PROC INSIGHT;
OPEN shrimp_weights;
FIT weight=diet;
RUN;

PROC GLM DATA=shrimp_weights ORDER=DATA;
CLASS diet;
MODEL weight=diet;
MEANS diet / HOVTEST=BF;
RUN;
Estimating contrasts: The following code will produce the output in Table 6.33. Notice that it is NOT adjusted for the Holm procedure, BUT it does return both the estimate (L-hat) and the estimated standard deviation of L-hat that could be used to make a confidence interval. It is important to note that if you make more than one confidence interval you will have multiple comparison problems and will want to use Scheffe's method (or Tukey's if you are doing only pairwise comparisons).
PROC GLM DATA=shrimp_weights ORDER=DATA;
CLASS diet;
MODEL weight=diet;
ESTIMATE 'newold'  diet  3  3  3  3 -4 -4 -4 / divisor=12;
ESTIMATE 'corn'	 diet  5  5 -2 -2 -2 -2 -2 / divisor=10;
ESTIMATE 'fish'	 diet  4 -3  4  4 -3 -3 -3 / divisor=12;
ESTIMATE 'lin'	 diet -2  5 -2  5 -2 -2 -2 / divisor=10;
ESTIMATE 'sun'	 diet -1 -1  6 -1 -1 -1 -1 / divisor=6;
ESTIMATE 'mic'	 diet -2 -2 -2 -2 -2  5  5 / divisor=10;
ESTIMATE 'art'	 diet -1 -1 -1 -1 -1 -1  6 / divisor=6;
RUN;
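
For a single contrast, a 95% confidence interval would be L-hat +/- t(0.025, error df) * (estimated standard deviation of L-hat); here the error df is 28 (35 observations minus the 7 diet groups), so the multiplier is about 2.05.
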
All pairwise comparisons: PROC MULTTEST can be used for all pairwise comparisons... and it automatically adjusts the p-values. The format is very similar to that of PROC GLM, but it is not exactly the same.
PROC MULTTEST DATA=shrimp_weights ORDER=DATA HOLM;
CLASS diet;
CONTRAST '1 vs. 2'  1 -1  0  0  0  0  0;
CONTRAST '1 vs. 3'  1  0 -1  0  0  0  0;
CONTRAST '1 vs. 4'  1  0  0 -1  0  0  0; 
CONTRAST '1 vs. 5'  1  0  0  0 -1  0  0;
CONTRAST '1 vs. 6'  1  0  0  0  0 -1  0;
CONTRAST '1 vs. 7'  1  0  0  0  0  0 -1;
CONTRAST '2 vs. 3'  0  1 -1  0  0  0  0;
CONTRAST '2 vs. 4'  0  1  0 -1  0  0  0;
CONTRAST '2 vs. 5'  0  1  0  0 -1  0  0;
CONTRAST '2 vs. 6'  0  1  0  0  0 -1  0;
CONTRAST '2 vs. 7'  0  1  0  0  0  0 -1;
CONTRAST '3 vs. 4'  0  0  1 -1  0  0  0;
CONTRAST '3 vs. 5'  0  0  1  0 -1  0  0;
CONTRAST '3 vs. 6'  0  0  1  0  0 -1  0;
CONTRAST '3 vs. 7'  0  0  1  0  0  0 -1;
CONTRAST '4 vs. 5'  0  0  0  1 -1  0  0;
CONTRAST '4 vs. 6'  0  0  0  1  0 -1  0;
CONTRAST '4 vs. 7'  0  0  0  1  0  0 -1;
CONTRAST '5 vs. 6'  0  0  0  0  1 -1  0;
CONTRAST '5 vs. 7'  0  0  0  0  1  0 -1;
CONTRAST '6 vs. 7'  0  0  0  0  0  1 -1;
TEST mean(weight);
RUN;

The Data for Question 2 (a sketch for reading it in follows the data):

Angry 2.10 Angry  0.64 Angry 0.47 Angry 0.37 Angry 1.62 Angry -0.08 
Disg  0.40 Disg   0.73 Disg -0.07 Disg -0.25 Disg  0.89 Disg   1.93
Fear  0.82 Fear  -2.93 Fear -0.74 Fear  0.79 Fear -0.77 Fear  -1.60
Happy 1.71 Happy -0.04 Happy 1.04 Happy 1.44 Happy 1.37 Happy  0.59
Sad   0.74 Sad   -1.26 Sad  -2.27 Sad  -0.39 Sad  -2.65 Sad   -0.44
Neut  1.69 Neut  -0.60 Neut -0.55 Neut  0.27 Neut -0.57 Neut  -2.16
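
A sketch of the DATA step (the data set and variable names are just suggestions):

DATA emotions;
INPUT emotion $ score @@;
CARDS;
< paste the data here >
;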


Homework 5 Notes

The Data for Question 3 (a sketch for reading it in follows the data):

IQ         ADOPTIVE BIOLOGIC 
136.00     High     High
99.00      High     High
121.00     High     High
133.00     High     High
125.00     High     High
131.00     High     High
103.00     High     High
115.00     High     High
94.00      High     Low
103.00     High     Low
99.00      High     Low
125.00     High     Low
111.00     High     Low
93.00      High     Low
101.00     High     Low
94.00      High     Low
98.00      Low      High
99.00      Low      High
91.00      Low      High
124.00     Low      High
100.00     Low      High
116.00     Low      High
113.00     Low      High
119.00     Low      High
92.00      Low      Low
91.00      Low      Low
98.00      Low      Low
83.00      Low      Low
99.00      Low      Low
68.00      Low      Low
76.00      Low      Low
115.00     Low      Low
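
A sketch for reading these in and fitting the two-way model (the data set name iq is just a suggestion; the FIT line follows the same pattern as the kidney example below):

DATA iq;
INPUT iq adoptive $ biologic $;
CARDS;
< paste the data here >
;

PROC INSIGHT;
OPEN iq;
FIT iq = adoptive biologic adoptive*biologic;
RUN;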

Code for the Kidney Example from In Class:

DATA ksh;
INPUT strain $ site $ activity @@;
CARDS;
norm dct  62	norm dct  73	norm dct  58	norm dct  66
hyp  dct  44	hyp  dct  49 	hyp  dct  46	hyp  dct  37
norm ccd  15	norm ccd  31	norm ccd  19	norm ccd  35
hyp  ccd   8	hyp  ccd  36	hyp  ccd  11	hyp  ccd  18
norm omcd  7	norm omcd  7	norm omcd  9 	norm omcd 17
hyp  omcd 19	hyp  omcd  7	hyp  omcd 15	hyp  omcd  4
;

PROC INSIGHT;
OPEN ksh;
FIT activity=strain site strain*site;
RUN;

DATA ksh2;
SET ksh;
KEEP block activity;                /* keep only the new factor and the response */
block = trim(strain)||trim(site);   /* combine strain and site into a single factor */

PROC PRINT data=ksh2;
RUN;

PROC GLM DATA=ksh2 ORDER=DATA;
CLASS block;
MODEL activity = block;
MEANS block / HOVTEST=BF;
RUN;

DATA pvals;
INPUT whichone $ raw_p;
CARDS;
strain	0.0156
site		0.0001
strain*site 0.0404
;
PROC MULTTEST PDATA=pvals HOLM;
RUN;


Homework 6 Notes

The following code would analyze the data in Example 10.3 on pages 473-476, using the data downloaded from the web-page. (You want the link for FW10x03, even though the page that comes up calls the data set fw10x02.)

DATA FW10x03; 
INPUT  OBS    LAB    MATERIAL $   STRESS;   
CARDS;       
< input data set here >
;

PROC GLM DATA=FW10x03 ORDER=DATA;
CLASS LAB MATERIAL;
MODEL STRESS = LAB MATERIAL LAB*MATERIAL;
RANDOM LAB LAB*MATERIAL / TEST;
MEANS MATERIAL / DUNCAN E=LAB*MATERIAL;
OUTPUT OUT=fwresids P=pred R=resid;
RUN;
Note that the E=LAB*MATERIAL on the MEANS line tells SAS that you want to use a different denominator for conducting the test (see, for example, the end of the output on page 475). You can tell it should be E=LAB*MATERIAL by checking the EMS for testing MATERIAL.

The DUNCAN command is something we didn't cover in section 6.5. It makes a display showing which values are different from each other, much as we did with the Holm procedure (see the display on page 476). However, it doesn't keep the family-wise error rate under control -and- it doesn't have as much power as just ignoring the family-wise error rate. I can't really recommend using it.

The analysis of Latin squares works just like the analysis of a factorial design (using PROC GLM); just remember that you cannot include the interaction terms.
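
A sketch, assuming a Latin square data set named latin with variables row, col, and treat and a response y (all of these names are hypothetical):

PROC GLM DATA=latin;
CLASS row col treat;
MODEL y = row col treat;  /* no interaction terms in a Latin square */
RUN;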


Homework 7 Notes

Logistic Regression: The following code will analyze the data in Table 11.8 on page 532 using logistic regression (it is probably a better analysis than the one the book has, so don't pay attention to what the output on page 533 looks like). This analysis is shown on page 541.

DATA fw11x04;                                        
INPUT y $ income;                                      
CARDS;                                               
0 9.2                                               
0 12.9                                               
0 9.2                                               
1 9.6                                                
0 9.3                                                
1 10.1                                               
0 9.4                                                
1 10.3                                               
0 9.5                                                
1 10.9                                                
0 9.5                                               
1 10.9                                               
0 9.5                                                
1 11.1                                               
0 9.6                                               
1 11.1                                               
0 9.7                                                
1 11.1                                               
0 9.7                                                
1 11.5                                               
0 9.8                                                
1 11.8                                               
0 9.8                                                
1 11.9                                               
0 9.9                                                
1 12.1                                               
0 10.5                                              
1 12.2                                               
0 10.5                                               
1 12.5                                               
0 10.9                                               
1 12.6                                               
0 11                                                
1 12.6                                               
0 11.2                                               
1 12.6                                               
0 11.2                                               
1 12.9                                              
0 11.5                                              
1 12.9                                              
0 11.7                                               
1 12.9                                               
0 11.8                                               
1 12.9                                               
0 12.1                                               
1 13.1                                              
0 12.3                                              
1 13.2                                               
0 12.5                                           
1 13.5
;

PROC LOGISTIC DATA=fw11x04 DESCENDING;
MODEL y=income /LACKFIT;
RUN;
The important part of the output is:
          Analysis of Maximum Likelihood Estimates

                               Standard
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1   -11.3472      3.3511       11.4660        0.0007
income        1     1.0018      0.2954       11.5013        0.0007



              Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq
Likelihood Ratio        15.5687        1         <.0001
Score                   14.1810        1         0.0002
Wald                    11.5013        1         0.0007


            Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square       DF     Pr > ChiSq
    6.7043        7         0.4603
Looking at the equation on page 536, -11.3472 is the estimate of beta0 and 1.0018 is the estimate of beta1. The likelihood ratio test p-value of < 0.0001 tests the null hypothesis that beta1 is zero, and the p-value of 0.4603 from the Hosmer and Lemeshow test tests the null hypothesis that the logistic form is appropriate.
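
For example, plugging these estimates into that equation, the estimated probability that y = 1 for an income of 12 is exp(-11.3472 + 1.0018*12)/(1 + exp(-11.3472 + 1.0018*12)) = exp(0.6744)/(1 + exp(0.6744)), or about 0.66.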


Book Data Sets

The data sets from the book can be found at the Statistical Methods Companion Website.


Computer Trouble?

In most cases, help with the computers (NOT the programming) can be gained by e-mailing help@stat.sc.edu.

For the printers on the first and second floor, printer paper is available in the Stat Department office. For printers on the third floor, paper is available in the Math Department office.

If you are using a PC, restarting the machine will fix many problems, but obviously don't try that if you have a file that won't save or the like.

If SAS won't start, one of the things to check is that your computer has loaded the X drive correctly (whatever that means). Go to My Computer and see if apps on 'lc-nt' (X:) is listed as one of the drives. If it isn't, go to the Tools menu and select Map Network Drive.... Select X for the drive, and enter \\lc-nt\apps for the Folder. Then click Finish. This should connect your computer to the X drive and allow SAS to run. If you already had the X drive connected, then you will need to e-mail help@stat.sc.edu.

If your graphs print out extremely small after you copy them to Word, you might be able to fix the problem by "opening and closing" the image. In Word, left click once on the image, and select Edit Picture or Open Picture Object under the Edit menu. A separate window will open with the image in it. Simply choose Close Picture. It should now print out ok. This will also make the spacing between the characters in the labels look right if they were somewhat off.

If the problem is an emergency requiring immediate attention see Jason Dew in room 415.
If Jason is not available and it is an emergency see Minna Moore in room 417.
Flagrantly non-emergency cases may result in suspension of computer privileges.