Click here for sample code to help on Homework 16
Click here for some notes on Homework 17
Sample Problem 1:

null hypothesis: mu = 2
alternate hypothesis: mu > 2
sample average = 2.1
sample sd = 1.4
n = 8

---------------begin----------------------
DATA sample1;
INPUT mu avg sd n;
t = (avg-mu)/(sd/sqrt(n));
df = n - 1;
pval = 1 - probt(t,df);
cards;
2 2.1 1.4 8
;
PROC PRINT;
RUN;
-----------------end------------------------

Other commands for getting p-values include

PROBF(x,df1,df2)
PROBCHI(x,df1)
PROBNORM(x)

FINV(p,df1,df2) is the inverse of PROBF
CINV(p,df1) is the inverse of PROBCHI
TINV(p,df) is the inverse of PROBT
PROBIT(p) is the inverse of PROBNORM

so that, for example,

----------------begin------------------------
DATA sample2;
INPUT p df @@;
cutnorm = PROBIT(p);
cutt = TINV(p,df);
CARDS;
0.05 4 0.025 4 0.01 4
;
PROC PRINT;
RUN;
-----------------end--------------------------

Table 5.11 Data

Area A   Area B
2.92     1.84
1.88     0.95
5.35     4.26
3.81     3.18
4.69     3.44
4.86     3.69
5.81     4.95
5.55     4.47

----------------begin--------------------------
DATA sample3;
INPUT AreaA AreaB;
LABEL AreaA = "Air Pollution in Area A"
      AreaB = "Air Pollution in Area B";
CARDS;
2.92 1.84
1.88 0.95
5.35 4.26
3.81 3.18
4.69 3.44
4.86 3.69
5.81 4.95
5.55 4.47
;
PROC PRINT;
RUN;
--------------end--------------------------------

To get a summary of AreaB use....

--------------begin------------------------------
PROC UNIVARIATE DATA=sample3 PLOT PCTLDEF=4;
VAR AreaB;
TITLE 'Summary of Area B';
RUN;
-------------end---------------------------------

To get a "better" summary use..........

-------------begin---------------------------------
PROC INSIGHT;
OPEN sample3;
DIST AreaB;
RUN;
-------------end------------------------------------

Under Curves on the menu bar, choose parametric density and normal to add a normal curve to the data.
Under Graphs on the menu bar, choose QQplot, normal and ok to get a Q-Q plot.
Under Tables, choose CI or Location Tests.

Getting a confidence interval for the mean

------------begin---------------------------------
PROC MEANS DATA=sample3 N MEAN STD CLM ALPHA=0.05 MAXDEC=3;
VAR AreaA;
RUN;
-------------end------------------------------------

To compare the two means

-----------begin------------------------------------
DATA sample4;
INPUT group $ amount @@;
CARDS;
A 2.92 B 1.84 A 1.88 B 0.95
A 5.35 B 4.26 A 3.81 B 3.18
A 4.69 B 3.44 A 4.86 B 3.69
A 5.81 B 4.95 A 5.55 B 4.47
;
PROC TTEST DATA=sample4;
CLASS group;
VAR amount;
RUN;
-------- end-----------------------------------------

This also tests that the variances are equal!
There are two ways to deal with this: 1) put a period in each place
a number would go but isn't there. 2) only enter the data for the
new diet. So the input line would be:
Also, remember that if it doesn't seem to have done what you think it should have, check the log window.
Let's say we want to do regression using the data in sample3. A lot of the results can be gotten just by using PROC INSIGHT, like we did above to get the Q-Q plot. Under the ANALYZE menu, choose FIT(YX). Some of the statistics, however, are easier to find using PROC REG. Say we are trying to predict AreaB using AreaA. That is, AreaB is the dependent variable or y, and AreaA is the independent variable or x.

-------------begin---------------------------------
PROC REG data=sample3 GRAPHICS;
model AreaB=AreaA;
print cli;
run;
--------------end-----------------------------------

This output includes the ANOVA table in the section labeled "Analysis of Variance". The section labeled "Parameter estimates" includes the estimates of the b0 and b1 values. The b0 is in the row labeled INTERCEP, and the b1 is in the row labeled with the name of the x variable. That row also contains the t-test of whether b1 = 0 or not. The last part lists all the observed values of the dependent variable, their predicted values, and the confidence intervals for predicting those values (this is what CLI asks for).

Now let's say we wanted to predict AreaB when AreaA was 3.00. None of the AreaA's we entered are equal to 3.00, so the output doesn't give us that! Add an extra data pair, 3.00 . (the period stands for the missing AreaB value), to sample3 and try running it again. The input part should now look like...

-----------begin--------------------------------------
DATA sample3;
INPUT AreaA AreaB;
LABEL AreaA = "Air Pollution in Area A"
      AreaB = "Air Pollution in Area B";
CARDS;
2.92 1.84
1.88 0.95
5.35 4.26
3.81 3.18
4.69 3.44
4.86 3.69
5.81 4.95
5.55 4.47
3.00 .
;
PROC PRINT;
RUN;
---------end-------------------------------------------

The extra value of AreaA doesn't change the line, because a row with a missing AreaB is not used in fitting it. It just means that the predicted value and confidence interval will be calculated for 3.00.
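Instead of hunting for the 3.00 row in the printed listing, the predicted value and its limits can also be written to a data set and printed on their own. The following is only a sketch, assuming sample3 already contains the extra pair 3.00 . as described above. The names pred, yhat, lower and upper are made up for this illustration.

-----------begin--------------------------------------
/* Sketch only: pred, yhat, lower, upper are illustrative names */
PROC REG DATA=sample3;
MODEL AreaB = AreaA;
/* P= predicted value; LCL= and UCL= limits for an individual prediction */
OUTPUT OUT=pred P=yhat LCL=lower UCL=upper;
RUN;
QUIT;

/* Print only the row that was added for prediction */
PROC PRINT DATA=pred;
WHERE AreaA = 3.00;
VAR AreaA yhat lower upper;
RUN;
---------end-------------------------------------------

The limits saved by LCL= and UCL= should match the ones that print cli; lists for that row.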
Parts b and d are done the same way as on past homeworks. PROC TTEST, however, does NOT do a paired t-test for part c.
One thing you could do is to calculate the parts you need for the formula on page 196, and then get SAS to calculate the p-value. (It's the very first thing we did with SAS at the top of this page.)
A sneakier way can be illustrated using the data set sample3 above.
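Between the INPUT line and the LABEL line (it has to come after INPUT so that AreaA and AreaB have already been read), we would add the line

d = AreaA - AreaB;

And then we would run PROC INSIGHT using DIST d; and, under Tables, choose Location Tests as above. Testing whether the mean of d is 0 is the paired t-test.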
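Put together, the sneakier way might look like the sketch below. The data set name paired is made up for this illustration. The PROC MEANS step at the end is an added alternative, not part of the notes above, for when INSIGHT is not handy: its T and PRT keywords print the t statistic and the two-sided p-value for testing that the mean of d is 0.

-------------begin---------------------------------
/* Sketch only: the data set name paired is illustrative */
DATA paired;
INPUT AreaA AreaB;
d = AreaA - AreaB;   /* difference within each pair */
LABEL AreaA = "Air Pollution in Area A"
      AreaB = "Air Pollution in Area B";
CARDS;
2.92 1.84
1.88 0.95
5.35 4.26
3.81 3.18
4.69 3.44
4.86 3.69
5.81 4.95
5.55 4.47
;
PROC INSIGHT;
OPEN paired;
DIST d;
RUN;

/* Alternative without INSIGHT: one-sample t-test on the differences */
PROC MEANS DATA=paired N MEAN STD T PRT;
VAR d;
RUN;
--------------end-----------------------------------

PRT is a two-sided p-value. For a one-sided alternative, cut it in half when t points in the direction of the alternative hypothesis.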