From class on November 10th


Sample Problem 1:

null hypothesis:  mu = 2
alternate hypothesis: mu > 2
sample average =  2.1
sample sd = 1.4
n = 8

---------------begin----------------------
DATA sample1;
INPUT mu avg sd n;
t  = (avg-mu)/(sd/sqrt(n));
df = n - 1;
pval = 1 - probt(t,df);
cards;
2 2.1 1.4 8
;
PROC PRINT;
RUN;
-----------------end------------------------
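
The same DATA step can be adapted to the other alternate hypotheses.
What follows is only a sketch: the data set name sample1b and the
variables plower, pupper and ptwo are made up for this illustration.

---------------begin----------------------
DATA sample1b;
INPUT mu avg sd n;
t  = (avg-mu)/(sd/sqrt(n));
df = n - 1;
/* PROBT gives the area below t */
plower = probt(t,df);              /* alternate hypothesis: mu < 2  */
pupper = 1 - probt(t,df);          /* alternate hypothesis: mu > 2  */
ptwo   = 2*(1 - probt(abs(t),df)); /* alternate hypothesis: mu not equal to 2 */
cards;
2 2.1 1.4 8
;
PROC PRINT;
RUN;
-----------------end------------------------

Only the p-value that matches your alternate hypothesis should be reported.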

Other functions for getting p-values include the following
(like PROBT, each returns the probability below x):

PROBF(x,df1,df2)
PROBCHI(x,df1)
PROBNORM(x)


FINV(p,df1,df2) is the inverse of PROBF
CINV(p,df1) is the inverse of PROBCHI
TINV(p,df) is the inverse of PROBT
PROBIT(p) is the inverse of PROBNORM(x)
so that, for example, the following prints the normal and t cut-offs
for each p and df:

----------------begin------------------------
DATA sample2;
INPUT p df @@;
cutnorm = PROBIT(p);
cutt = TINV(p,df);
CARDS;
0.05 4  0.025  4  0.01 4 
;
PROC PRINT;
RUN;
-----------------end--------------------------
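
Because PROBT and PROBNORM give the area below x, the cut-offs above are
lower-tail values (negative when p is less than 0.5). A sketch of how the
upper-tail cut-offs could be found instead; the names sample2b, alpha,
uppernorm, uppert and check are made up for this illustration:

----------------begin------------------------
DATA sample2b;
INPUT alpha df @@;
/* cut-off with area alpha ABOVE it = cut-off with area 1-alpha below it */
uppernorm = PROBIT(1 - alpha);
uppert    = TINV(1 - alpha, df);
/* check: the area above uppert should come back as alpha */
check     = 1 - PROBT(uppert, df);
CARDS;
0.05 4  0.025  4  0.01 4
;
PROC PRINT;
RUN;
-----------------end--------------------------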


Table 5.11 Data

Area A	Area B
2.92	1.84
1.88	0.95
5.35	4.26
3.81	3.18
4.69	3.44
4.86	3.69
5.81	4.95
5.55 	4.47


----------------begin--------------------------
DATA sample3;
INPUT  AreaA  AreaB;
LABEL AreaA = "Air Pollution in Area A"
	AreaB = "Air Pollution in Area B";
CARDS;
2.92    1.84
1.88    0.95
5.35    4.26
3.81    3.18
4.69    3.44
4.86    3.69
5.81    4.95
5.55    4.47
;
PROC PRINT;
RUN;
--------------end--------------------------------



To get a summary of AreaB use....


--------------begin------------------------------
PROC UNIVARIATE DATA=sample3 PLOT PCTLDEF=4;
	VAR AreaB;
	TITLE 'Summary of Area B';
RUN;
-------------end---------------------------------



To get a "better" summary use..........

-------------begin---------------------------------
PROC INSIGHT;
OPEN sample3;
DIST AreaB;
RUN;
-------------end------------------------------------

Under Curves on the menu bar, choose parametric density
and normal to add a normal curve to the data

Under Graphs on the menu bar, choose QQplot, normal and 
ok to get a Q-Q plot.

Under Tables choose CI  or Location Tests


Getting a confidence interval for the mean

------------begin---------------------------------
PROC MEANS DATA=sample3 N MEAN STD CLM ALPHA=0.05  MAXDEC=3;
	VAR AreaA;
RUN;
-------------end------------------------------------
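
The interval that CLM prints should be the usual one, the mean plus or
minus a t cut-off times sd/sqrt(n). A sketch of the same calculation done
by hand, assuming sample3 contains the Table 5.11 data; the names astats,
ci, avg, sd, n, tcut, lower and upper are made up for this illustration:

------------begin---------------------------------
/* save n, the mean and the sd of AreaA in a data set */
PROC MEANS DATA=sample3 NOPRINT;
	VAR AreaA;
	OUTPUT OUT=astats N=n MEAN=avg STD=sd;
RUN;
DATA ci;
SET astats;
/* ALPHA=0.05 means 0.025 in each tail */
tcut  = TINV(0.975, n - 1);
lower = avg - tcut*sd/sqrt(n);
upper = avg + tcut*sd/sqrt(n);
RUN;
PROC PRINT DATA=ci;
RUN;
-------------end------------------------------------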



To compare the two means


-----------begin------------------------------------
DATA sample4;
INPUT  group $ amount @@;
CARDS;
A 2.92  B 1.84
A 1.88  B 0.95
A 5.35  B 4.26
A 3.81  B 3.18
A 4.69  B 3.44
A 4.86  B 3.69
A 5.81  B 4.95
A 5.55  B 4.47
;
PROC TTEST DATA=sample4;
CLASS group;
VAR amount;
RUN;
-------------end------------------------------------

The PROC TTEST output also includes a test of whether the two variances are equal!

Note on Homework 14

There is one important difference between the example above and the homework. Above there are the same numbers of A's and B's. On the homework there are 8 of one and 10 of the other. In the first part (where we don't tell it what group), if you just put the 18 values in, it will think there are 9 of each!

There are two ways to deal with this: 1) put a period (the SAS missing-value marker) in each place where a number would go but there isn't one, or 2) enter only the data for the new diet. In the second case the input line would be:

INPUT new;
and you would only have one column of data then.
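
For the first way, a sketch of what the period trick looks like. The
data set name hw14demo, the variable name old, and all of the numbers
are made-up placeholders (NOT the homework data); here there are 3 old
values and 5 new values:

---------------begin----------------------
DATA hw14demo;
INPUT old new;
CARDS;
1.0  2.0
1.5  2.5
2.0  3.0
.    3.5
.    4.0
;
PROC PRINT;
RUN;
-----------------end------------------------

Each period is read as a missing value, so SAS should count 3 values
for old and 5 for new.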

Also, remember that if SAS doesn't seem to have done what you think it should have, check the log window.

Note on Homework 16

Let's say we want to do regression using the data in sample3.
Much of the output can be obtained by using PROC INSIGHT like
we did above to get the Q-Q plot.  Under the ANALYZE menu, choose
FIT(YX).

Some of the statistics, however, are easier to find using PROC REG.
Say we are trying to predict AreaB using AreaA.  That is, AreaB is the
dependent variable or y, and AreaA is the independent variable or x.

-------------begin---------------------------------
PROC REG data=sample3 GRAPHICS;
   model AreaB=AreaA;
   print cli;
run;
--------------end-----------------------------------

This output includes the ANOVA table in the section labeled 
"Analysis of Variance".   

The section labeled "Parameter estimates" includes the estimates 
of the b0 and b1 values.  The b0 is in the row labeled INTERCEP,
and the b1 is in the row labeled with the name of the x variable.  
That row also contains the t-test of whether b1 = 0 or not. 

The last part lists all the observed values of the dependent variable,
their predicted values, and the 95% intervals (requested by cli) for
predicting an individual value at each x.
 
Now let's say we wanted to predict AreaB when AreaA was 3.00.
None of the AreaA's we entered are equal to 3.00, so the output
doesn't give us that!

Add an extra data pair:

3.00 	.

to sample3 and run it again.  The input part
should now look like...


-----------begin-------------------------------------- 
DATA sample3;
INPUT  AreaA  AreaB;
LABEL AreaA = "Air Pollution in Area A"
        AreaB = "Air Pollution in Area B";
CARDS;
2.92    1.84
1.88    0.95
5.35    4.26
3.81    3.18
4.69    3.44
4.86    3.69
5.81    4.95
5.55    4.47
3.00       .
;
PROC PRINT;
RUN;
---------end-------------------------------------------

The extra value of AreaA doesn't change the fitted line (its AreaB is
missing, so it is not used in the fit); it just means that the
predicted value and interval will also be calculated for 3.00.
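
With the extra row in place, rerunning the regression might look like
the sketch below. It asks for the predicted values and intervals with
the P and CLI options on the MODEL statement, which is an alternative
to the print cli; line used earlier:

-------------begin---------------------------------
PROC REG DATA=sample3;
   /* P = predicted values, CLI = intervals for an individual value */
   MODEL AreaB=AreaA / P CLI;
RUN;
--------------end-----------------------------------

The row where AreaA is 3.00 should show a missing observed value but
still have a predicted value and an interval.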

Note on Homework 17

Parts b and d are done the same way as on past homeworks. PROC TTEST as used above (with a CLASS statement), however, does NOT do a paired t-test for part c.

One thing you could do is to calculate the parts you need for the formula on page 196, and then get SAS to calculate the p-value. (It's the very first thing we did with SAS at the top of this page.)
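
A sketch of that first approach, using sample3 as a stand-in for the
homework data and assuming the formula on page 196 is the usual paired
t statistic (mean of the differences divided by sd/sqrt(n), with n - 1
degrees of freedom). The names diffs, dstats, paired, d, dbar, sd and n
are made up for this illustration:

-------------begin---------------------------------
DATA diffs;
SET sample3;
d = AreaA - AreaB;      /* one difference per pair */
RUN;
/* save n, the mean and the sd of the differences */
PROC MEANS DATA=diffs NOPRINT;
	VAR d;
	OUTPUT OUT=dstats N=n MEAN=dbar STD=sd;
RUN;
DATA paired;
SET dstats;
t  = dbar/(sd/sqrt(n));
df = n - 1;
pval = 1 - probt(t,df); /* upper tail; change to match your alternate hypothesis */
RUN;
PROC PRINT DATA=paired;
RUN;
--------------end-----------------------------------

You could also work out the mean and sd of the differences by hand and
type them into a DATA step like sample1.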

A sneakier way can be illustrated using the data set sample3 above. Between the INPUT and LABEL lines, we would add the line

d = AreaA - AreaB;

And then we would run PROC INSIGHT using DIST d;
One of the things we saw PROC INSIGHT could do is test whether the mean is zero. Similarly, PROC UNIVARIATE gives a built-in test of whether the mean is zero. (See the line labeled T: Mean = 0 .)
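
A sketch of the PROC UNIVARIATE version. Instead of retyping the data,
it copies sample3 into a new data set; the name sample3d is made up for
this illustration:

-------------begin---------------------------------
DATA sample3d;
SET sample3;
d = AreaA - AreaB;      /* missing if either value is missing */
RUN;
PROC UNIVARIATE DATA=sample3d;
	VAR d;
	TITLE 'Summary of the differences';
RUN;
--------------end-----------------------------------

Rows with a missing difference (like the extra 3.00 row) are left out
of the summary.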