/* A complete regression example (Sec. 11.7) */ /* This is the fire damage data set (p. 636 of book) */ /* The response variable is fire damage (in thousands of dollars) to houses */ /* The predictor variable is distance from fire station (in miles) */ /* First we read in the data set and name the variables: */ DATA FIREDAM; INPUT DISTANCE DAMAGE; CARDS; 3.4 26.2 1.8 17.8 4.6 31.3 2.3 23.1 3.1 27.5 5.5 36 0.7 14.1 3 22.3 2.6 19.6 4.3 31.3 2.1 24 1.1 17.3 6.1 43.2 4.8 36.4 3.8 26.1 ; /* Let's do a scatter plot to see if a linear relationship is appropriate. */ PROC SGPLOT DATA=FIREDAM; SCATTER y=DAMAGE x=DISTANCE; RUN; /* The linear relationship seems reasonable based on the plot. */ /* Let's use PROC REG to do estimate the slope and intercept of the linear model */ /* Let's also plot the estimated regression line over our scatterplot: */ PROC REG DATA=FIREDAM; MODEL DAMAGE=DISTANCE; OUTPUT OUT=NEW P=PRED; RUN; /* Using PROC GPLOT gives a nice-looking plot of the estimated regression line. */ /* This plot comes up in a separate window from the OUTPUT window. */ /* First the data must be sorted based on the X variable before using PROC GPLOT. */ PROC SORT DATA=NEW; BY DISTANCE; symbol1 v=circle l=32 c = black; symbol2 i = join v=none l=32 c = black; PROC GPLOT DATA=NEW; PLOT DAMAGE*DISTANCE PRED*DISTANCE/ OVERLAY; RUN; /* Note that PROC REG gives a similar plot, so the PROC GPLOT example may not be necessary... */ /* So our least-squares regression line is Y-hat = 10.28 + 4.92 X */ /* How should we interpret these parameter estimates? */ /* Based on the plot, the points seem to fall around the regression line with */ /* roughly constant variance, so that assumption about the errors is probably OK. */ /* Note that our estimate of sigma, denoted s */ /* or "Root MSE" as SAS calls it, is 2.31635. */ /* Suppose we want to test the model usefulness. */ /* We test H_0: beta_1 = 0 against the two-tailed alternative. */ /* The test statistic value t for this test is given as 12.53 */ /* and the P-value is nearly 0. We reject H_0 and conclude the model is useful.*/ /* What about a 95% CI for beta_1? */ /* Note that the estimated slope is 4.919 and its standard error is given as 0.393. */ /* From table VI, t_(.025) = 2.16 (for n-2 = 13 d.f.). */ /* The 95% CI for beta_1 is (4.919 - 2.16*0.393, 4.919 + 2.16*0.393). */ /* Therefore the 95% CI for beta_1 is (4.070, 5.768). */ /* How do we interpret this CI? */ /* The correlation coefficient r can be found using the following code: */ PROC CORR DATA=FIREDAM; VAR DAMAGE DISTANCE; RUN; /* We see that r = .96098 for these data. */ /* The coefficient of determination r^2 is the square of r. */ /* It is given as .9235 on the PROC REG output. How do we interpret this value? */ /* Finally, suppose we want to calculate CIs for mean damage */ /* and prediction intervals for damage at particular distances. */ /* The options clm and cli will give us CIs for E(Y) and PIs for Y, for the values */ /* of X (the distances) in the data set. */ /* But suppose we were interested in the 95% CI for mean damage and 95% prediction */ /* interval for damage, for a house which was 3.5 miles from the fire station. */ /* Note that 3.5 is not among the distances in the data set, so we create a "new" */ /* observation with distance=3.5 and damage=. (missing): */ /* Note that I'm giving this data set a new name, PREDDAM: */ DATA PREDDAM; INPUT DISTANCE DAMAGE; CARDS; 3.4 26.2 1.8 17.8 4.6 31.3 2.3 23.1 3.1 27.5 5.5 36 0.7 14.1 3 22.3 2.6 19.6 4.3 31.3 2.1 24 1.1 17.3 6.1 43.2 4.8 36.4 3.8 26.1 3.5 . ; PROC REG DATA=PREDDAM; MODEL DAMAGE=DISTANCE /clm cli; RUN; /* Look at the last line of the "Output Statistics". This shows the prediction */ /* for the house which was 3.5 miles from the station. We see the predicted */ /* damage for this house is about 27.5 thousand dollars. The 95% CI for mean damage */ /* for all houses 3.5 miles from the station is between 26.2 and 28.8 thousand dollars. */ /* The prediction interval for the damage to a new house that is 3.5 miles from the */ /* station is (22.3, 32.7) in thousands of dollars. */