SCCC 312 - Homework Assignment 2

The assignment uses the data set tuna.txt posted on the web at http://www.stat.sc.edu/~habing/courses/data/tuna.txt. The first column is the age of the tuna (in days) estimated by counting the rings in its otoliths. The second column is the fork length (length from fork in tail to nose) in cm. The 90 fish in this case were caught in the Indian Ocean.

As this assignment requires MINITAB, we must first get the data loaded into MINITAB. The easiest way to do this is to copy and paste from the web page into the worksheet; MINITAB is smart enough to figure out which values go in which columns. Another way is to save the text file onto your Z-drive using your web browser. In MINITAB, under the File menu, select Other Files and then Import Special Text. In the box that comes up, put C1-C2 in the box labeled Store data in column(s)... and click OK. Then use the window that comes up to locate the file on your Z-drive. It will help to change the Files of type to .TXT. Remember to give titles to the two columns; the steps below refer to the second column as Length. (Of course, you could always try typing in all the numbers, but that is probably more annoying!)

Making a Q-Q plot: A Q-Q plot (or normal probability plot) graphs the data against "what the data would look like if it were exactly normally distributed". In this section we will construct a Q-Q plot for the fish lengths and see whether they are approximately normally distributed for this population. To see how it works, we will do it the long way.

  1. The first step is to rank the fish according to their lengths. We will store these ranks in column C3. To do this, select Rank... under the Manip menu option. Put C2 in the Rank data in: box, and C3 in the Store ranks in: box. Then click OK.

  2. Next we need to change these ranks into percentiles. The easiest way to do this is to divide each rank by the number of observations plus one (here 90 + 1 = 91). To do this, select Calculator... under the Calc menu. Put C4 in the Store results in: window, and put C3/91 in the Expression window. Then click OK.

  3. Now, the point of doing this is to figure out what values from a normal curve the lengths would have if they were really from a normal distribution. The function that tells us what value a normal curve has for a given percentile is called the "Inverse cumulative probability". Under the Calc menu, choose Probability Distributions and Normal.... Choose Inverse cumulative probability, put C4 in the Input column and C5 in the Optional storage. Then hit OK.

  4. Finally, we can make the Q-Q plot! In the Regression menu under Stat, choose Fitted Line Plot. Choose Length for the Predictor and the C5 column for the Response, and click OK.
Remember to print out and hand in the finished Q-Q plot (along with a few other plots you will be asked to make below).
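
For anyone who wants to check the MINITAB columns outside MINITAB, the arithmetic in steps 2 and 3 can be sketched in Python. This is only a rough illustration, not part of the assignment: the function name qq_points and the example lengths are made up, and the sketch uses each value's position in sorted order in place of MINITAB's Rank command, so it will not reproduce column C3 exactly when lengths repeat.

```python
from statistics import NormalDist

def qq_points(values):
    """Return (observed value, normal score) pairs, smallest value first."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    points = []
    for position, value in enumerate(ordered, start=1):
        percentile = position / (n + 1)         # step 2: position / (n + 1)
        score = std_normal.inv_cdf(percentile)  # step 3: inverse cumulative probability
        points.append((value, score))
    return points

# Made-up lengths for illustration only (NOT the tuna data).
example_lengths = [60.0, 41.0, 52.0, 45.5, 48.0]
points = qq_points(example_lengths)
for value, score in points:
    print(f"{value:6.1f}  {score:7.3f}")
```

Plotting the second number of each pair against the first gives the same kind of picture as step 4, with the lengths on the horizontal axis.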

Questions

  1. What causes some of the ranks to have a decimal part, like 84.5?

  2. The method used above to find the percentile based on the rank is one of many possibilities. Why would we want to divide by the (# of obs. + 1) instead of (# of obs.)?
    [Hint: Consider an odd number of observations, and the corresponding percentiles for the one in the middle, and the two extreme ones.]

  3. Notice that the points in the Q-Q plot follow the fitted line fairly well, except at the ends. In particular, the points for the longer tuna don't seem to fit it very well at all. Would the largest five tuna have to be longer or shorter in order for the points in the Q-Q plot to fit the line better? How can you tell from the graph?

  4. Try out your theory in question 3 by changing the lengths of the last five tuna in order to make the Q-Q plot look better. Repeat the steps above and print out the new Q-Q plot. Were you right?

  5. Use MINITAB to calculate the mean and standard deviation of the lengths of the tuna. (Remember to undo the changes you made in question 4!) Use these to check how well the empirical rule seems to apply to this set of data.

  6. Construct a regression line for predicting the age of the fish from the length of the fish. Comment on how well this method of calculating the age of the fish seems to work. (It saves the fish from having to be cut open for starters!)
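
The calculations behind questions 5 and 6 can also be sketched in Python as a cross-check on the MINITAB output. This is a minimal sketch under stated assumptions: the function names and the example numbers are invented for illustration, and the code uses the sample standard deviation (dividing by n - 1). The empirical rule says that roughly 68%, 95%, and 99.7% of normally distributed data fall within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import mean, stdev

def empirical_rule_shares(values):
    """Fraction of values within 1, 2, and 3 sample standard deviations of the mean."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (divides by n - 1)
    shares = {}
    for k in (1, 2, 3):
        inside = sum(1 for v in values if abs(v - centre) <= k * spread)
        shares[k] = inside / len(values)
    return shares

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line predicting y from x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up numbers for illustration only (NOT the tuna data).
example_lengths = [40.0, 45.0, 50.0, 55.0, 60.0]
example_ages = [100.0, 130.0, 150.0, 180.0, 200.0]

shares = empirical_rule_shares(example_lengths)
intercept, slope = least_squares_line(example_lengths, example_ages)
print(shares)
print(f"predicted age = {intercept:.1f} + {slope:.1f} * length")
```

Comparing the three shares with 0.68, 0.95, and 0.997 is one way to judge how closely a data set follows the empirical rule.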