Fall 2007 - Stat 518 Homework

Fall 2007
Statistics 518 - Nonparametric Statistical Methods
Tuesday/Thursday 4:00-5:15
210A LeConte

Course Website: http://www.stat.sc.edu/~habing/courses/518F07.html

Opt	Due: 5:30 pm, Saturday, December 15th	1) Consider the following data set from an experiment where the blocks are expected to behave differently.
                      Treatment
            Blocks    1       2       3       4
            1         8.0     3.5     5.3     5.2
            2        10.5     5.7    10.1     7.4
            3        10.8     8.0     8.4     6.9
        A) Conduct the appropriate nonparametric test of the null hypothesis that the effects of the treatments are equal. What is your conclusion?
        B) Now conduct the appropriate nonparametric test as if there were no blocking variable (just 3 people assigned to each treatment). What would the conclusion be?
        C) Why does it make sense that the test in A had a smaller p-value than the test in B?
        D) Now imagine that the experimenter didn't realize the data was supposed to be in blocks and it got shuffled up. He decides that since the blocked design gives a smaller p-value, he'll make up some blocks and use those.
                      Treatment
            Blocks    1       2       3       4
            1        10.8     3.5    10.1     6.9
            2        10.5     5.7     5.3     5.2
            3         8.0     8.0     8.4     7.4
           Conduct the same test as in A on this data set. What would the conclusion be?
        E) Based on the p-values for the three tests you've done, what conclusion can you reach about whether blocking is helpful or not?
    2) Construct a data set of five points that has a Spearman's rho and a Kendall's tau of 1, but would have a correlation coefficient that is considerably smaller than 1. (A bonus point to the person with the smallest value.)
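For question 2 of the optional assignment, all three coefficients can be computed with R's `cor()` function by changing its `method` argument. The sketch below only illustrates the calls; the five points are made up for illustration and deliberately do not satisfy the conditions of the question.

```r
# Hypothetical five-point data set (NOT an answer to question 2:
# here rho and tau are both below 1).
x <- 1:5
y <- c(2, 1, 4, 3, 5)

pearson  <- cor(x, y)                       # ordinary (Pearson) correlation
spearman <- cor(x, y, method = "spearman")  # Spearman's rho (correlation of ranks)
kendall  <- cor(x, y, method = "kendall")   # Kendall's tau

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

Because Spearman's rho and Kendall's tau depend only on the ordering of the points, it can help to compare them with the Pearson value as the spacing of the points changes.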
5	Due: Tuesday, November 20	1) Find a possible data set for the project, say what hypothesis you would test on it, and say whether the normality assumption seems met. It cannot be a data set from a statistics textbook.
    2) Consider problem 4 on page 365.
        a) State the appropriate null and alternative hypotheses.
        b) Perform the Wilcoxon signed rank test at alpha=0.05 for your hypotheses in a, and state your conclusion in terms of the problem.
        c) For each assumption, either check it, or say why it can't be checked.
        d) Also perform the paired t-test and the quantile test (for p=0.5). Explain whether or not the relationship between the p-values for the three tests you've done seems to make sense based on what the AREs are and the shape of the distribution.
        e) Construct a (95%) CI based on the Wilcoxon signed rank test. Give two reasons why you should be surprised that you could reject your null hypothesis in b, but that the CI includes 0, even though you told it to use alpha=0.05 for both.
    3) A simulation is run to compare the power of one-way ANOVA and the KW test when the data is heavy tailed (t with df=7) and from a small sample size (n=10). Ten thousand data sets are simulated where the actual difference in the means is 1. In 4,149 cases both tests reject the null; in 289 just the t-test rejects; and in 436 cases just the KW test rejects. What do you estimate the power of each test to be, and are these values statistically significantly different?
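A simulation of the kind described in question 3 of homework 5 might be set up roughly as follows. This is only a sketch under stated assumptions: two groups of n=10 each (the problem does not say how many groups were used), errors drawn from a t distribution with 7 degrees of freedom, a shift of 1 between the group means, alpha=0.05, and far fewer replications than the ten thousand in the problem. It will not reproduce the counts given above.

```r
set.seed(518)       # arbitrary seed so the run is repeatable
n_sims <- 1000      # assumption: fewer replications than the 10,000 in the problem
n      <- 10        # observations per group
shift  <- 1         # true difference in means
alpha  <- 0.05

# For each simulated data set, record whether each test rejects.
reject <- t(replicate(n_sims, {
  d <- data.frame(
    y     = c(rt(n, df = 7), rt(n, df = 7) + shift),
    group = factor(rep(c("a", "b"), each = n))
  )
  p_anova <- oneway.test(y ~ group, data = d, var.equal = TRUE)$p.value
  p_kw    <- kruskal.test(y ~ group, data = d)$p.value
  c(anova = p_anova < alpha, kw = p_kw < alpha)
}))

# Cross-tabulate the decisions (both reject, only one rejects, neither rejects).
print(table(anova = reject[, "anova"], kw = reject[, "kw"]))

# Proportion of data sets in which each test rejected (estimated power).
print(colMeans(reject))
```

Because both tests are applied to the same simulated data sets, their decisions are paired, which matters when deciding how to compare the two estimated powers.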
4	Due: Tuesday, October 23	1) Consider problem number 4 on page 63.
        a) Give the formula you would use to find this value exactly.
        b) Use R to find that exact value. (Note that pbinom(x,size=n,prob=p) gives P[X<=x] for a binomial with parameters n and p.)
        c) Use the central limit theorem with the continuity correction (the +/-0.5) to approximate the value to three decimal places. (There is a normal table in the back of the text, or pnorm(z) gives P[Z<=z] for a standard normal.)
        d) Use the central limit theorem without the continuity correction to approximate the value to three decimal places.
    2) Page 93 #7 (show all of your work)
    3) Page 113 #2. For part c, find the power when p is really 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9. (Hint: The binomial distribution will give you the probabilities you need. dbinom(x,size=n,prob=p) in R will give you P[X=x] for a binomial with parameters n and p.)
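The R functions mentioned in homework 4 can be tried out on a small made-up case before applying them to the textbook problems. The values n=20, p=0.3, and x=4 below are hypothetical and are not taken from the text.

```r
# Hypothetical binomial setting, chosen only to illustrate the function calls.
n <- 20
p <- 0.3
x <- 4

# Exact P[X <= x], two equivalent ways.
exact        <- pbinom(x, size = n, prob = p)
exact_by_sum <- sum(dbinom(0:x, size = n, prob = p))

# Normal approximation to P[X <= x], with and without the continuity correction.
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))
approx_cc    <- pnorm((x + 0.5 - mu) / sigma)  # with the +0.5 correction
approx_plain <- pnorm((x - mu) / sigma)        # without the correction

print(round(c(exact = exact, with_cc = approx_cc, without_cc = approx_plain), 3))

# An upper-tail probability P[X >= x] uses the complement of P[X <= x - 1].
upper <- 1 - pbinom(x - 1, size = n, prob = p)
```

Since `dbinom()` and `pbinom()` accept a vector for `prob`, the same calls can be evaluated at several values of p at once.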
3	Due: Thursday, September 20th	A jury pool for a trial consists of 24 women and 18 men. Tomorrow, twelve people will be selected from this pool to be on the jury and the defense attorney is worried that there will be too many women selected.
    1) If the jury was selected at random, is the number of women selected a binomial random variable, or a hypergeometric random variable?
    2) If the defense attorney assumed this was a binomial experiment, what would they get for: the probability that the number of women selected would be 12, the expected number of women selected, and the standard deviation of the number of women selected?
    3) If the defense attorney assumed this was a hypergeometric experiment, what would they get for: the probability that the number of women selected would be 12, the expected number of women selected, and the standard deviation of the number of women selected?
    You will probably want a calculator or a computer program (e.g. Excel or R) for this problem. Note that the function `choose(n,k)` in R will give you the binomial coefficients, e.g. 8!/(6!2!) is 8 choose 6 is `choose(8,6)` = 28. A `^` can be used to get powers, e.g. 2³ is `2^3` = 8. A `*` is used for multiplication so that 2 times 4 is `2*4` = 8.
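The R operations described in the note for homework 3 can be checked directly at the console. The last calculation below combines them on a made-up example (the chance of exactly 2 heads in 8 flips of a fair coin), which is not part of the assignment.

```r
choose(8, 6)   # binomial coefficient 8!/(6!2!), which is 28
2^3            # power, which is 8
2 * 4          # multiplication, which is 8

# Hypothetical example combining all three:
# P[exactly 2 heads in 8 fair coin flips] = (8 choose 2) * 0.5^2 * 0.5^6
p_two_heads <- choose(8, 2) * 0.5^2 * 0.5^6
print(p_two_heads)

# The built-in binomial probability function gives the same value.
print(dbinom(2, size = 8, prob = 0.5))
```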
2	Due: Thursday, September 13th	Pg. 12 #2; also, what if each letter is used only once?
    Pg. 21 #6
    Pg. 33 #2a
    Pg. 63 #4
    Consider flipping two fair coins. Let A={1st coin heads}, B={2nd coin heads}, C={both coins the same}. Show that each of the pairs A-B, B-C, and A-C is independent, but that the three together are not mutually independent.
1	Due: Thursday, September 6th	The data set testdata.txt contains the scores of 88 students on a five part exam: 1) Closed Book on Mechanics (Calculus like Physics); 2) Closed Book on Vectors; 3) Open Book on Algebra; 4) Open Book on Analysis (The Theory of Calculus); and 5) Open Book on Statistics. Each exam is scored separately from 0 to 100. You may assume that the students are a simple random sample from the population of all students taking this set of courses at this University. It is desired to see if the mean score for all such students on the analysis exam differs from the mean score on the algebra exam. State the appropriate hypotheses and assumptions for conducting a paired t-test. Use both SAS and R to produce the results of the hypothesis test (at alpha=0.05) and any plots needed to check the assumptions. Be sure to include the code used and any menu options. Briefly summarize your findings.
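The R half of homework 1 might look roughly like the sketch below. It makes several assumptions: the column names `algebra` and `analysis` are hypothetical, the layout of testdata.txt is not known here, and simulated stand-in scores are used so that the code runs on its own. The numbers it prints therefore say nothing about the real exam data.

```r
# Stand-in data so the sketch is self-contained; these scores are simulated.
# With the actual file, something like the following might be used instead
# (assumption: the file has a header row naming the five exams):
#   scores <- read.table("testdata.txt", header = TRUE)
set.seed(1)
scores <- data.frame(
  algebra  = round(rnorm(88, mean = 50, sd = 10)),
  analysis = round(rnorm(88, mean = 47, sd = 14))
)

# A paired t-test works with the within-student differences.
diffs <- scores$analysis - scores$algebra

result <- t.test(scores$analysis, scores$algebra,
                 paired = TRUE, conf.level = 0.95)
print(result)

# Plots for judging whether the differences look roughly normal.
hist(diffs, main = "Analysis minus Algebra", xlab = "Difference in score")
qqnorm(diffs)
qqline(diffs)
```

The normality assumption of the paired t-test concerns the differences rather than the two sets of scores separately, which is why the plots are drawn for `diffs`.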