9 | Due: Thursday, December 4th | Homework 9 |
8 | Due: Thursday, November 20th | Homework 8 |
7 | Due: Thursday, November 13th | Homework 7 |
6 | Due: Thursday, October 30th | Consider the beardat PCA example we saw in class on September 25, 2008. Conduct classical multidimensional scaling on that data using Euclidean distances. Verify that the first two principal components found using the covariance matrix are the same as the first two dimensions in the scaling. (Give the code or menu options you used, and the PCA and scaling values corresponding to the first 10 bears.) |
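The comparison this assignment asks for can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the matrix `X` is synthetic stand-in data (not beardat), and the variable names are made up. It relies on `prcomp` (which centers but does not scale by default, so it works from the covariance matrix) and `cmdscale` (classical scaling), and it compares absolute values because each axis is only determined up to sign.

```r
# Synthetic stand-in data (hypothetical; NOT the beardat measurements).
set.seed(1)
X <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
X[, 2] <- X[, 1] + 0.5 * X[, 2]          # induce some correlation
colnames(X) <- c("v1", "v2", "v3", "v4")

# PCA from the covariance matrix: prcomp centers and, by default, does not scale.
pca <- prcomp(X)
pc_scores <- pca$x[, 1:2]

# Classical multidimensional scaling on Euclidean distances, two dimensions.
mds <- cmdscale(dist(X, method = "euclidean"), k = 2)

# Each axis is only defined up to sign, so compare absolute values.
max_diff <- max(abs(abs(pc_scores) - abs(mds)))
max_diff

# Scores for the first 10 observations, PCA and scaling side by side.
head(cbind(pc_scores, mds), 10)
```

With real data, `X` would be replaced by the numeric columns of the data set; a `max_diff` near zero indicates the two solutions coincide apart from sign flips.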
5 | Due: Thursday, October 16th | Homework 5 |
4 | Due: Thursday, October 2nd |
The data set testdata.txt is from Mardia, Kent, and Bibby's Multivariate Analysis. It concerns the results of 88 students on a five-part exam: 1) Closed Book on
Mechanics (Calculus-like Physics); 2) Closed Book on Vectors; 3) Open
Book on Algebra; 4) Open Book on Analysis (The Theory of Calculus); and
5) Open Book on Statistics. Each exam is scored separately from 0 to 100.
a) Choose either the correlation matrix or the covariance matrix for
your principal components analysis, and justify your choice based on the characteristics of the data.
|
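One way to look at the two options side by side in R is sketched below. The data frame `scores` is simulated for illustration only (it is not the Mardia, Kent, and Bibby data), and its column names are assumptions. The sketch uses `prcomp` with `scale. = FALSE` for the covariance matrix and `scale. = TRUE` for the correlation matrix; it does not decide which is appropriate, which depends on the variances and units of the actual variables.

```r
# Simulated exam-like scores (hypothetical; NOT the testdata.txt values).
set.seed(2)
ability <- rnorm(88)
scores <- data.frame(
  mechanics  = 40 + 15 * ability + rnorm(88, sd = 10),
  vectors    = 50 + 10 * ability + rnorm(88, sd = 8),
  algebra    = 50 +  8 * ability + rnorm(88, sd = 6),
  analysis   = 47 + 12 * ability + rnorm(88, sd = 9),
  statistics = 42 + 14 * ability + rnorm(88, sd = 10)
)

# Variances of the individual variables: large differences in scale or units
# are the usual reason to prefer the correlation matrix.
sapply(scores, var)

pca_cov <- prcomp(scores, scale. = FALSE)  # covariance-based PCA
pca_cor <- prcomp(scores, scale. = TRUE)   # correlation-based PCA

# Proportion of total variance carried by each component under each choice.
prop_cov <- pca_cov$sdev^2 / sum(pca_cov$sdev^2)
prop_cor <- pca_cor$sdev^2 / sum(pca_cor$sdev^2)
round(rbind(covariance = prop_cov, correlation = prop_cor), 3)
```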
3 | Due: Tuesday, September 23rd |
The data set sat3.txt contains educational data for each state and the District of Columbia.
Part03 is the percent of students taking the SAT in 2003; Verbal03, Math03,
and Total03 are the state averages for the respective sections; and
Expend01 is the per pupil educational spending in 2001.
1) Construct a graphical display to support the argument that more money
is not associated with higher test scores.
In both cases make sure your graphical display is clearly labeled and includes appropriate captions.
3) There are many ways of finding outliers in data sets (some we've seen in this class, some are obvious extensions of things we've done in this class, and some are things you might have seen in a course on multiple regression). Considering the three variables Part03, Total03, and Expend01, choose a method that seems reasonable to you to identify any states you think are outliers, and briefly explain your results. |
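As one possible approach among many for the outlier question, squared Mahalanobis distances can be compared against a chi-square quantile. The sketch below is an assumption-laden illustration: the data frame `dat` is simulated (it is not sat3.txt), an unusual row is planted deliberately, and the 0.975 cutoff is a common convention rather than anything the assignment prescribes.

```r
# Simulated stand-in with the same three variable names (NOT the sat3.txt values).
set.seed(3)
dat <- data.frame(
  Part03   = runif(51, 4, 85),
  Total03  = rnorm(51, 1060, 60),
  Expend01 = rnorm(51, 7500, 1200)
)
dat[51, ] <- c(80, 1300, 14000)  # deliberately planted unusual observation

# Squared Mahalanobis distance of each row from the sample mean.
d2 <- mahalanobis(dat, colMeans(dat), cov(dat))

# Under approximate multivariate normality, d2 is roughly chi-square with
# df = number of variables; flag rows beyond the 97.5% quantile (a convention).
cutoff  <- qchisq(0.975, df = ncol(dat))
flagged <- which(d2 > cutoff)
flagged
```

Because the mean and covariance are themselves affected by outliers, this simple version can mask some unusual points; it is a starting point, not a definitive method.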
2 | Due: Tuesday, September 16th |
1) Imagine that someone wanted to come up with a total score to summarize each person's view of the oil crisis (Q1-Q20). a) Explain why it doesn't make sense to just add up all of the numbers. b) Find the correlation matrix for the Q1-Q20 data and suggest two separate groups of questions that might be added separately. c) How could these two scores be combined to form a single score?
2) Check whether the data set normsamp.txt is actually multivariate normal. |
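A common informal check of multivariate normality is a chi-square Q-Q plot of squared Mahalanobis distances; a minimal sketch follows. The matrix `Z` is simulated for illustration (it is not normsamp.txt), and the choice of this particular diagnostic is an assumption, not necessarily the method intended in class.

```r
# Simulated sample (hypothetical; NOT the normsamp.txt data).
set.seed(4)
p <- 3
Z <- matrix(rnorm(200 * p), ncol = p)
n <- nrow(Z)

# Ordered squared Mahalanobis distances from the sample mean.
d2 <- sort(mahalanobis(Z, colMeans(Z), cov(Z)))

# Matching chi-square quantiles with p degrees of freedom.
q <- qchisq(ppoints(n), df = p)

# For multivariate normal data the points should fall near the line y = x.
plot(q, d2,
     xlab = "Chi-square quantiles",
     ylab = "Ordered squared Mahalanobis distances")
abline(0, 1)

# Correlation of the plotted points as a rough numeric summary.
qq_cor <- cor(q, d2)
qq_cor
```

Marginal `qqnorm` plots of each variable are a useful companion, since a straight chi-square plot alone does not rule out every departure from normality.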
1 | Due: Tuesday, September 2nd |
This assignment uses the oildata data set from class on the 21st and 26th.
Using R, make two variables containing the ratings of "Low Energy Use", one for males and one for females. Conduct a two sample t-test to see whether there is a difference between the genders and check the assumptions using a q-q plot. Summarize your results. Hint: t.test(x,y), qqnorm(x), qqline(x), and look over the sample code we had for making a subset of the data in class. |
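The functions named in the hint fit together roughly as in the sketch below. The data frame `oil` and its column names `gender` and `lowenergy` are hypothetical stand-ins (the real oildata layout may differ), and the values are simulated.

```r
# Simulated stand-in for oildata; column names and values are assumptions.
set.seed(5)
oil <- data.frame(
  gender    = rep(c("M", "F"), each = 30),
  lowenergy = c(rnorm(30, mean = 3.0, sd = 1), rnorm(30, mean = 3.4, sd = 1))
)

# One variable per gender, built by subsetting on the gender column.
males   <- oil$lowenergy[oil$gender == "M"]
females <- oil$lowenergy[oil$gender == "F"]

# Two-sample t-test; t.test(x, y) uses the Welch (unequal variance) form by default.
result <- t.test(males, females)
result

# Normal Q-Q plots to check the normality assumption within each group.
qqnorm(males, main = "Males");     qqline(males)
qqnorm(females, main = "Females"); qqline(females)
```

Passing `var.equal = TRUE` to `t.test` gives the pooled-variance version instead, if equal variances are a reasonable assumption for the data.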