7 | Due: Friday, December 5th |
This assignment uses the
www.stat.sc.edu/~habing/courses/data/crabs2.txt data set from
the previous assignment.
1) It is desired to perform a two-dimensional multidimensional scaling of the crab data using the standardized values, in the hope that the scaling will separate the four types of crabs. Why would two dimensions be desirable? Which of the three methods (Classical, Sammon, or Kruskal's nonmetric (iso)) do you think has the best chance of showing distinct clusters that lie in a data set? Why? (You shouldn't have to perform the scaling to answer this.)
2) Perform the scaling method you chose in part 1 on the crab data and construct a plot of the scaling where each crab is denoted solely by a dot. By looking at the separation of points in the scaling, divide the scaling into separate clusters. (Do not do any additional analyses to help you do this! You do not need to make 4 distinct clusters if you can't see them in the plot.)
3) Re-plot the scaling from part 2, this time labeling each crab by its type. Compare this plot to the best of the cluster analysis dendrograms from the previous homework. A group of six B/b crabs were apparently misclassified as O/o crabs in the cluster analysis. Guess which six crabs these are on the scaling (do not do any additional analyses to find them!). Two crabs in the cluster analysis were separated out from the remaining crabs. Guess which two crabs these are on the scaling (do not do any additional analyses to find them!).
4) Finally, repeat both the clustering and the scaling, labeling each observation by its observation number. How well did you guess in part 3? |
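For readers who want to see what a two-dimensional scaling of standardized data involves, here is a minimal sketch of classical (Torgerson) scaling written with NumPy. It is only an illustration: the assignment itself is meant to be done in SAS or R, the function name `classical_mds` is hypothetical, the data are synthetic stand-ins rather than the crab measurements, and the choice of the classical method here is not a suggested answer to part 1.

```python
import numpy as np

def classical_mds(X, k=2):
    """Classical (Torgerson) scaling of the rows of X into k dimensions."""
    # Standardize each column to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared Euclidean distances between all pairs of rows.
    sq = np.sum(Z**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    # Double-center the squared distances to get an inner-product matrix.
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Keep the k largest eigenvalues and their eigenvectors.
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Synthetic stand-in data (NOT the crab data): two groups of 20 rows, 5 variables.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(4.0, 1.0, (20, 5))])
coords = classical_mds(X, k=2)
```

Each row of `coords` is one observation's position in the plane, so a scatter plot of the two columns gives the kind of dot plot described in part 2.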
6 | Due: Monday, November 24th | www.stat.sc.edu/~habing/courses/530hmwk6F03.pdf |
5 | Due: Monday, November 10th |
This assignment uses the data set
http://www.stat.sc.edu/~habing/courses/data/orange.txt. The data set concerns several
samples of orange juice from several different countries (BEL, LSP, TME,
and VME). Each of them has had several chemical elements measured: boron (B),
barium (BA), calcium (CA), potassium (K), magnesium (MG), manganese (MN),
phosphorus (P), rubidium (RB), and zinc (ZN). The first variable is
simply an ID number.
|
4 | Due: Friday, October 17th | www.stat.sc.edu/~habing/courses/530hmwk4F03.pdf |
3 | Due: Monday, October 6th | www.stat.sc.edu/~habing/courses/530hmwk3F03.pdf |
2 | Due: Monday, September 22nd | www.stat.sc.edu/~habing/courses/530hmwk2F03.pdf |
1 | Due: Friday, September 5th |
1) The web page
http://www.stat.sc.edu/~habing/courses/data/draft70.txt contains the
data from the 1970 draft lottery to determine the order in which people would
have to report to the draft board to serve in the Vietnam War (if they were
still needed). A container was filled with capsules, one for each day of the year. The container was then shaken, the capsules were drawn, and the order
of the birthdays was recorded. This order was the order in which people
were called up for the draft. The first column in the data set is the day of the year from 1 to 366 (including leap year), the second column is the order in which the capsule containing that date was drawn from the container, the third column is the number of the month that the day was in, the fourth column is the name of the month, and the fifth column is the day of the month. Thus, those born on September 14th (in the 9th month and the 258th day of the year) were the first to report to the draft board.
The question that you are to try to answer is whether or not it appears that the method used to randomize the birthdays was fair. That is, did each birthday have the same chance of being selected? There are a large number of ways to analyze this data, but your assignment is to use either SAS or R to produce an easily explainable graph that shows if the draft was fair or not, and a short explanation to accompany it. If it was not fair, your graph and explanation should show in what way it was unfair. Be sure to include a copy of any code you used to generate the output.
2) The web page http://www.stat.sc.edu/~habing/courses/data/bballtest.txt contains the baseball data we discussed in class. Use whichever package (SAS or R) you did not use for the first problem to construct a histogram of the number of home runs hit in 1986 (HR86) by players in the National League (League=N). |
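The column layout described in problem 1 can be illustrated with a small sketch that maps a day of the year to its month and averages the draw order within each month. It is only one of many possible summaries, and it is written in Python purely for illustration, since the assignment itself requires SAS or R. The helper names `month_of_day` and `mean_order_by_month` are hypothetical, and the draw order is a simulated fair lottery, not the real draft70.txt data.

```python
import numpy as np

# Days per month in a leap year, matching the 366-day layout described above.
DAYS_IN_MONTH = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def month_of_day(day_of_year):
    """Return the month number (1-12) for a day of the year (1-366)."""
    remaining = day_of_year
    for month, length in enumerate(DAYS_IN_MONTH, start=1):
        if remaining <= length:
            return month
        remaining -= length
    raise ValueError("day_of_year must be between 1 and 366")

def mean_order_by_month(draw_order):
    """Average draw order per month; draw_order[i] belongs to day i + 1."""
    months = np.array([month_of_day(d) for d in range(1, 367)])
    order = np.asarray(draw_order)
    return {m: float(order[months == m].mean()) for m in range(1, 13)}

# Simulated fair lottery (NOT the real data): a random permutation of 1..366.
rng = np.random.default_rng(1)
simulated = rng.permutation(366) + 1
monthly_means = mean_order_by_month(simulated)
```

Under a fair drawing, every monthly mean should sit near 183.5, the average of the numbers 1 through 366, so a plot of these twelve means against month number is one easily explained way to look for a systematic pattern.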