April 18, 2003
33rd Annual SCASA Meeting
Student Paper Abstracts
- Jamie Burnham (Department of Mathematical Sciences, Clemson University) The Effect of Alcohol and Stress on Pregnancy Outcomes of Women Undergoing In Vitro Fertilization
The objective of this study is to determine the effect of alcohol use, stress, and especially how one copes with stress on in vitro fertilization (IVF) pregnancy rates. Data were obtained on 198 women who entered the IVF program at the Reproductive Endocrinology and Infertility Division of Greenville Hospital System in Greenville, South Carolina from January 2001 to December 2002. Upon entering the program, the women were asked to fill out a questionnaire concerning basic demographic information and various aspects of their lifestyle.
- Maureen O'Gorman Petkewich (Department of Statistics, University of South Carolina) Multiple Comparisons for Additional Risk with Nonquantal Data
In this paper, methodology is proposed for making simultaneous inferences in quantitative risk analysis. Application is intended for risk assessment studies where human or animal data are used to set safe low-dose levels of a toxic agent, but where study information is limited to high dose levels of the agent. Methods are derived for calculating a finite set of simultaneous upper confidence limits on additional risk for endpoints measured on a continuous scale. The setting adopts a quadratic model where $x_i$ represents a recorded value of the non-stochastic dose variable, $Y(x_i)$ represents a measured response to the dose level, and the $Y(x_i)$ are independently distributed as $N(\beta_0 + \beta_1 x_i + \beta_2 x_i^2, \sigma^2)$ for all $i = 1, \ldots, n$. Construction of the upper bounds begins with a pointwise confidence statement on additional risk and then employs a correction for multiplicity. Three adjustments for multiplicity are considered: a Bonferroni correction, a Šidák adjustment, and a modified Scheffé-type adjustment. Monte Carlo evaluations are conducted to study and compare the characteristics of the upper bounds under these settings.
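As a rough illustration of two of the multiplicity adjustments named above, the following sketch computes the per-statement error level that the Bonferroni and Šidák rules would assign when m pointwise statements must hold jointly at overall level alpha. The function names and the values of alpha and m are assumptions made for the example; the abstract does not give them, and the modified Scheffé-type adjustment is not shown.

```python
def bonferroni_level(alpha: float, m: int) -> float:
    """Per-statement level under the Bonferroni rule: alpha split evenly over m statements."""
    return alpha / m


def sidak_level(alpha: float, m: int) -> float:
    """Per-statement level under the Sidak rule: solves 1 - (1 - a)^m = alpha for a."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical overall level and numbers of simultaneous statements.
for m in (1, 2, 5, 10):
    print(m, bonferroni_level(0.05, m), sidak_level(0.05, m))
```

For any m greater than 1 the Šidák level is slightly larger than the Bonferroni level, so the resulting limits are slightly less conservative.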
- Robin (Buz) Kloot (Department of Environmental Health Sciences, University of South Carolina) Predicting Time to Develop Comprehensive Nutrient Management Plans
A Comprehensive Nutrient Management Plan (CNMP) is a multi-year plan that addresses the environmental and operational aspects of livestock production facilities. The U.S. Environmental Protection Agency and the U.S. Department of Agriculture jointly formulated CNMP standards to address concerns associated with livestock production. Thousands of CNMPs will need to be developed over the next seven years to meet voluntary and regulatory requirements. Many will be developed by private consultants, called Technical Service Providers (TSPs).
One such TSP organization conducted a pilot series of 38 site-specific CNMPs for hog operations in seven states between June and December 2002. It was found that the time taken for the TSP to develop a CNMP ranged from 40 to 260 hours, depending on operation size, complexity and availability of relevant information. The TSP asked the Earth Sciences and Resources Institute (ESRI-USC) to develop a model to predict the hours required to complete a CNMP as a tool to cost future jobs. A multiple linear regression (MLR) approach was used on data from 28 operations, and a surprising number of seemingly good candidate MLR models emerged from the exercise. This presentation discusses the modeling process used, the models, their strengths and weaknesses, and their validation with 10 data sets acquired after the models were constructed. Finally, consideration is given to the TSP's perceptions of the tools, process and results, and to how educating the TSP in judicious use of the models formed a vital part of the entire process.
- Daniela Nitcheva (Department of Statistics, University of South Carolina) Multiple Comparisons in Low-Dose Risk Assessment With Quantal Response Data
A primary objective in quantitative risk assessment is characterization of the severity and likelihood of damage to humans or to the environment caused by a hazardous agent. In many cases data are not available at low doses of the agent. In such cases inferences at low doses must be based on the high-dose data. One model which allows such extrapolation for quantal response data is the multistage model. Under this model several methods for estimating upper confidence limits on extra risk are presented. Similar methods are proposed for estimating lower confidence limits on benchmark dose. Bonferroni corrections are applied to adjust the limits for multiplicity. Monte Carlo evaluations explore characteristics of the proposed limits.
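To make the quantities in this abstract concrete, here is a minimal sketch of the usual form of the multistage model, P(d) = 1 - exp(-(b0 + b1 d + ... + bk d^k)), and of extra risk, (P(d) - P(0)) / (1 - P(0)). The coefficient values are hypothetical and chosen only for illustration; the confidence limits and Bonferroni corrections discussed in the abstract are not implemented here.

```python
import math


def multistage_prob(dose: float, betas: list[float]) -> float:
    """Response probability under a multistage model with non-negative coefficients."""
    poly = sum(b * dose ** j for j, b in enumerate(betas))
    return 1.0 - math.exp(-poly)


def extra_risk(dose: float, betas: list[float]) -> float:
    """Extra risk: added response probability, rescaled by the background non-response rate."""
    p0 = multistage_prob(0.0, betas)
    return (multistage_prob(dose, betas) - p0) / (1.0 - p0)


# Hypothetical coefficients (b0, b1, b2); not taken from the paper.
betas = [0.1, 0.02, 0.003]
for dose in (0.0, 0.5, 1.0, 2.0):
    print(dose, extra_risk(dose, betas))
```

Under this model the background term b0 cancels, so extra risk reduces to 1 - exp(-(b1 d + ... + bk d^k)).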
- Wonsuk Yoo (Department of Biometry & Epidemiology, Medical University of South Carolina) Simulation of a Bayesian Hierarchical Changepoint Model for Longitudinal Biomarkers
A longitudinal biomarker can be monitored over time for changes that may be associated with changes in disease status. This research is a simulation study to verify whether a decision based on longitudinal biomarkers can be more effective in prospective diagnostic cancer detection than one based on a single observation or an average annual rate. We use a segmented linear mixed effects model in a fully Bayesian framework, in which the changepoint represents initiation of disease. This model assumes that all subjects will eventually develop the disease. In this study we show how a diagnostic rule depending on the posterior probability for the changepoint can be used for detecting disease onset for a specific subject with longitudinal biomarker readings. To do so, we generated samples from the Bayesian hierarchical changepoint model, prospectively estimated a subject's changepoint using a Gibbs sampler, developed a diagnostic rule based on the posterior probability, and finally compared this rule with a traditional threshold rule depending on a single observation or annual rate using ROC methodology.
- Yang An (Department of Mathematical Sciences, Clemson University) Evaluation of Plus/Minus grades in Clemson University
Clemson University is currently using plus/minus grading for a planned two-year trial period beginning in the fall semester of 2002. A main objective of the plus/minus grading trial is to assess how undergraduates' Grade Point Ratios (GPRs) will be affected relative to the current standard grading scale. Effects on GPRs of individual students and of groups of students (e.g., those with LIFE scholarships) are of interest. Differences in GPRs between plus/minus grading and standard grading could have ramifications for candidates for graduation, for scholarship retention, for varsity athletics eligibility, etc.
The presentation will report these effects in detail based on all grades for undergraduates reported for Fall 2002, the first semester of the planned two-year trial.
- Alexandru Petrisor (with Sorin Cheval) (Department of Environmental Health Sciences, University of South Carolina) Predicting Average Monthly Temperatures and Precipitations Using the Geostatistical Analyst
Climatology strives to explain temporal and spatial distributions, and the dynamics of data that describe the state of the atmosphere. Identifying and assessing the complex relationships among as many variables as possible fosters a better understanding of the processes driving climatic patterns. The temporal dimension of phenomena is one of the most important factors that bias the scientific approach; therefore, the integration, analysis, and visualization of spatial data that have a significant temporal component have become a priority in environmental sciences. Geographical Information Systems (GIS) can represent, correlate and assess the relationships between large amounts of data, referenced to a known spatial and temporal framework. Hence, besides being a powerful cartographic tool, a GIS can store, retrieve and combine data to create new representations of geographic space, provides tools for spatial analysis and performs.
Monthly precipitation and temperature data were collected in Romania at five stations during 1961-2000. Data were grouped in arrays with the year on the vertical axis and the month on the horizontal axis. Several interpolation methods were used: spline fitting, ordinary kriging, inverse distance weighting, and filtering. Of all the interpolation methods, kriging, given its mathematical foundation, yielded the most significant and sound results. Our study indicated that temperatures follow an easily predictable pattern, with slight variations corresponding to the lengths of the cold or the warm season. Precipitation clusters appeared more relevant. In our setting, they pointed out dry and wet spells, suggesting the occurrence of hazard events such as floods or droughts. This was particularly obvious in naturally dry/wet regions. Our method emphasized the temporal variability of the phenomena. Thus, climate changes, one of the most debated issues, might be identified and qualitatively predicted.
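Of the interpolation methods listed, inverse distance weighting is the simplest to write down, so a minimal sketch follows. It treats each observation as a point in a (month, year) grid, matching the array layout described above. The data values, the function name `idw`, and the power parameter of 2 are assumptions for illustration; the study itself used the Geostatistical Analyst and favored kriging, neither of which is reproduced here.

```python
def idw(known, target, power=2.0):
    """Inverse distance weighted estimate at `target` from (point, value) pairs.

    `known` is a list of ((x, y), value); weights are 1 / distance**power.
    """
    num = 0.0
    den = 0.0
    for (x, y), value in known:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0.0:
            # The target coincides with an observed cell: return it exactly.
            return value
        weight = d2 ** (-power / 2.0)
        num += weight * value
        den += weight
    return num / den


# Hypothetical observations: ((month, year), temperature).
known = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(idw(known, (1.0, 0.0)))  # equidistant from both points, prints 15.0
```

Because the weights depend only on distance, this method has no model of spatial or temporal correlation, which is one reason a kriging approach may give sounder results.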
- Robyn Minor Smith (Department of Mathematical Sciences, Clemson University) Closing the Gaps: An Inquiry Approach
This paper analyzes data from a study that tested whether students in a more applied inquiry curriculum learn more of the presented material and perform better on standardized tests than students who are taught in the more traditional manner with lectures and worksheets. The purpose of this paper is to find out whether inquiry, or hands-on learning, could be more effective than traditional teaching methods and whether it could lessen the educational gaps between different student groups when implemented correctly.
- Yuping Wu (Department of Statistics, University of South Carolina) Adding Expressions to WebStat 3.0
WebStat is a Web-based statistical software package developed to provide a new approach for statistical analysis over the Web. West and Ogden originally released WebStat 1.0, which runs as a Java applet, in 1997. The first two versions of the software were developed to offer the basic statistical and graphical routines that are encountered in an introductory statistics course. With this educational focus, WebStat has been used as the primary data analysis tool in a number of statistics courses. In addition, WebStat has become a popular tool for disseminating data across the World Wide Web. The latest version, 3.0, contains numerous new features that extend the capabilities of the software several orders of magnitude beyond previous versions. In this paper, the new feature for adding expressions, which offers command-driven capability, is described and illustrated.
- Parul Bhargava (with John Spurrier) (Department of Statistics, University of South Carolina) Exact One-Sided Confidence Bounds Comparing Two Regression Lines with a Control Regression Line
There have been several recent papers discussing multiple comparisons of treatments based on two-sided comparisons of simple linear regression models. In this paper we present one-sided simultaneous confidence bounds for comparing simple linear regression lines for two treatments with a simple linear regression line for the control. The assumptions of iid normal errors, unrestricted predictor variable and equal design matrices for the two treatments and the control have been made. The method is based on a pivotal quantity that can be expressed as a function of 4-dimensional multivariate-t variables. We present tables of probability points for various sample sizes and levels of significance.
- Ling Chen (Department of Epidemiology and Biostatistics, University of South Carolina) Multiple Imputation on Nonresponses in Ordinal Survey Data
Missing data plague almost all surveys. The present study used simulation to compare the performance of two methods for handling missing covariate information in ordinal survey data: complete case analysis (CC) and MVN-based multiple imputation (MI). The comparison covered simulated scenarios with various missing-data mechanisms (MCAR, MAR and NI) and missingness rates of covariates in multivariate regression analysis. The performance of each method was assessed by comparing the absolute value of the bias and the standard error of the regression estimate for each covariate to those from a Complete Standard Dataset. Our results indicated that the present MVN-based MI did not perform as well as CC in generating unbiased regression estimates, regardless of the missing-data mechanism and the missingness rate of covariates in the ordinal survey data. The underperformance of MI resulted mainly from the highly skewed distributions of the ordinal covariates as well as the issue of rounding the imputed values. We suggest that applied researchers can be reasonably confident in using complete case analysis to generate unbiased regression estimates even in the presence of large proportions of missing data under the assumption of MCAR. For missing covariates with highly skewed distributions in categorical data, MVN-based MI is not expected to be superior to complete case analysis in generating unbiased regression coefficients.
- Xijiang Miao (with Peter Waddell) (Department of Computer Science and Engineering, University of South Carolina) Inferring the Accuracy of Evolutionary Tree Reconstruction
Inferring evolutionary or phylogenetic trees is at the core of modern biology. One of the questions that naturally arises is how robust an inferred tree is to stochastic fluctuations in the data. Assuming independent evolution of sites in the DNA, aligned DNA sequence data may be compressed to counts of the frequency with which "site patterns" (e.g. species 1, nucleotide A, species 2, A, species 3, A, species 4, C) occur. These counts follow a multinomial distribution. Felsenstein, Efron and others have studied the use of the bootstrap in such situations (i.e. bootstrap the sampled data and repeat the whole procedure to infer a tree, then compare the trees obtained). Others have suggested Bayesian posterior probabilities based upon sampling trees from a stationary MCMC chain. We show that the bootstrap is often a conservative estimate of the true accuracy of tree estimation under the model, while the Bayesian posterior probabilities are nearly unbiased except when using very small amounts of data. A new form of the bootstrap is seen to correct for the bias of the bootstrap. This is very appealing, given that when the fit of data to model breaks down, but sites remain independent, the bootstrap gives better estimates of sampling error than the likelihood surface of an erroneous model (which is the basis of Bayesian posteriors).
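The resampling step of the standard nonparametric bootstrap described above can be sketched as follows: sites are redrawn with replacement, which amounts to a multinomial draw over the observed site-pattern frequencies. The pattern strings, counts, and function name are hypothetical, the tree-inference step applied to each replicate is omitted, and this is not the new bias-corrected form of the bootstrap mentioned in the abstract.

```python
import random
from collections import Counter


def bootstrap_site_patterns(counts, rng):
    """Resample sites with replacement and return the replicate's pattern counts.

    `counts` maps a site pattern (one nucleotide per species) to its frequency.
    """
    patterns = list(counts)
    weights = [counts[p] for p in patterns]
    n_sites = sum(weights)
    draws = rng.choices(patterns, weights=weights, k=n_sites)
    return Counter(draws)


# Hypothetical site-pattern counts for four species.
observed = {"AAAC": 40, "AACC": 25, "ACGT": 35}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_site_patterns(observed, rng) for _ in range(3)]
for rep in replicates:
    print(dict(rep))
```

Each replicate has the same number of sites as the original alignment; a tree would then be inferred from each replicate and compared with the tree from the observed data.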
- Russell S. Stocker IV (with Edsel A. Pena) (Department of Statistics, University of South Carolina) A General Counting Process Model for Recurrent Event Data
A general class of models for recurrent event data given by Pena and Hollander (2002) is considered. The class of models includes many of the models found throughout the literature. These include the imperfect repair model of Brown and Proschan (1983), the general Cox proportional hazards model, and the general repair model of Dorado, Hollander, and Sethuraman (1997). The model is based on a flexible multiplicative intensity process. It allows for the inclusion of the effect of repeated failures by perturbing the baseline intensity process with an "effective age" process. The effect of interventions due to multiple failures is included by use of a function included multiplicatively in the intensity process. The influence of outside factors is included through the use of a link function with covariates. The inclusion of a frailty component is allowed for modeling unobservable random effects. The model is considered under a fully parametric specification where a frailty is assumed not to exist. Estimators are obtained by the development of a partial likelihood function using the results of Jacod (1975). The asymptotics and consistency for the estimators are found by transforming the model from calendar time into a gap time formulation. The estimators are found to follow asymptotically a Gaussian process with a certain quadratic variation process under certain regularity conditions. The performance of the estimators is also assessed through computer simulation.
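One way to picture how the ingredients listed in this abstract might combine multiplicatively is the schematic intensity below. The notation is an assumption made for illustration and is not taken from the abstract: $Z$ is the frailty, $\lambda_0$ the baseline intensity, $\mathcal{E}(s)$ the effective age at calendar time $s$, $N(s-)$ the number of events before $s$, $\rho(\cdot;\alpha)$ the function carrying the effect of accumulated failures, and $\psi$ the link function applied to covariates $X(s)$.

$$
\lambda(s \mid Z) = Z \, \lambda_0\bigl(\mathcal{E}(s)\bigr) \, \rho\bigl(N(s-); \alpha\bigr) \, \psi\bigl(\beta^{\top} X(s)\bigr)
$$

In this sketch, the no-frailty specification considered in the abstract would correspond to setting $Z \equiv 1$.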