Date of Graduation

8-2018

Document Type

Thesis

Degree Name

Master of Science in Statistics and Analytics (MS)

Degree Level

Graduate

Department

Mathematical Sciences

Advisor/Mentor

Qingyang Zhang

Committee Member

Jyotishka Datta

Second Committee Member

Avishek Chakraborty

Keywords

Conditional Mutual Information, Interaction Effects, Numerous Screening, Partial Correlation, Pearson’s Correlation Coefficient

Abstract

Numerous screening techniques have been developed in recent years for genome-wide association studies (GWASs) (Moore et al., 2010). In this thesis, a novel model-free screening method was developed and validated through an extensive simulation study. Most existing screening methods focus on main effects, and very few consider models containing both main effects and interaction effects. In this work, interaction effects were fully considered, and three methods (Pearson’s Correlation Coefficient, Partial Correlation, and Conditional Mutual Information) were tested and their prediction accuracies compared.

The Pearson’s Correlation Coefficient method, a direct interaction screening (DIS) procedure, tended to screen interaction terms incorrectly because it ignores the relationship between main effects and interaction effects. To address this, we proposed two new interaction screening procedures: the Partial Correlation Interaction Screening (PCIS) method and the Conditional Mutual Information Interaction Screening (CMIIS) method. Partial Correlation (PC) measures the association between two variables while adjusting for the effect of one or more additional variables. Conditional Mutual Information (CMI) is the expected value of the mutual information (MI) of two random variables given the value of a third (Wyner, 1978), where MI is a measure of general dependence. Finally, the three screening procedures were illustrated and compared through simulation studies, and were applied to real gene data.
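As a rough illustration of the three quantities named in the abstract, the sketch below computes a Pearson correlation, a partial correlation obtained by regressing out control variables, and a plug-in estimate of conditional mutual information for discrete data. It is not the thesis's implementation: the function names, the simulated data, and the choice of a residual-based partial correlation and a count-based CMI estimator are all assumptions made for this example.

```python
import math
from collections import Counter

import numpy as np


def pearson_corr(a, b):
    """Sample Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def partial_corr(a, b, controls):
    """Correlation of a and b after linearly regressing out `controls`.

    `controls` is an (n, k) array; an intercept column is added here.
    """
    Z = np.column_stack([np.ones(len(a)), controls])
    res_a = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    res_b = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return pearson_corr(res_a, res_b)


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in nats for discrete sequences."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in c_xyz.items():
        # p(z) p(x,y,z) / (p(x,z) p(y,z)) written in terms of counts
        ratio = c_z[zi] * c / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        total += (c / n) * math.log(ratio)
    return total


# Hypothetical simulation: the response has main effects only, no interaction.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(1.0, 1.0, n)  # nonzero means make x1*x2 correlate with x1 and x2
x2 = rng.normal(1.0, 1.0, n)
y = 2 * x1 + 2 * x2 + rng.normal(0.0, 1.0, n)
product = x1 * x2

# Direct screening: marginal correlation between y and the product term.
dis_score = abs(pearson_corr(y, product))
# Partial correlation: same pair, adjusted for the two main effects.
pc_score = abs(partial_corr(y, product, np.column_stack([x1, x2])))

# Toy discrete case: y_bits = x_bits XOR z_bits, so X and Y are dependent only given Z.
x_bits = [0, 0, 1, 1]
z_bits = [0, 1, 0, 1]
y_bits = [0, 1, 1, 0]
cmi_xor = conditional_mutual_information(x_bits, y_bits, z_bits)
```

In this simulated setting the marginal correlation with the product term is large even though no interaction is present, while the partial correlation is close to zero once the main effects are adjusted for, which is the kind of false selection the abstract attributes to DIS. The XOR case shows the CMI estimator picking up a dependence that appears only after conditioning on the third variable.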
