Master of Science in Statistics and Analytics (MS)
Rapid advances in sequencing technology have enabled simultaneous genome-wide analysis of genetic and epigenetic features, making it possible to understand the biological mechanisms underlying cancer initiation and progression. However, identifying important prognostic features poses a great challenge for both statistical modeling and computing. In this thesis, a network-based approach is applied to The Cancer Genome Atlas (TCGA) ovarian cancer data to identify important genes related to the overall survival of ovarian cancer patients. First, a stepwise correlation-based selector is used to reduce the dimensionality of the TCGA data by filtering out a large number of unrelated genes. Second, we employ the graphical lasso to construct a sparse gene-gene co-expression network. This undirected network allows us to classify genes into groups based on gene-gene interactions. Third, we fit a Cox proportional hazards model with a sparse group lasso penalty for further variable selection and identify 232 genes that are prognostic for ovarian cancer survival. Many of these 232 genes have been reported in the literature to be associated with cancer initiation or progression. Kaplan-Meier curves based on the identified genes show clear separation among patient groups defined by gene expression levels.
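As a rough illustration of the grouping step, the sketch below shows one way genes could be assigned to groups from a sparse network: treat each nonzero off-diagonal entry of an estimated precision matrix as an edge and take the connected components as groups. This is an assumption made for illustration, not necessarily the grouping rule used in the thesis. The function name `groups_from_precision`, the tolerance `tol`, the gene labels, and the small matrix `theta` are all hypothetical; in practice the matrix would come from a graphical lasso fit.

```python
from collections import deque


def groups_from_precision(precision, gene_names, tol=1e-8):
    """Group genes by connected components of an undirected network.

    An edge joins genes i and j when the off-diagonal entry
    precision[i][j] is nonzero (larger than tol in absolute value).
    """
    n = len(gene_names)
    neighbors = {
        i: [j for j in range(n) if j != i and abs(precision[i][j]) > tol]
        for i in range(n)
    }

    seen = set()
    groups = []
    for start in range(n):
        if start in seen:
            continue
        # Breadth-first search collects every gene reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(gene_names[node])
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        groups.append(sorted(component))
    return groups


# Hypothetical 5-gene sparse precision matrix (illustrative values only,
# standing in for the output of a graphical lasso fit).
genes = ["G1", "G2", "G3", "G4", "G5"]
theta = [
    [1.0, -0.4, 0.0, 0.0, 0.0],
    [-0.4, 1.0, 0.3, 0.0, 0.0],
    [0.0, 0.3, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, -0.5],
    [0.0, 0.0, 0.0, -0.5, 1.0],
]

gene_groups = groups_from_precision(theta, genes)
print(gene_groups)  # [['G1', 'G2', 'G3'], ['G4', 'G5']]
```

Groups obtained this way could then serve as the group structure for a sparse group lasso penalty, which shrinks entire groups and individual coefficients within groups toward zero.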
Mai, Kristi, "Identification of Biomarkers for the Overall Survival of Ovarian Cancer Patients" (2016). Theses and Dissertations. 1493.