Date of Graduation

5-2016

Document Type

Thesis

Degree Name

Master of Science in Statistics and Analytics (MS)

Degree Level

Graduate

Department

Graduate School

Advisor

Qingyang Zhang

Committee Member

Avishek Chakraborty

Second Committee Member

Giovanni Petris

Abstract

Rapid advances in sequencing technology have enabled simultaneous genome-wide analysis of genetic and epigenetic features, making it possible to understand the biological mechanisms underlying cancer initiation and progression. However, identifying important prognostic features poses a great challenge for both statistical modeling and computing. In this thesis, a network-based approach is applied to The Cancer Genome Atlas (TCGA) ovarian cancer data to identify important genes related to the overall survival of ovarian cancer patients. In the first step, a stepwise correlation-based selector is used to reduce the dimensionality of the TCGA data by filtering out a large number of unrelated genes. Second, we employ the graphical lasso to construct a sparse gene-gene co-expression network. The undirected network allows us to classify genes into groups based on gene-gene interaction. We then fit a Cox proportional hazards model with a sparse group lasso penalty for further variable selection and identify 232 genes that are prognostic for ovarian cancer survival. Many of these 232 genes have been reported in the literature to be associated with cancer initiation or progression. Kaplan-Meier curves based on the identified genes show clear separation among patient groups defined by gene expression levels.
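The Kaplan-Meier comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function name `kaplan_meier`, the two expression groups, and all follow-up times below are hypothetical toy values chosen only to show how the product-limit estimate is computed for each group.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: returns [(t, S(t))] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients at t both leave the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (assumption): two patient groups split by the
# expression level of a single gene. These are not TCGA values.
high_times, high_events = [2, 3, 3, 5, 8], [1, 1, 0, 1, 0]
low_times, low_events = [4, 6, 9, 10, 12], [0, 1, 0, 1, 0]

high_curve = kaplan_meier(high_times, high_events)
low_curve = kaplan_meier(low_times, low_events)

for label, curve in (("high expression", high_curve), ("low expression", low_curve)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

In this toy example the estimated survival of the high-expression group falls faster than that of the low-expression group, which is the kind of separation between curves the abstract describes. A formal comparison would normally add a log-rank test, which is omitted here.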
