Date of Graduation

12-2020

Document Type

Thesis

Degree Name

Master of Science in Statistics and Analytics (MS)

Degree Level

Graduate

Department

Statistics and Analytics

Advisor/Mentor

Zhang, Qingyang

Committee Member

Arnold, Mark E.

Second Committee Member

Datta, Jyotishka

Keywords

Conditional distance correlation; Conditional mutual information; Ovarian cancer; Pearson's partial correlation

Abstract

Over the past several years, considerable effort has been devoted to genome-wide analysis of genetic and epigenetic profiles to better understand the biological mechanisms underlying complex diseases such as cancer. Unraveling the complex dependence structure among biological factors is therefore important, and many conditional dependence tests have been developed to meet this need. The traditional partial correlation method captures only linear partial correlation and cannot detect nonlinear dependence. To overcome this limitation, we propose using conditional distance correlation (CDC), which measures the conditional dependence between random vectors and can detect nonlinear relations. In this thesis, we apply the CDC measure to the rich ovarian cancer data from The Cancer Genome Atlas (TCGA) and identify a list of genes whose molecular features show nonlinear conditional dependence. We integrate three important types of molecular features, namely gene expression, DNA methylation, and copy number variation, and apply both the partial correlation test and the CDC test to infer the relations among the three measurements for each gene. Out of 196 candidate oncogenes and tumor suppressors, we identify 19 genes in which two of the molecular features are nonlinearly dependent given the third. Many of these 19 genes have been reported in the literature to be associated with ovarian or breast cancer. Our findings could shed new light on the biological relations among these three important molecular aspects.
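
As a rough illustration of why linear partial correlation can miss nonlinear conditional dependence, the Python sketch below simulates a variable y that depends on x only through x squared, after accounting for z. It assumes NumPy and compares two quantities: Pearson's partial correlation computed from ordinary least squares residuals, and a simplified kernel-weighted distance correlation in the spirit of CDC (Gaussian kernel weights centered at each observed z, a weighted V-statistic distance correlation at each point, then an average over points). The helper names `partial_corr`, `weighted_center`, and `conditional_dcor`, the rule-of-thumb bandwidth, and the final averaging step are hypothetical choices made for this sketch; they are not the estimator, test statistic, or code used in the thesis.

```python
import numpy as np


def partial_corr(x, y, z):
    """Pearson partial correlation of x and y given z, via OLS residuals."""
    design = np.column_stack([np.ones_like(z), z])  # intercept + linear term in z
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])


def weighted_center(d, w):
    """Double-center a symmetric distance matrix d under weights w (summing to 1)."""
    row = d @ w        # weighted row means; equal to column means since d is symmetric
    grand = w @ row    # weighted grand mean
    return d - row[:, None] - row[None, :] + grand


def conditional_dcor(x, y, z, bandwidth=None):
    """Kernel-weighted distance correlation of x and y, averaged over z_1..z_n.

    Hypothetical simplification in the spirit of conditional distance
    correlation; not necessarily the estimator used in the thesis.
    """
    n = len(z)
    if bandwidth is None:
        # Rule-of-thumb Gaussian bandwidth (assumption, not from the thesis)
        bandwidth = 1.06 * np.std(z) * n ** (-1 / 5)
    a = np.abs(x[:, None] - x[None, :])  # pairwise distances in x
    b = np.abs(y[:, None] - y[None, :])  # pairwise distances in y
    # Row k holds Gaussian kernel weights centered at z_k, normalized to sum to 1
    kern = np.exp(-0.5 * ((z[:, None] - z[None, :]) / bandwidth) ** 2)
    weights = kern / kern.sum(axis=1, keepdims=True)
    values = []
    for w in weights:
        A, B = weighted_center(a, w), weighted_center(b, w)
        dcov_xy = w @ (A * B) @ w  # weighted squared distance covariance
        dvar_x = w @ (A * A) @ w   # weighted squared distance variance of x
        dvar_y = w @ (B * B) @ w   # weighted squared distance variance of y
        denom = np.sqrt(dvar_x * dvar_y)
        values.append(np.sqrt(max(dcov_xy, 0.0) / denom) if denom > 0 else 0.0)
    return float(np.mean(values))


# Simulated example (hypothetical data, not TCGA measurements)
rng = np.random.default_rng(0)
n = 300
z = rng.normal(size=n)
x = rng.uniform(-1, 1, size=n)
noise = 0.05 * rng.normal(size=n)
y_dep = x ** 2 + 0.3 * z + noise  # depends nonlinearly on x given z
y_null = 0.3 * z + noise          # conditionally independent of x given z

pc = partial_corr(x, y_dep, z)
cdc_dep = conditional_dcor(x, y_dep, z)
cdc_null = conditional_dcor(x, y_null, z)
print(f"partial corr = {pc:.3f}, CDC-like (dependent) = {cdc_dep:.3f}, "
      f"CDC-like (null) = {cdc_null:.3f}")
```

In a setup like this, the partial correlation is expected to sit near zero even though y is a deterministic function of x up to the z term and noise, while the kernel-weighted statistic should be noticeably larger for the dependent case than for the null case. Turning either quantity into a formal test would still require a reference distribution, for example from a resampling scheme, which this sketch omits.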

This thesis is structured as follows. Chapter 1 gives a brief introduction to ovarian cancer, the TCGA data, the three molecular measurements, and the two testing methods. Chapter 2 reviews the statistical methods, including Pearson’s partial correlation and conditional distance correlation. Chapter 3 presents an extensive simulation study comparing the empirical performance of the different methods. Chapter 4 applies the CDC method to the TCGA ovarian cancer data, and Chapter 5 concludes the thesis with future directions.