Date of Graduation


Document Type


Degree Name

Master of Science in Statistics and Analytics (MS)

Degree Level



Statistics and Analytics


Qingyang Zhang

Committee Member

Jyotishka Datta

Second Committee Member

Yuchun Du


Distance Correlation, Gene Set Test, Multivariate Independent


Pathways are the functional building blocks of complex diseases such as cancers, and pathway-level studies may provide insight into important biological processes. The gene set test is an important tool for studying the differential expression of a gene set between two groups, e.g., cancer vs. normal. Differential expression of a gene set can arise from a difference in mean, a difference in variability, or both. However, most existing gene set tests target only the mean difference and overlook other types of differential expression. In this thesis, we propose using the recently developed distance correlation for gene set testing. To assess the distance correlation test, we conduct simulation studies under different settings and compare it comprehensively with the popular Hotelling’s T^2 test and the rotation gene set test (ROAST). The three gene set tests are also applied to two real datasets for further comparison. In our simulation studies and real data applications, the distance correlation test shows better overall statistical performance than Hotelling’s T^2 test and ROAST, especially for detecting differences in variability.

This thesis begins with an introduction to the problem of gene set testing and to the prevailing Hotelling’s T^2 test and ROAST. Chapter 2 is a detailed review of the concepts and properties of distance correlation. The results of the simulation studies and the real data applications are summarized in Chapters 3 and 4, respectively. In Chapter 5, we conclude the thesis with some discussion and future perspectives.
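
To make the idea of a distance correlation gene set test concrete, the following is a minimal sketch and not the thesis's own implementation. It assumes the test is applied between an expression matrix (samples in rows, genes of the set in columns) and a 0/1 group label, and that significance is assessed by permuting the labels; the abstract does not spell out either choice. The function names `distance_correlation` and `dcor_permutation_test` are hypothetical.

```python
import numpy as np


def _centered_distances(z):
    """Pairwise Euclidean distance matrix of the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a vector as n samples of one variable
    diff = z[:, None, :] - z[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, add back the grand mean.
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation between x and y, a value in [0, 1]."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()                   # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()  # product of squared distance variances
    if dvar2 <= 0:
        return 0.0  # one of the inputs is constant
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2)))


def dcor_permutation_test(x, labels, n_perm=999, seed=0):
    """Permutation p-value for dependence between expression and group labels.

    Assumption: labels are exchangeable under the null hypothesis of no
    differential expression, so shuffling them gives a null reference.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=float)
    observed = distance_correlation(x, labels)
    exceed = sum(
        distance_correlation(x, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Simulated gene set: same mean in both groups, larger spread in group 1.
    rng = np.random.default_rng(1)
    group0 = rng.normal(0.0, 1.0, size=(30, 5))
    group1 = rng.normal(0.0, 3.0, size=(30, 5))
    expr = np.vstack([group0, group1])
    labels = np.repeat([0, 1], 30)
    stat, pval = dcor_permutation_test(expr, labels, n_perm=199)
    print(f"dCor = {stat:.3f}, permutation p-value = {pval:.3f}")
```

Because the population distance correlation is zero only under independence, a statistic of this kind responds to any difference in distribution between the two groups, including a difference in variability alone, which a mean-based test such as Hotelling’s T^2 is not designed to detect. The simulated data in the usage example are purely illustrative and are not taken from the thesis.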