Date of Graduation
12-2020
Document Type
Thesis
Degree Name
Master of Science in Statistics and Analytics (MS)
Degree Level
Graduate
Department
Statistics and Analytics
Advisor/Mentor
Zhang, Qingyang
Committee Member
Datta, Jyotishka
Second Committee Member
Du, Yuchun
Keywords
Distance Correlation; Gene Set Test; Multivariate Independence
Abstract
Pathways are the functional building blocks of complex diseases such as cancers, and pathway-level studies may provide insights into important biological processes. A gene set test is an important tool for studying the differential expression of a gene set between two groups, e.g., cancer vs. normal. Differential expression of a gene set may be due to a difference in mean, in variability, or in both. However, most existing gene set tests target only the mean difference and overlook other types of differential expression. In this thesis, we propose using the recently developed distance correlation for gene set testing. To assess the distance correlation test, we conduct simulation studies under different settings and compare it comprehensively with the popular Hotelling’s T^2 test and the rotation gene set test (ROAST). The three gene set tests are also applied to two real datasets for further comparison. Based on our simulation studies and real data applications, the distance correlation test shows better overall statistical performance than Hotelling’s T^2 test and ROAST, especially for detecting differences in variability.
This thesis begins with an introduction to the problem of gene set testing, followed by the prevailing Hotelling’s T^2 test and ROAST. Chapter 2 is a detailed review of the concepts and properties of distance correlation. The results from the simulation studies and real data applications are summarized in Chapters 3 and 4, respectively. In Chapter 5, we conclude the thesis with some discussion and future perspectives.
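As a rough illustration of the idea in the abstract, the sketch below computes a sample distance correlation between a gene set's expression matrix and a binary group label, and calibrates it with a permutation test. This is not the thesis's implementation: the function names (`distance_correlation`, `dcor_gene_set_test`), the use of the group label as the second variable, and the permutation null are all assumptions made for illustration.

```python
import numpy as np


def _centered_distances(a):
    """Double-centered Euclidean distance matrix between the rows of a."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a vector as n samples of one variable
    d = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between x and y."""
    A = _centered_distances(x)
    B = _centered_distances(y)
    dcov2 = (A * B).mean()                      # squared distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def dcor_gene_set_test(expr, labels, n_perm=999, seed=0):
    """Permutation test of independence between expression and group label.

    expr   : (n_samples, n_genes) expression values for one gene set
    labels : length n_samples, e.g. 0 = normal, 1 = cancer (assumed coding)
    Returns (observed distance correlation, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=float)
    observed = distance_correlation(expr, labels)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)      # break the sample/group link
        if distance_correlation(expr, shuffled) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)       # add-one correction
    return observed, p_value


if __name__ == "__main__":
    # Simulated gene set: equal means, but group 1 is more variable.
    rng = np.random.default_rng(1)
    normal = rng.normal(0.0, 1.0, size=(30, 10))
    cancer = rng.normal(0.0, 4.0, size=(30, 10))
    expr = np.vstack([normal, cancer])
    labels = np.repeat([0, 1], 30)
    stat, p = dcor_gene_set_test(expr, labels, n_perm=199)
    print(f"distance correlation = {stat:.3f}, p-value = {p:.3f}")
```

The simulated example has no mean difference between the groups, only a difference in variability, which is the setting the abstract highlights. Because distance correlation measures general dependence rather than a mean shift, the statistic can respond to such differences.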
Citation
Su, S. (2020). Gene Set Testing by Distance Correlation. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/3931
Included in
Bioinformatics Commons, Biostatistics Commons, Computational Biology Commons, Microarrays Commons, Multivariate Analysis Commons, Statistical Models Commons