Date of Graduation

5-2018

Document Type

Thesis

Degree Name

Master of Science in Computer Science (MS)

Degree Level

Graduate

Department

Computer Science & Computer Engineering

Advisor/Mentor

Wu, Xintao

Committee Member

Li, Wing Ning

Second Committee Member

Li, Qinghua

Keywords

Bayesian Network; GWAS; GWAS Catalog; STIP

Abstract

Genome-wide association studies (GWASs) have received increasing attention as a means of understanding genotype-phenotype relationships. The Bayesian network has been proposed as a powerful tool for modeling single-nucleotide polymorphism (SNP)-trait associations because it copes well with high computational complexity and high-dimensional data. Most existing work learns the interactions among genotypes and phenotypes from raw genotype data. However, genotype information is privacy-sensitive, and its handling must comply with specific restrictions. In this work, we aim to build Bayesian networks from publicly released GWAS statistics to explicitly reveal the conditional dependency between SNPs and traits.

First, we focus on building a Bayesian network for modeling the SNP-categorical trait relationships. We construct a three-layered Bayesian network explicitly revealing the conditional dependency between SNPs and categorical traits from GWAS statistics. We then formulate inference problems based on the dependency relationship captured in the Bayesian network. Empirical evaluations show the effectiveness of our methods.
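As a rough illustration of the kind of inference a SNP-categorical trait network supports, the sketch below uses a single SNP node and a single binary trait node. It is not the three-layered construction of the thesis: the Hardy-Weinberg genotype prior, the conditional probability table, and all function names and numbers are assumptions made up for this example.

```python
# Toy SNP -> trait network with one SNP and one binary trait.
# All names and numbers are hypothetical; this is not the thesis's model.

GENOTYPES = ("AA", "Aa", "aa")  # 'a' denotes the risk allele


def genotype_prior(risk_allele_freq):
    """Genotype frequencies, ASSUMING Hardy-Weinberg equilibrium."""
    q = risk_allele_freq
    p = 1.0 - q
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def trait_marginal(prior, cpt):
    """P(trait) obtained by summing out the genotype."""
    return sum(prior[g] * cpt[g] for g in GENOTYPES)


def genotype_posterior(prior, cpt, has_trait=True):
    """P(genotype | trait status) via Bayes' rule."""
    likelihood = {g: cpt[g] if has_trait else 1.0 - cpt[g] for g in GENOTYPES}
    evidence = sum(prior[g] * likelihood[g] for g in GENOTYPES)
    return {g: prior[g] * likelihood[g] / evidence for g in GENOTYPES}


# Hypothetical inputs: risk allele frequency and P(trait | genotype).
prior = genotype_prior(0.2)
cpt = {"AA": 0.05, "Aa": 0.10, "aa": 0.20}

# Trait inference given a genotype is a direct CPT lookup.
p_trait_given_aa = cpt["aa"]
# Genotype inference given the trait requires Bayes' rule.
posterior = genotype_posterior(prior, cpt, has_trait=True)
```

Reading `cpt[g]` answers "trait given genotype", while `genotype_posterior` answers the reverse query; a network with many SNPs and traits needs a proper inference algorithm rather than this direct enumeration.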

Second, we focus on modeling the SNP-quantitative trait relationships. Existing methods in the literature can only deal with categorical traits. We address this limitation by leveraging the Conditional Linear Gaussian (CLG) Bayesian network, which can handle a mixture of discrete and continuous variables. A two-layered CLG Bayesian network is built where the SNPs are represented as discrete variables in one layer and quantitative traits are represented as continuous variables in another layer. Efficient inference methods are then derived based on the constructed network. The experimental results demonstrate the effectiveness of our methods.
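The idea behind a Conditional Linear Gaussian node can be sketched with one discrete SNP parent and one continuous trait child: given the genotype, the trait is Gaussian with a mean that depends linearly on the risk-allele count. The additive effect model, the parameter values, and the function names below are assumptions for illustration only, not the parameters or code of the thesis.

```python
import math

# Toy CLG fragment: discrete SNP (0, 1 or 2 risk alleles) -> continuous trait.
# All names and numbers are hypothetical.


def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def clg_mean(intercept, beta, risk_alleles):
    """Trait mean under an ASSUMED additive effect per risk allele."""
    return intercept + beta * risk_alleles


def expected_trait(prior, intercept, beta):
    """Marginal trait mean, averaging over the genotype prior."""
    return sum(p * clg_mean(intercept, beta, k) for k, p in prior.items())


def genotype_posterior_given_trait(x, prior, intercept, beta, sd):
    """P(risk-allele count | trait = x) via Bayes' rule."""
    weights = {
        k: p * normal_pdf(x, clg_mean(intercept, beta, k), sd)
        for k, p in prior.items()
    }
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical prior over risk-allele counts and CLG parameters.
prior = {0: 0.64, 1: 0.32, 2: 0.04}
intercept, beta, sd = 170.0, 2.0, 5.0

posterior = genotype_posterior_given_trait(180.0, prior, intercept, beta, sd)
```

Observing a trait value above the population mean shifts posterior mass toward genotypes with more risk alleles when `beta` is positive, which is the genotype-given-trait direction of inference.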

Finally, we present STIP, a web-based SNP-trait inference platform that supports a variety of inference tasks, such as trait inference given SNP genotypes and genotype inference given traits. The current version of STIP provides three services: SNP-trait inference, top-k trait prediction, and GWAS Catalog exploration.
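Top-k trait prediction can be pictured as ranking traits by their inferred posterior probability and keeping the k highest. The sketch below shows only that ranking step. It does not reflect how STIP is implemented, and the trait names and probabilities are invented placeholders.

```python
import heapq

# Ranking step only; STIP's actual implementation is not described here.
# Trait names and probabilities are hypothetical placeholders.


def top_k_traits(trait_posteriors, k):
    """Return the k (trait, probability) pairs with the highest probability."""
    return heapq.nlargest(k, trait_posteriors.items(), key=lambda item: item[1])


# Hypothetical posteriors, e.g. produced by inference given a genotype profile.
trait_posteriors = {"trait_A": 0.10, "trait_B": 0.50, "trait_C": 0.30}
ranked = top_k_traits(trait_posteriors, 2)
```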
