Bayesian Network Construction and Genotype-Phenotype Inference Using GWAS Statistics

Document Type

Article - Abstract Only

Publication Date



Bayes methods, Genetics, Privacy, Diseases, Noise measurement, Frequency control, Bayesian networks, Genome wide association study, Inference, Independence of causal influence


Genome-wide association studies (GWASs) have received increasing attention as a way to understand how genetic variation affects different human traits. In this paper, we study whether and to what extent GWAS statistics can be exploited to infer private information about an individual. We first provide a method to construct a three-layered Bayesian network that explicitly reveals the conditional dependencies between single-nucleotide polymorphisms (SNPs) and traits from the public GWAS catalog. The key challenge in building a Bayesian network from GWAS statistics is specifying the conditional probability table of a variable with multiple parent variables. We employ models of independence of causal influences, which assume that the causal mechanisms of the parent variables are mutually independent. We then formulate three inference problems based on the dependency relationships captured in the Bayesian network, namely trait inference given SNP genotype, genotype inference given trait, and trait inference given known traits, and develop efficient formulas and algorithms. Unlike previous work, the target of the inference problems we study may be any individual, not only a GWAS participant. Empirical evaluations show the effectiveness of our proposed methods. In summary, our work implies that meaningful information can be inferred from modeling GWAS statistics, and that appropriate privacy protection mechanisms need to be developed to protect the genetic privacy not only of GWAS participants but also of regular individuals.
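
To make the conditional probability table problem concrete, the following is a minimal sketch using noisy-OR, one commonly used member of the independence-of-causal-influences family. The abstract does not state which model the authors use, so noisy-OR is an assumption here. The function names, the binary parent variables (real SNP genotypes have three states), and all numbers are hypothetical and are not drawn from the GWAS catalog.

```python
from itertools import product


def noisy_or_cpt(link_probs, leak=0.0):
    """Build P(trait=1 | parents) for every binary parent configuration.

    link_probs[i]: assumed probability that parent i alone triggers the trait.
    leak: assumed probability that the trait occurs with no active parent.
    """
    cpt = {}
    for states in product((0, 1), repeat=len(link_probs)):
        # Probability that neither the leak nor any active parent triggers the trait.
        p_none = 1.0 - leak
        for active, p in zip(states, link_probs):
            if active:
                p_none *= 1.0 - p
        cpt[states] = 1.0 - p_none
    return cpt


def config_weight(states, parent_marginals):
    """Prior probability of one parent configuration, parents assumed independent."""
    weight = 1.0
    for s, m in zip(states, parent_marginals):
        weight *= m if s else 1.0 - m
    return weight


def trait_probability(cpt, parent_marginals):
    """P(trait=1), marginalizing over all parent configurations."""
    return sum(config_weight(s, parent_marginals) * p for s, p in cpt.items())


def parent_posterior(cpt, parent_marginals, index):
    """P(parent[index]=1 | trait=1) by Bayes' rule."""
    joint = sum(
        config_weight(s, parent_marginals) * p
        for s, p in cpt.items()
        if s[index] == 1
    )
    return joint / trait_probability(cpt, parent_marginals)


if __name__ == "__main__":
    cpt = noisy_or_cpt([0.2, 0.5])               # two hypothetical risk variants
    print(cpt[(1, 1)])                           # trait given both variants
    print(trait_probability(cpt, [0.5, 0.5]))    # trait prior under assumed frequencies
    print(parent_posterior(cpt, [0.5, 0.5], 0))  # variant 0 given the trait
```

The appeal of this kind of model is that a node with n parents needs only n link parameters (plus an optional leak term) rather than a full table with 2^n independently specified entries, which matters when only per-SNP association statistics are available. The last function illustrates the direction of inference the abstract calls genotype inference given trait, in a deliberately simplified two-parent setting.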


Principal Investigator: Xintao Wu

Acknowledgements: This paper is a significant extension of the 4-page conference paper [50]. This work is supported in part by the U.S. National Institutes of Health (1R01GM103309) to L. Zhang, Q. Pan and X. Wu, the U.S. National Science Foundation (DGE-1523115 and IIS-1502273) to Q. Pan and X. Wu, and the U.S. National Science Foundation (DGE-1523154 and IIS-1502172) to X. Shi.

This document is currently not available here.