Date of Graduation

5-2018

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Mathematics (PhD)

Degree Level

Graduate

Department

Mathematical Sciences

Advisor/Mentor

Mark Arnold

Committee Member

Avishek Chakraborty

Second Committee Member

Giovanni Petris

Third Committee Member

Qingyang Zhang

Keywords

Bayesian Analysis, CAR Prior, MCMC, Outlier Detection, Reversible Jump MCMC, Spatial Data

Abstract

This dissertation makes two important contributions to the development of Bayesian hierarchical models. The first contribution focuses on spatial modeling. Spatial data observed on a group of areal units are common in scientific applications. The usual hierarchical approach to modeling such data is to introduce a spatial random effect with an autoregressive prior. However, the standard Markov chain Monte Carlo scheme for this hierarchical framework requires the spatial effects to be sampled from their full conditional posteriors one by one, resulting in poor mixing. More importantly, it makes the model computationally inefficient for datasets with a large number of units. In this dissertation, we propose a Bayesian approach that uses the spectral structure of the adjacency matrix to construct a low-rank expansion for modeling spatial dependence. We develop a computationally efficient estimation scheme that adaptively selects the basis functions most important for capturing the variation in the response. Through simulation studies, we validate the computational efficiency as well as the predictive accuracy of our method. Finally, we present an important real-world application of the proposed methodology to a massive plant abundance dataset from the Cape Floristic Region of South Africa.

The second contribution of this dissertation is a heavy-tailed hierarchical regression for detecting outliers. We aim to build a linear model that can accommodate both small and large residual magnitudes through an observation-specific error distribution. The t-distribution is particularly suited for this purpose, as its degrees of freedom (df) parameter controls the heaviness of its tail: large df values correspond to observations in the normal range, while small values indicate potential outliers with large error magnitudes. In a hierarchical structure, the t-distribution can be written as a scale mixture of Gaussian distributions, so that the standard MCMC algorithm for the Gaussian setting can still be used. After MCMC, the posterior mean of the degrees of freedom for each observation serves as a measure of that observation's outlyingness. We implement this method on a real dataset consisting of biometric records.
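The two methodological ideas summarized above can be sketched briefly. First, a minimal illustration of constructing a low-rank spatial basis from the spectrum of an adjacency matrix: the small matrix W, the rank k, and the coefficient vector delta below are hypothetical values chosen only to show the construction, not quantities from the dissertation.

```python
import numpy as np

# Hypothetical 0/1 adjacency matrix W for 5 areal units (symmetric, no self-neighbors).
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0],
], dtype=float)

# Eigendecomposition of the symmetric adjacency matrix.
eigvals, eigvecs = np.linalg.eigh(W)

# Keep the k eigenvectors with the largest eigenvalues as spatial basis functions.
k = 2
order = np.argsort(eigvals)[::-1]
basis = eigvecs[:, order[:k]]            # n x k low-rank spatial basis

# The spatial random effect is then a linear combination of these basis columns,
# phi = basis @ delta, with delta a k-dimensional coefficient vector (illustrative draw).
rng = np.random.default_rng(0)
delta = rng.normal(size=k)
phi = basis @ delta
print(phi)
```

Second, a small sketch of the scale-mixture-of-Gaussians representation of the t-distribution that underlies the outlier-detection model: drawing a Gamma-distributed precision for each observation and then a Gaussian error with that precision reproduces t-distributed errors. The values of nu, sigma, and n are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
nu, sigma, n = 3.0, 1.0, 100_000        # df, scale, and number of draws (illustrative)

# Scale-mixture construction: lambda_i ~ Gamma(nu/2, rate = nu/2),
# then e_i | lambda_i ~ Normal(0, sigma^2 / lambda_i) is marginally t with nu df.
lam = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)   # numpy parameterizes by scale = 1/rate
e_mix = rng.normal(0.0, sigma / np.sqrt(lam))

# Direct draws from the t-distribution for comparison.
e_t = sigma * rng.standard_t(nu, size=n)

# The two samples should show matching heavy-tailed behavior (similar extreme quantiles).
print(np.quantile(e_mix, [0.01, 0.99]))
print(np.quantile(e_t, [0.01, 0.99]))
```

In the full hierarchical model, each observation's degrees of freedom would carry its own prior and be updated within MCMC; its posterior mean then flags potential outliers, as described in the abstract.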
