Author ORCID Identifier

https://orcid.org/0009-0008-3670-3492

Date of Graduation

5-2026

Document Type

Thesis

Degree Name

Master of Science in Statistics and Analytics (MS)

Degree Level

Graduate

Department

Statistics and Analytics

Advisor/Mentor

Majumder, Reetam

Committee Member

Chakraborty, Avishek

Second Committee Member

Petris, Giovanni

Third Committee Member

Zhang, Qingyang

Keywords

non-linear data; Semi-Parametric Quantile Regression (SPQR); variable importance metrics

Abstract

Classical statistical methods focus on explainability and inferential power. Machine learning and deep learning methods handle non-linear, high-dimensional data better than traditional methods, yet in applied modeling, a clear understanding and interpretation of the model remain essential to decision-making. Recent work in quantile regression and extreme-value modeling has begun to use deep learning because of its performance on high-dimensional, non-linear data. Semi-Parametric Quantile Regression (SPQR) is a nonparametric, spline-based approach to quantile regression that estimates the conditional PDF and CDF of the response. Semi-Parametric Quantile Regression for Extremes (SPQRx) is a recent extension of SPQR that adds two features: out-of-sample estimation and accurate estimation of the extreme tails. In this thesis, we use SPQR and SPQRx to test global and local variable importance metrics for black-box models and to assess how well these metrics explain the models' behavior. Local methods such as LIME and Kernel Shapley explain the model at an individual data point, on the assumption that local behavior is easier to explain and interpret than global behavior. Previous work on SPQR and SPQRx has not used these variable importance metrics. We introduce guidelines for applying variable importance metrics in SPQR and SPQRx, and we develop an R package for SPQRx based on the original SPQR package. Across three simulation studies and the analysis of two datasets, one of which is a new application, we find that the results of QALE (quantile accumulated local effects) and Kernel Shapley align well, whereas LIME underperforms when applied to a real dataset.
