Date of Graduation
12-2021
Document Type
Thesis
Degree Name
Master of Science in Geology (MS)
Degree Level
Graduate
Department
Geosciences
Advisor/Mentor
Sharman, Glenn R.
Committee Member
Szymanski, Eugene
Second Committee Member
Covington, Matthew D.
Third Committee Member
Huang, Xiao
Keywords
Petrography; quartz; feldspar; lithic grains
Abstract
Petrography has long been used as a tool to decipher the sedimentary provenance of sand and sandstone from the relative proportions of framework grain types. Petrographers have also related the proportions of quartz (Q), feldspar (F), and lithic (L) grains to the processes that form and modify sediments within sediment routing systems. This past work has shown that factors including source lithology, climate, transport history, and tectonism work in concert to modify the framework mineralogy of sand. However, a quantitative understanding of the interactions and feedbacks between these factors, and of how they modify sand mineralogy, is lacking. This research aims to establish a predictive framework that constrains the relationship between sand framework grain mineralogy and the factors that influence it, including bedrock lithology, topography, and climate. Specifically, this study asks, “to what degree can the final modal composition of sand be predicted if the boundary conditions that generate sediments are known?”
This question is investigated by analyzing a globally extensive modal point-count dataset of 3,522 Pleistocene to modern sand samples from 51 published sources. A petrographic data model was created to standardize 287 reported petrographic labels to a final list of 54 labels. An inline series of random forest (RF) machine learning algorithms was trained on a subset of 3,208 fluvial and marine samples whose boundary conditions are known with a high degree of confidence. Data for precipitation, temperature, elevation, slope, basin area, and seven generalized source lithologies were extracted from sample catchments and used to train 100 RF meta-estimators that predict the logarithms of the F:Q and L:Q ratios as well as eight Q-F-L subcompositions, resulting in R² scores of 0.654 ± 0.031 (1-sigma) and 0.706 ± 0.023 (1-sigma) for the ln(F/Q) and ln(L/Q) models, respectively. Mean Q-F-L prediction error within one standard deviation is 2.6% ± 15% for Q, -1.1% ± 9.4% for F, and -1.5% ± 15.4% for L.
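Because the models predict ln(F/Q) and ln(L/Q) rather than Q, F, and L directly, a reader may find it useful to see how two log-ratios map back to a Q-F-L composition. The following is a minimal sketch of that conversion, assuming percentages that sum to 100 and strictly positive components; the function names are hypothetical and are not taken from the thesis, which may handle zero values and recalculation differently.

```python
import math


def qfl_to_logratios(q, f, l):
    """Return (ln(F/Q), ln(L/Q)) for strictly positive Q, F, L values.

    Inputs may be raw point counts or percentages; only ratios matter.
    """
    if min(q, f, l) <= 0:
        raise ValueError("log-ratios are undefined for zero or negative components")
    return math.log(f / q), math.log(l / q)


def logratios_to_qfl(ln_fq, ln_lq):
    """Invert the two log-ratios to Q, F, L percentages that sum to 100."""
    fq = math.exp(ln_fq)  # F/Q
    lq = math.exp(ln_lq)  # L/Q
    # Q + F + L = 100 and F = fq * Q, L = lq * Q, so Q = 100 / (1 + fq + lq).
    q = 100.0 / (1.0 + fq + lq)
    return q, fq * q, lq * q
```

For example, a sand with Q = 50, F = 30, and L = 20 has F/Q = 0.6 and L/Q = 0.4, and passing the logarithms of those two ratios to `logratios_to_qfl` recovers the original percentages.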
The Global Prediction of Sand Mineralogy (GloPrSM) model was generated by applying the 100 RF meta-estimators to a global dataset of fluvial watersheds (mean area of ~1,500 km²). The resulting Q-F-L prediction includes an estimate of spatial uncertainty based upon variability in the 100 predictions. In general, the GloPrSM model predicts quartz enrichment in low latitudes (35°N to 35°S), feldspar enrichment near plutonic and metamorphic crystalline terranes in middle to high latitudes, and lithic enrichment near active margins and flood basalts. Low model confidence is exhibited in catchments draining large igneous provinces, in sedimentary terranes in middle to high latitudes, and in orogenic settings. Feature importance algorithms reveal that slope, temperature, metamorphic source abundance, and felsic to intermediate plutonic source abundance are the most important predictors of Q-F-L composition. In addition, partial dependence analysis suggests temperatures higher than 15 °C and large drainage areas favor quartz enrichment, while steeply sloping environments favor lithic enrichment. The GloPrSM model represents the first global-scale estimate of sand mineralogical proportions, and illustrates that the spatial distribution of Q-F-L at Earth’s surface can be predicted from the first-order factors that generate sediments.
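The idea of deriving uncertainty from the spread of the 100 predictions can be illustrated with a short sketch. It assumes each ensemble member supplies a (ln(F/Q), ln(L/Q)) pair for a watershed, and it uses the population standard deviation of the back-transformed percentages as the spread measure. Both the function name and the choice of statistic are assumptions made here for illustration; the abstract does not state which measure of variability the thesis uses.

```python
import math
import statistics


def summarize_ensemble(logratio_predictions):
    """Summarize ensemble predictions for one watershed.

    logratio_predictions: iterable of (ln(F/Q), ln(L/Q)) pairs, one per
    ensemble member. Returns {"Q": (mean, spread), "F": ..., "L": ...}
    in percent, where spread is the population standard deviation
    (an assumed uncertainty measure).
    """
    values = {"Q": [], "F": [], "L": []}
    for ln_fq, ln_lq in logratio_predictions:
        fq = math.exp(ln_fq)  # F/Q
        lq = math.exp(ln_lq)  # L/Q
        q = 100.0 / (1.0 + fq + lq)
        values["Q"].append(q)
        values["F"].append(fq * q)
        values["L"].append(lq * q)
    return {
        name: (statistics.mean(vals), statistics.pstdev(vals))
        for name, vals in values.items()
    }
```

Watersheds where the ensemble members agree closely produce a small spread, while watersheds where they disagree produce a larger one, which is the sense in which the spread can serve as a map of model confidence.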
Citation
Johnson, I. (2021). Machine Learning Applied to a Modern-Pleistocene Petrographic Dataset: The Global Prediction of Sand Mineralogy (GloPrSM) Model. Graduate Theses and Dissertations. Retrieved from https://scholarworks.uark.edu/etd/4300
Supplemental_Fig_A.pdf (865 kB)
Supplemental_Fig_B.pdf (6430 kB)
Supplemental_Fig_C.pdf (34559 kB)
Supplemental_Fig_D.pdf (1430 kB)
Supplemental_Fig_E.pdf (6882 kB)
Supplemental_Fig_F.pdf (10718 kB)
Table A - Sources.xlsx (12 kB)
Table B - Data Model.xlsx (11 kB)
Table C - Samples.xlsx (444 kB)
Table D - R2 Scores from all models.xlsx (10 kB)
Table E - Abundance of Zero Values in the Database.xlsx (9 kB)
Table F - Recalculated Parameters.xlsx (11 kB)
Table G - Recalculated Data.xlsx (674 kB)
Table H - References.xlsx (32 kB)