Date of Graduation


Document Type


Degree Name

Master of Arts in Geography (MA)

Degree Level





Jason A. Tullis

Committee Member

Jackson Cothren

Second Committee Member

Xuan Shi


Biological sciences, Earth sciences, Biomass, Lidar, Machine learning


Light detection and ranging (lidar) has been applied in various forest applications, such as retrieving forest structural information, building statistical models for tree species identification, and monitoring forest growth. However, despite significant progress in these areas, the choice of regression approach and parameter tuning remains a critical open question. This study focused on choosing the right level of spatial generalization for transforming lidar point clouds into 2D images that can be further processed by mature image processing and pattern recognition approaches. It also compared the predictive ability of popular machine learning algorithms applied to aboveground forest biomass estimation. A neighborhood technique was employed to calculate lidar-derived height metrics, which were used as predictors to estimate total forest biomass at the image object (or segment) level. The height metrics were calculated as percentile heights and canopy coverage based on the lidar points falling within certain spatial extents (neighborhoods). Three machine learning algorithms, Support Vector Machine (SVM), Cubist, and Random Forest, were tested to explore the relationship between the lidar-derived height metrics and biomass observed in situ. The effect of neighborhood size was examined by developing regression models with each algorithm on images created with 0.5-, 2.5-, 5-, 10-, and 15-meter neighborhoods. Experiments were conducted at two study sites, the Ozark Mountains of Arkansas and the Trinity River Basin of Texas, with significantly different landscapes, hardwood tree species, and lidar point distributions. Regression models were constructed and evaluated with 10-fold cross-validation. Results showed that optimal neighborhood configurations depend on the lidar data and the regression techniques that are applied. The optimal model among all neighborhoods and algorithms achieved training accuracies of 0.988 and 0.990, and validation accuracies of 0.902 and 0.853 (adjusted R²), at the two study sites, respectively.
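
The neighborhood-based height metrics described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The square neighborhood shape, the particular percentiles, the 2 m canopy threshold, and the function name `neighborhood_height_metrics` are all assumptions made for illustration only; the abstract specifies only that percentile heights and canopy coverage were computed from the lidar points falling within a neighborhood.

```python
import numpy as np


def neighborhood_height_metrics(points, center, size,
                                percentiles=(25, 50, 75, 95),
                                canopy_threshold=2.0):
    """Summarize lidar heights inside a square neighborhood.

    points: array-like of shape (N, 3) holding x, y, and height above ground.
    center: (x, y) of the neighborhood center.
    size:   side length of the neighborhood in the same units as x and y.

    The square shape, the percentile set, and the canopy threshold are
    illustrative assumptions, not values taken from the thesis.
    """
    points = np.asarray(points, dtype=float)
    half = size / 2.0

    # Select the points whose x and y both fall inside the square window.
    in_x = np.abs(points[:, 0] - center[0]) <= half
    in_y = np.abs(points[:, 1] - center[1]) <= half
    heights = points[in_x & in_y, 2]

    if heights.size == 0:
        return None  # no lidar returns in this neighborhood

    # Percentile heights describe the vertical distribution of returns.
    metrics = {f"p{p}": float(np.percentile(heights, p)) for p in percentiles}

    # Canopy coverage here is the fraction of returns above the threshold.
    metrics["canopy_cover"] = float(np.mean(heights > canopy_threshold))
    metrics["n_points"] = int(heights.size)
    return metrics


# Toy point cloud: three returns near the origin and one farther away.
cloud = [[0.0, 0.0, 0.0],
         [0.1, 0.1, 10.0],
         [0.2, -0.2, 20.0],
         [5.0, 5.0, 30.0]]

small = neighborhood_height_metrics(cloud, center=(0.0, 0.0), size=1.0)
large = neighborhood_height_metrics(cloud, center=(0.0, 0.0), size=15.0)
```

In this toy case the 1-meter neighborhood captures three returns and the 15-meter neighborhood captures all four, so the same location yields different predictor values. This is the sensitivity to neighborhood size that the study examines.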