Data Science Undergraduate Honors Theses

Distributional Gradient Boosted Decision Trees for Freight Rail ETA Estimation: A Progressive Modeling Framework for Intermodal Transit and Deramp Duration Prediction

Jaxon K. Ham, University of Arkansas, Fayetteville

Date of Graduation

5-2026

Document Type

Thesis

Degree Name

Bachelor of Science in Data Science

Degree Level

Undergraduate

Department

Data Science

Advisor/Mentor

Karl Schubert

Committee Member

Jared Wolf

Second Committee Member

Jana Gastineau

Abstract

Accurate estimation of transit travel times and deramp durations is critical for operational planning, resource allocation, and customer service quality in intermodal freight rail. This thesis presents a progressive ETA estimation framework that uses distributional gradient boosted decision trees (CatBoost) to predict intermodal rail container availability times for J.B. Hunt Transport Services (JBH). The framework is a two-stage cascading pipeline. The departure-time stage fires when the train departs and combines transit and departure-time deramp predictions. The arrival-time stage fires when the train reaches the destination ramp and substitutes the actual transit time into the ETA. Each stage trains separate models for the PREMIUM and STANDARD service segments. All models were evaluated on a one-week out-of-time (OOT) holdout period against a fair JBH baseline reconstructed from the 91 days (about 3 months) immediately preceding the evaluation window.
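The two-stage cascade described above can be sketched as a small dispatch function. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `Shipment` record, the `(stage, target, segment)` model registry, and the stub predictors returning fixed minute values are all hypothetical stand-ins for the trained per-stage, per-segment CatBoost models.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Shipment:
    """Hypothetical container record; field names are illustrative only."""
    segment: str                              # "PREMIUM" or "STANDARD"
    departure_time: datetime
    arrival_time: Optional[datetime] = None   # set once the train reaches the destination ramp


# Hypothetical registry: (stage, target, segment) -> predictor returning minutes.
Predictor = Callable[[Shipment], float]
ModelRegistry = Dict[Tuple[str, str, str], Predictor]


def progressive_eta(s: Shipment, models: ModelRegistry) -> Tuple[str, datetime]:
    """Return (stage, estimated container availability time)."""
    if s.arrival_time is None:
        # Stage 1 (departure): both transit and deramp durations are predicted.
        transit = models[("departure", "transit", s.segment)](s)
        deramp = models[("departure", "deramp", s.segment)](s)
        return "departure", s.departure_time + timedelta(minutes=transit + deramp)
    # Stage 2 (arrival): actual transit is now known, so only deramp is predicted.
    deramp = models[("arrival", "deramp", s.segment)](s)
    return "arrival", s.arrival_time + timedelta(minutes=deramp)


if __name__ == "__main__":
    # Stub predictors with made-up constant outputs, standing in for trained models.
    stubs: ModelRegistry = {
        ("departure", "transit", "PREMIUM"): lambda s: 2880.0,
        ("departure", "deramp", "PREMIUM"): lambda s: 240.0,
        ("arrival", "deramp", "PREMIUM"): lambda s: 180.0,
    }
    shipment = Shipment("PREMIUM", datetime(2026, 1, 5, 8, 0))
    stage, eta = progressive_eta(shipment, stubs)
    print(stage, eta)  # departure 2026-01-07 12:00:00
    shipment.arrival_time = datetime(2026, 1, 7, 9, 30)
    stage, eta = progressive_eta(shipment, stubs)
    print(stage, eta)  # arrival 2026-01-07 12:30:00
```

Keying the registry by stage and segment mirrors the separate PREMIUM and STANDARD models per stage. Once the train has arrived, only the deramp term is still uncertain, which is consistent with the much smaller error reported for the arrival-time stage.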

On the OOT holdout, the departure-time combined pipeline (Stage 1) achieved a 59.6% window hit rate versus JBH's 60.0%, with a median absolute error (MdAE) of 223.5 minutes versus 224.0, effectively a tie at departure. When the pipeline updates at train arrival (Stage 2), the model's combined MdAE drops from 223.5 to 98.0 minutes, a 56% reduction. On the fair isolated deramp comparison at arrival, where both sides estimate deramp duration with no information advantage, the model achieved an MdAE of 98.8 minutes versus JBH's 101.0 (a 2.2% MdAE improvement for the model), while JBH held a slight edge in window hit rate (74.0% vs. 73.4%). Within the transit model, the GBDT advantage concentrates in sparse-data regimes: lanes with fewer than 5 historical trains (+39.6%) and JBH Transit Tier 4 (+30.3%), where the lookup table's simple median aggregation breaks down. The framework predicts conditional quantiles (the median and 90th percentile) alongside point estimates, approximating predictive distributions. The main contribution of this work is filling the gaps where the lookup-table baseline degrades: sparse lanes, broad fallback tiers, and low-volume buckets.
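The headline comparisons rest on simple arithmetic over per-shipment errors. The sketch below, written only for illustration, computes median absolute error, the relative reductions quoted above, and the pinball (quantile) loss that quantile regression minimizes; the function names are hypothetical. One common way to obtain median and 90th-percentile predictions in CatBoost is its `Quantile` loss (for example `loss_function="Quantile:alpha=0.9"`), though the abstract does not state which objective the thesis used. Window hit rate is omitted because its window definition is not given here.

```python
from statistics import median
from typing import Sequence


def mdae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Median absolute error, in the same units as the inputs (minutes here)."""
    return median(abs(a - p) for a, p in zip(actual, predicted))


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error metric shrinks from `before` to `after`."""
    return (before - after) / before


def pinball_loss(actual: Sequence[float], quantile_pred: Sequence[float], tau: float) -> float:
    """Mean quantile (pinball) loss at level tau.

    Under-prediction (a duration that is too short) is weighted by tau and
    over-prediction by (1 - tau), so tau = 0.9 pushes the prediction toward
    the upper tail. At tau = 0.5 this equals half the mean absolute error.
    """
    losses = [max(tau * (a - q), (tau - 1) * (a - q)) for a, q in zip(actual, quantile_pred)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Reproduce the relative changes quoted in the abstract from its reported MdAEs.
    print(round(relative_reduction(223.5, 98.0), 3))  # 0.562  (Stage 1 -> Stage 2)
    print(round(relative_reduction(101.0, 98.8), 3))  # 0.022  (isolated deramp, JBH -> model)
```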

Keywords

Intermodal; rail; transit; deramp; GBDT; gradient boosted decision tree

Citation

Ham, J. K. (2026). Distributional Gradient Boosted Decision Trees for Freight Rail ETA Estimation: A Progressive Modeling Framework for Intermodal Transit and Deramp Duration Prediction. Data Science Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/dtscuht/38
