Date of Graduation
5-2026
Document Type
Thesis
Degree Name
Bachelor of Science in Data Science
Degree Level
Undergraduate
Department
Data Science
Advisor/Mentor
Karl Schubert
Committee Member
Jared Wolf
Second Committee Member
Jana Gastineau
Abstract
Accurate estimation of transit travel times and deramp durations is critical for operational planning, resource allocation, and customer service quality in intermodal freight rail. This thesis presents a progressive ETA estimation framework using distributional gradient boosted decision trees (CatBoost) to predict intermodal rail container availability times for J.B. Hunt Transport Services. The framework employs a two-stage cascading pipeline: a departure-time stage that fires when the train departs (combining transit and departure-time deramp predictions) and an arrival-time stage that fires when the train reaches the destination ramp (substituting actual transit time into the ETA). Each stage trains separate PREMIUM and STANDARD service segment models. All models were evaluated on a one-week out-of-time (OOT) holdout period, with a fair J.B. Hunt baseline reconstructed from the 91 days (~3 months) immediately preceding the evaluation window.
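The two-stage update described above can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function names, the minute-based units, and the numeric inputs are all assumptions made for the example, and the predicted durations stand in for outputs of the stage-specific models.

```python
from datetime import datetime, timedelta

def stage1_eta(departure, pred_transit_min, pred_deramp_min):
    """Departure-time ETA: predicted transit plus predicted deramp (hypothetical helper)."""
    return departure + timedelta(minutes=pred_transit_min + pred_deramp_min)

def stage2_eta(departure, actual_transit_min, pred_deramp_min):
    """Arrival-time ETA: actual transit replaces the prediction (hypothetical helper)."""
    return departure + timedelta(minutes=actual_transit_min + pred_deramp_min)

# Illustrative values only; none of these numbers come from the thesis.
departure = datetime(2026, 1, 5, 8, 0)
eta_at_departure = stage1_eta(departure, pred_transit_min=1800.0, pred_deramp_min=300.0)
eta_at_arrival = stage2_eta(departure, actual_transit_min=1750.0, pred_deramp_min=280.0)
```

Once the train arrives, the transit term is no longer an estimate, so the only remaining uncertainty in the Stage 2 ETA is the deramp duration.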
On the OOT holdout, the departure-time combined pipeline (Stage 1) achieved a 59.6% window hit rate versus 60.0% for the J.B. Hunt (JBH) baseline, with a median absolute error (MdAE) of 223.5 minutes versus 224.0, effectively a tie at departure. When the pipeline updates at train arrival (Stage 2), the model’s combined MdAE drops from 223.5 to 98.0 minutes, a 56% reduction. On the fair isolated deramp comparison at arrival, where both sides estimate deramp duration with no information advantage, the model achieved an MdAE of 98.8 minutes versus JBH’s 101.0 (a 2.2% improvement), while JBH held a slight edge in window hit rate (74.0% vs 73.4%). Within the transit model, the GBDT advantage concentrates in sparse-data regimes: lanes with fewer than 5 historical trains (+39.6%) and JBH Transit Tier 4 (+30.3%), where the lookup table’s simple median aggregation breaks down. The framework predicts conditional quantiles (median and 90th percentile) alongside point estimates, approximating predictive distributions. The main contribution of this work is filling the gaps where the lookup-table baseline degrades: sparse lanes, broad fallback tiers, and low-volume buckets.
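The reported percentages follow from the MdAE values by simple arithmetic. The sketch below, with hypothetical helper names, shows a median absolute error calculation on toy data and then recomputes the two relative improvements from the figures quoted above. It assumes each improvement is measured relative to the larger (baseline) error.

```python
from statistics import median

def median_absolute_error(actual, predicted):
    """Median of |actual - predicted| over paired observations."""
    return median(abs(a - p) for a, p in zip(actual, predicted))

def relative_reduction(baseline, model):
    """Percent reduction of `model` relative to `baseline`."""
    return (baseline - model) / baseline * 100.0

# Toy data in minutes, for illustration only (not from the thesis).
toy_mdae = median_absolute_error([100, 200, 300], [110, 180, 330])

# Figures quoted in the abstract.
stage_reduction = relative_reduction(223.5, 98.0)   # Stage 1 -> Stage 2 MdAE
deramp_gain = relative_reduction(101.0, 98.8)       # JBH vs model, isolated deramp
```

Here `stage_reduction` comes out at about 56% and `deramp_gain` at about 2.2%, matching the stated results.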
Keywords
Intermodal; rail; transit; deramp; GBDT; gradient boosted decision tree
Citation
Ham, J. K. (2026). Distributional Gradient Boosted Decision Trees for Freight Rail ETA Estimation: A Progressive Modeling Framework for Intermodal Transit and Deramp Duration Prediction. Data Science Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/dtscuht/38