Date of Graduation

5-2026

Document Type

Thesis

Degree Name

Bachelor of Science in Data Science

Degree Level

Undergraduate

Department

Data Science

Advisor/Mentor

Karl Schubert

Committee Member

Jared Wolf

Second Committee Member

Jana Gastineau

Abstract

Accurate estimation of transit travel times and deramp durations is critical for operational planning, resource allocation, and customer service quality in intermodal freight rail. This thesis presents a progressive ETA estimation framework that uses distributional gradient-boosted decision trees (CatBoost) to predict intermodal rail container availability times for J.B. Hunt Transport Services (JBH). The framework employs a two-stage cascading pipeline. A departure-time stage fires when the train departs and combines transit and departure-time deramp predictions. An arrival-time stage fires when the train reaches the destination ramp and substitutes the actual transit time for the predicted one in the ETA. Each stage uses separate models for the PREMIUM and STANDARD service segments. All models were evaluated on a one-week out-of-time (OOT) holdout period against a fair JBH baseline reconstructed from the 91 days (~3 months) immediately preceding the evaluation window.
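The cascading logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the `SegmentModels` container, the `estimate_availability` function, and the idea that each model is a plain callable returning minutes are all hypothetical. Constant lambdas stand in for the trained CatBoost models so the example runs on its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Optional

# Hypothetical model interface: a feature dict in, predicted minutes out.
Predictor = Callable[[dict], float]


@dataclass
class SegmentModels:
    """Hypothetical bundle of models for one service segment (PREMIUM or STANDARD)."""
    transit: Predictor           # transit minutes, used only at departure
    deramp_departure: Predictor  # deramp minutes from departure-time features
    deramp_arrival: Predictor    # deramp minutes from arrival-time features


def estimate_availability(
    models: Dict[str, SegmentModels],
    segment: str,
    features: dict,
    departed_at: datetime,
    arrived_at: Optional[datetime] = None,
) -> datetime:
    """Return the container-availability ETA for whichever stage has fired."""
    m = models[segment]
    if arrived_at is None:
        # Stage 1 (departure): ETA = departure + predicted transit + predicted deramp.
        minutes = m.transit(features) + m.deramp_departure(features)
        return departed_at + timedelta(minutes=minutes)
    # Stage 2 (arrival): transit is now observed, so only deramp is predicted.
    return arrived_at + timedelta(minutes=m.deramp_arrival(features))


# Stand-in models with constant outputs (illustrative numbers only).
models = {
    "PREMIUM": SegmentModels(lambda f: 1800.0, lambda f: 240.0, lambda f: 180.0),
    "STANDARD": SegmentModels(lambda f: 2400.0, lambda f: 300.0, lambda f: 200.0),
}
dep = datetime(2026, 1, 5, 8, 0)

print(estimate_availability(models, "PREMIUM", {}, dep))
# 2026-01-06 18:00:00  (1800 + 240 = 2040 min after departure)
print(estimate_availability(models, "PREMIUM", {}, dep, arrived_at=datetime(2026, 1, 6, 12, 0)))
# 2026-01-06 15:00:00  (actual arrival + 180 min deramp)
```

Keeping the segment lookup outside the stage branch mirrors the abstract's description: the same PREMIUM/STANDARD split applies at both stages, and only the set of predicted components changes when the train arrives.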

On the OOT holdout, the departure-time combined pipeline (Stage 1) achieved a 59.6% window hit rate versus JBH's 60.0%, with a median absolute error (MdAE) of 223.5 minutes versus 224.0, effectively a tie at departure. When the pipeline updates at train arrival (Stage 2), the model's combined MdAE drops from 223.5 to 98.0 minutes, a 56% reduction. On the fair, isolated deramp comparison at arrival, where both sides estimate deramp duration with no information advantage, the model achieved an MdAE of 98.8 minutes versus JBH's 101.0 (a 2.2% improvement for the model), while JBH held a slight edge in window hit rate (74.0% vs. 73.4%). Within the transit model, the GBDT advantage concentrates in sparse-data regimes: lanes with fewer than 5 historical trains (+39.6%) and JBH Transit Tier 4 (+30.3%), where the lookup table's simple median aggregation breaks down. The framework also predicts conditional quantiles (median and 90th percentile) alongside point estimates, approximating the predictive distribution. The main contribution of this work is filling the gaps where the lookup-table baseline degrades: sparse lanes, broad fallback tiers, and low-volume buckets.
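The headline metrics can be made concrete with a small sketch. MdAE and the relative-improvement arithmetic follow directly from the figures above. The window hit rate below assumes a symmetric window of a hypothetical `half_width_min` minutes around the prediction, since the thesis's exact window definition is not stated here. The pinball (quantile) loss is the standard objective for fitting median and 90th-percentile models, and CatBoost exposes it as its `Quantile` loss. This is an illustrative sketch, not the thesis's evaluation code.

```python
import statistics
from typing import Sequence


def median_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MdAE: median of |actual - predicted|, in minutes."""
    return statistics.median([abs(a - p) for a, p in zip(actual, predicted)])


def window_hit_rate(actual: Sequence[float], predicted: Sequence[float],
                    half_width_min: float) -> float:
    """Share of shipments whose actual time lands within +/- half_width_min of the ETA.

    ASSUMPTION: a symmetric window; the thesis's window definition may differ.
    """
    hits = sum(abs(a - p) <= half_width_min for a, p in zip(actual, predicted))
    return hits / len(actual)


def relative_improvement(baseline_error: float, model_error: float) -> float:
    """Percent error reduction versus a baseline (positive means the model is better)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


def pinball_loss(actual: Sequence[float], predicted_q: Sequence[float], q: float) -> float:
    """Mean quantile (pinball) loss at level q, e.g. 0.5 for the median or 0.9 for P90."""
    total = 0.0
    for a, p in zip(actual, predicted_q):
        diff = a - p
        # Under-prediction costs q per minute; over-prediction costs (1 - q) per minute.
        total += max(q * diff, (q - 1) * diff)
    return total / len(actual)


# Figures quoted in the abstract.
print(round(relative_improvement(223.5, 98.0)))     # 56   (Stage 1 -> Stage 2 MdAE)
print(round(relative_improvement(101.0, 98.8), 1))  # 2.2  (isolated deramp MdAE vs. JBH)

# Toy data (illustrative only).
print(median_absolute_error([100, 200, 300], [110, 180, 330]))  # 20
```

At q = 0.9 the loss penalizes late arrivals nine times more heavily than early ones, which pushes the fitted value toward the 90th percentile of the conditional distribution rather than its center.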

Keywords

Intermodal; rail; transit; deramp; GBDT; gradient boosted decision tree

Available for download on Saturday, May 05, 2029

Included in

Data Science Commons
