Date of Graduation


Document Type


Degree Name

Doctor of Philosophy in Engineering (PhD)

Degree Level



Industrial Engineering


Shengfan Zhang

Committee Member

W. Art Chaovalitwongse

Second Committee Member

Edward Pohl

Third Committee Member

Mahboubeh Madadi


Decision modeling, Machine learning, Multivariate time-series data, Optimization, Patient health data


Complex healthcare systems require efficient and effective data-driven decision making in many areas. As patient data become more available, advanced statistical learning and machine learning techniques are being applied to improve data-driven decision making. However, patient health data, including clinical trial data, medical records, and electronic health records, pose several challenges. Patient health data comprise a patient's medical information, which may include demographics, information about their health or illness, medications, and treatments. These data combine static and time-series variables, contain substantial censoring and missingness, and are irregularly sampled in most cases. In addition, the amount of available data is limited relative to the variability in patients’ conditions and procedures. Therefore, applying traditional machine learning and statistical methods is inefficient and, in some cases, impossible. In this dissertation, several statistical learning, machine learning, and decision modeling approaches are developed to address these challenges in treatment decision making for respiratory diseases. Treatments for respiratory diseases depend on the severity of the disease and the patient’s condition. Some of the most important interventions are medical treatment, surgery, and mechanical ventilation during hospitalization. In the first chapter, a Markov decision process is built from limited clinical trial data to assess the timing of surgery for patients with severe emphysema. In the next chapter, a statistical learning approach is proposed to prepare irregularly sampled, heterogeneous electronic health records as input to machine learning models. The method is applied to predict the outcome of mechanical ventilation in the ICU. A lab test importance score is also proposed to quantify the effect of each lab test on the prediction model and to identify unnecessary lab tests.
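The abstract does not give the surgery-timing model's states, rewards, or transition probabilities. As a rough illustration of how a Markov decision process can encode a "when to operate" decision, the sketch below solves a generic finite-horizon wait-or-operate (optimal stopping) problem by backward induction. The function `surgery_timing_policy`, the three-state space, the rewards, and the transition matrix are all hypothetical and are not the dissertation's model.

```python
from typing import List, Tuple

def surgery_timing_policy(
    P: List[List[float]],
    wait_reward: List[float],
    surgery_value: List[float],
    horizon: int,
    discount: float,
) -> Tuple[List[List[float]], List[List[str]]]:
    """Backward induction for a hypothetical wait-or-operate MDP.

    P[s][s2]         : one-period transition probability from s to s2 if surgery is deferred
    wait_reward[s]   : reward (e.g., quality-adjusted time) for one period of waiting in s
    surgery_value[s] : total expected reward if surgery is performed now in s (stops the process)
    """
    n = len(wait_reward)
    # V[t][s] = optimal expected reward from period t onward; no reward after the horizon.
    V = [[0.0] * n for _ in range(horizon + 1)]
    policy = [[""] * n for _ in range(horizon)]
    for t in reversed(range(horizon)):
        for s in range(n):
            wait = wait_reward[s] + discount * sum(P[s][s2] * V[t + 1][s2] for s2 in range(n))
            operate = surgery_value[s]
            # Ties (e.g., in the absorbing death state) default to waiting.
            if operate > wait:
                V[t][s], policy[t][s] = operate, "surgery"
            else:
                V[t][s], policy[t][s] = wait, "wait"
    return V, policy

# Hypothetical states: 0 = moderate, 1 = severe, 2 = death (absorbing).
P = [[0.8, 0.2, 0.0],
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]
V, policy = surgery_timing_policy(P, wait_reward=[1.0, 0.5, 0.0],
                                  surgery_value=[2.0, 2.0, 0.0], horizon=3, discount=0.9)
print(policy[0])  # recommended action in each state at the first decision period
```

With these made-up numbers, waiting is preferred in the moderate state while surgery is preferred once the disease becomes severe; a real model would estimate the transitions and rewards from clinical trial data.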
In the last chapter, an optimization approach is introduced to determine the optimal time-window size for regularizing each patient's time-series data and preparing the data as input to any sequential machine learning model. The optimization results are applied to two sequential learning models. First, the results are applied to a long short-term memory (LSTM) model to predict the time of discontinuation of mechanical ventilation. Second, a reinforcement learning (RL) model is developed to find the optimal timing of lab tests and reduce the number of unnecessary lab tests while the patient is on mechanical ventilation.
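The abstract does not state the optimization model used to choose the window size. The sketch below, assuming mean aggregation within each window and last-value carry-forward for empty windows, shows what regularizing one patient's irregular lab series into fixed-width windows involves and the trade-off a window-size choice has to balance: narrow windows leave many empty (imputed) windows, while wide windows average more measurements together and lose temporal resolution. The functions `regularize` and `window_stats` and the example values are hypothetical.

```python
import math
from typing import List, Optional, Tuple

def regularize(obs: List[Tuple[float, float]], window: float,
               t_end: float) -> List[Optional[float]]:
    """Map irregular (time, value) observations on [0, t_end) onto fixed windows of width `window`.

    Each window holds the mean of its observations; an empty window carries the previous
    window's value forward (None if nothing has been observed yet).
    """
    n_windows = math.ceil(t_end / window)
    sums = [0.0] * n_windows
    counts = [0] * n_windows
    for t, v in obs:
        if 0 <= t < t_end:
            k = int(t // window)
            sums[k] += v
            counts[k] += 1
    series: List[Optional[float]] = []
    last: Optional[float] = None
    for s, c in zip(sums, counts):
        if c > 0:
            last = s / c
        series.append(last)
    return series

def window_stats(obs: List[Tuple[float, float]], window: float,
                 t_end: float) -> Tuple[float, float]:
    """Return (fraction of empty windows, mean number of observations per non-empty window)."""
    n_windows = math.ceil(t_end / window)
    counts = [0] * n_windows
    for t, _ in obs:
        if 0 <= t < t_end:
            counts[int(t // window)] += 1
    non_empty = [c for c in counts if c > 0]
    empty_frac = 1 - len(non_empty) / n_windows
    per_window = sum(non_empty) / len(non_empty) if non_empty else 0.0
    return empty_frac, per_window

# Hypothetical lab values for one patient: (hours since ventilation start, value).
obs = [(0.5, 2.0), (1.0, 4.0), (5.2, 2.5), (11.9, 1.0)]
for w in (2.0, 4.0, 12.0):
    print(w, regularize(obs, w, t_end=12.0), window_stats(obs, w, t_end=12.0))
```

The resulting equally spaced series can be stacked across lab tests into the fixed-step input a sequential model such as an LSTM expects; how the two statistics are weighted when choosing the window size is left open here.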

Available for download on Monday, February 17, 2025