Data analytics and optimization methods in biomedical systems: from microbes to humans
Data analytics and optimization theory are well-developed techniques to describe, predict and optimize real-world systems, and they have been widely used in engineering and science. This dissertation focuses on applications in biomedical systems, ranging from the scale of microbial communities to problems relating to human disease and health care.
Starting from the microbial level, the first problem considered is to design metabolic division of labor in microbial communities. Given a number of microbial species living in a community, the starting point of the analysis is a list of all metabolic reactions present in the community, expressed in terms of the metabolite proportions involved in each reaction. Leveraging tools from Flux Balance Analysis (FBA), the problem is formulated as a Mixed Integer Program (MIP) and new methods are developed to solve large-scale instances. The strategies found reveal a large space of nuanced and non-intuitive metabolic division of labor opportunities, including, for example, splitting the tricarboxylic acid (TCA) cycle into two separate halves. More broadly, the landscape of possible 1-, 2-, and 3-strain solutions is systematically mapped at increasingly tight constraints on the number of allowed reactions.
The second problem addressed involves the prediction and prevention of short-term (30-day) hospital re-admissions. To develop predictive models, a variety of classification algorithms are adapted and coupled with robust (regularized) learning and heuristic feature selection approaches. Using real, large datasets, these methods are shown to reliably predict re-admissions of patients undergoing general surgery within 30 days of discharge. Beyond predictions, a novel prescriptive method is developed that computes specific control actions designed to alter the outcome. This method, termed Prescriptive Support Vector Machines (PSVM), is based on an underlying SVM classifier. Applied to the hospital re-admission data, it is shown to reduce 30-day re-admissions after surgery through better control of the patient’s pre-operative condition. Specifically, the method regulates the patient’s pre-operative hematocrit through limited blood transfusion.
The last problem in this dissertation develops a framework for parameter estimation in Regularized Mixed Linear Regression (MLR) problems. In the specific MLR setting considered, training data are generated from a mixture of distinct linear models (or clusters) and the task is to identify the corresponding coefficient vectors. The problem is formulated as a MIP subject to regularization constraints on the coefficient vectors. A number of results on the convergence of parameter estimates for MLR are established. In addition, experimental prediction results are presented comparing the resulting prediction algorithm with mean absolute error regression and random forest regression, in terms of both accuracy and interpretability.
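To make the MLR setting concrete, the sketch below fits a mixture of linear models with a simple alternating-minimization heuristic. It is not the MIP formulation developed in the dissertation. It alternates between assigning each sample to its best-fitting model and refitting each model by ridge regression. The function name `fit_mixed_linear_regression`, the squared-error loss, the ridge penalty `lam`, and the random initialization are all illustrative assumptions.

```python
import numpy as np


def fit_mixed_linear_regression(X, y, n_clusters=2, lam=1e-2, n_iter=50, seed=0):
    """Heuristic fit of a mixture of linear models (illustrative, not the MIP approach).

    Returns the coefficient vectors, the final sample-to-cluster labels, and the
    regularized objective value recorded after every iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: random initial coefficient vectors, one row per cluster.
    betas = rng.normal(size=(n_clusters, d))
    # Initial assignment: each sample goes to the model with the smallest squared residual.
    labels = ((y[:, None] - X @ betas.T) ** 2).argmin(axis=1)
    history = []
    for _ in range(n_iter):
        # Fit step: ridge regression on the samples currently assigned to each cluster.
        # The ridge term keeps the linear system solvable even for an empty cluster.
        for k in range(n_clusters):
            Xk, yk = X[labels == k], y[labels == k]
            betas[k] = np.linalg.solve(Xk.T @ Xk + lam * np.eye(d), Xk.T @ yk)
        # Assignment step: reassign every sample to its best-fitting model.
        residuals = (y[:, None] - X @ betas.T) ** 2  # shape (n, n_clusters)
        labels = residuals.argmin(axis=1)
        # Regularized objective; it cannot increase from one iteration to the next.
        history.append(residuals.min(axis=1).sum() + lam * (betas ** 2).sum())
    return betas, labels, history
```

Because each step only lowers (or keeps) the regularized objective, the procedure converges, but possibly to a local optimum that depends on the initialization. An exact approach such as a MIP avoids that dependence at a higher computational cost.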