Centralized and distributed learning methods for predictive health analytics
MetadataShow full item record
The U.S. health care system is considered costly and highly inefficient, devoting substantial resources to the treatment of acute conditions in a hospital setting rather than focusing on prevention and keeping patients out of the hospital. The potential for cost savings is large; in the U.S. more than $30 billion are spent each year on hospitalizations deemed preventable, 31% of which is attributed to heart diseases and 20% to diabetes. Motivated by this, our work focuses on developing centralized and distributed learning methods to predict future heart- or diabetes- related hospitalizations based on patient Electronic Health Records (EHRs). We explore a variety of supervised classification methods and we present a novel likelihood ratio based method (K-LRT) that predicts hospitalizations and offers interpretability by identifying the K most significant features that lead to a positive prediction for each patient. Next, assuming that the positive class consists of multiple clusters (hospitalized patients due to different reasons), while the negative class is drawn from a single cluster (non-hospitalized patients healthy in every aspect), we present an alternating optimization approach, which jointly discovers the clusters in the positive class and optimizes the classifiers that separate each positive cluster from the negative samples. We establish the convergence of the method and characterize its VC dimension. Last, we develop a decentralized cluster Primal-Dual Splitting (cPDS) method for large-scale problems, that is computationally efficient and privacy-aware. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the agents to collaborate, while keeping every participant's data private. cPDS is proved to have an improved convergence rate compared to existing centralized and decentralized methods. We test all methods on real EHR data from the Boston Medical Center and compare results in terms of prediction accuracy and interpretability.