Novel statistical methods to improve precision medicine
Embargo Date
2027-02-12
Abstract
Precision medicine, also known as personalized medicine, refers to the tailoring of therapeutic or preventive interventions to specific subpopulations of patients based on the patients’ characteristics. Because precision medicine aims to provide individualized treatments, accurate disease subtyping could be essential to it, and its development relies on a sufficient understanding of the underlying mechanisms of diseases. Recent technological advances, especially in genomics and molecular biology, have provided unprecedented opportunities to gain greater insight into disease subtypes and underlying mechanisms. However, translating this wealth of knowledge into clinical practice remains a challenging task.
This dissertation aims to improve precision medicine from two statistical perspectives. The first is to identify disease subtypes and related biomarkers through clustering of multi-omics data, which could be the first step toward precision medicine. The second is to accurately stratify patients into subtypes through companion diagnostic devices (CDx) in clinical trials, which is directly related to developing targeted precision medicine therapies.
We propose two novel convex clustering methods that allow the incorporation of prior information or knowledge and generate stable clustering results. The first is information-incorporated Sparse Convex Clustering (iSCC), which uses a text mining approach to retrieve existing information from previously published studies in available sources, such as PubMed, to identify disease-related biomarkers and improve disease subtyping. The second is Prior Knowledge-assisted Integrative Convex Clustering (PK-ICC), which incorporates prior biological knowledge about how features are grouped, such as biological pathways and gene regulatory mechanisms, through a group lasso penalty to improve disease subtyping and select relevant groups of features simultaneously.
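For orientation, a convex clustering objective with a group lasso penalty can be sketched as follows. This is a generic textbook-style form, not necessarily the exact iSCC or PK-ICC formulation; the symbols $A$, $w_{il}$, $u_g$, $\mathcal{G}_g$, $\gamma_1$, and $\gamma_2$ are assumptions introduced here for illustration.

```latex
% x_i: observed feature vector of sample i (row i of the n x p data matrix)
% a_i: cluster centroid assigned to sample i (row i of A)
% G_g: index set of the features in prior-knowledge group g (e.g. a pathway)
\min_{A \in \mathbb{R}^{n \times p}}
  \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - a_i \rVert_2^2
  + \gamma_1 \sum_{i < l} w_{il} \, \lVert a_i - a_l \rVert_2
  + \gamma_2 \sum_{g=1}^{G} u_g \, \lVert A_{\cdot \mathcal{G}_g} \rVert_F
```

The second term fuses the centroids of similar samples, so samples whose centroids coincide form a cluster. The third term shrinks whole groups of feature columns toward zero, which, assuming centered features, removes uninformative groups from the clustering. When every group contains a single feature, the third term reduces to the column-wise penalty used in sparse convex clustering. Prior information, such as text-mined evidence for a biomarker, could plausibly enter through the weights $u_g$, although that is an assumption here rather than a description of the proposed methods.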
Both simulations and real data analyses demonstrate that our proposed methods can identify more accurate disease subtypes and biologically meaningful biomarkers. We also propose a finite mixture model framework to quantify the impact of CDx measurement performance on clinical trials with binary or time-to-event outcomes, which can inform the design of future trials that use a CDx. Overall, this dissertation proposes statistical methods that may improve the identification of disease subtypes and the design of CDx-incorporated trials, which may lead to better clinical outcomes through precision medicine.
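The effect of CDx measurement error on a trial with a binary outcome can be illustrated with a minimal two-component mixture sketch. This is not the dissertation's framework. It assumes an enrichment design that enrolls only CDx-positive patients, and every function name and number below is hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of CDx-positive patients who are truly biomarker-positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def observed_risk_difference(sensitivity, specificity, prevalence,
                             effect_in_true_pos, effect_in_true_neg):
    """Expected treatment effect (risk difference) among CDx-positive patients.

    The enrolled population is a mixture of truly positive patients
    (weight PPV) and misclassified truly negative patients (weight 1 - PPV),
    so the observed effect is the weighted average of the two true effects.
    """
    ppv = positive_predictive_value(sensitivity, specificity, prevalence)
    return ppv * effect_in_true_pos + (1.0 - ppv) * effect_in_true_neg


if __name__ == "__main__":
    # Hypothetical inputs: the treatment raises the response rate by 0.30 in
    # truly positive patients and has no effect in truly negative patients.
    for sens, spec in [(1.0, 1.0), (0.9, 0.8), (0.8, 0.6)]:
        diluted = observed_risk_difference(sens, spec, prevalence=0.3,
                                           effect_in_true_pos=0.30,
                                           effect_in_true_neg=0.0)
        print(f"sensitivity={sens}, specificity={spec}: "
              f"observed risk difference = {diluted:.3f}")
```

Under these assumptions, lower specificity admits more truly negative patients and pulls the observed effect toward the effect in that group, which is one reason CDx performance matters when sizing a trial.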
Description
2024
License
Attribution-NonCommercial-NoDerivatives 4.0 International