Integrative analysis of complex genomic and epigenomic maps
Modern healthcare research demands collaboration across disciplines to build preventive measures and innovate predictive capabilities for curing diseases. Along with the emergence of cutting-edge computational and statistical methodologies, data generation and analysis have become cheaper in the last ten years. However, the complexity of big data, due to its variety, volume, and velocity, creates new challenges for biologists, physicians, bioinformaticians, statisticians, and computer scientists. Combining data from multiple complex profiles helps to better understand cellular functions and the pathways that regulate them, providing insights that could not have been obtained using the individual profiles alone. However, current normalization and artifact correction methods are platform and data type specific, and may require both the training and test sets for any application (e.g., biomarker development). This often leads to over-fitting and reduces the reproducibility of genomic findings across studies. In addition, many bias correction and integration approaches require renormalization or reanalysis if additional samples are later introduced. The motivation behind this research was to develop and evaluate strategies for addressing data integration issues across data types and profiling platforms, which should improve healthcare informatics research and its application in personalized medicine. We have demonstrated a comprehensive and coordinated framework for data standardization across tissue types and profiling platforms. This allows easy integration of data from multiple data-generating consortia. The main goal of this research was to identify regions of genetic-epigenetic coordination that are independent of tissue type and consistent across epigenomic profiling platforms.
We developed multi-‘omic’ therapeutic biomarkers for epigenetic drug efficacy by combining our biomarker regions with drug perturbation data generated in our previous studies. We used an adaptive Bayesian factor analysis approach to develop biomarkers for multiple HDACs simultaneously, allowing for predictions of comparative efficacy between the drugs. We showed that this approach leads to different predictions across breast cancer subtypes compared to profiling the drugs separately. We extended this approach to patient samples from multiple public data resources containing epigenetic profiling data from cancer and normal tissues (The Cancer Genome Atlas, TCGA; NIH Roadmap Epigenomics data).