Data mining of host transcriptome and microbiome in pulmonary disease
Pulmonary disease is among the most common and serious medical conditions worldwide, and accurate diagnosis and prediction of incipient pulmonary diseases such as tuberculosis (TB) and lung cancer can greatly reduce the number of pulmonary disease-related deaths. In this thesis, I studied the transcriptome and microbiome differences between pulmonary disease patients and healthy controls, and I developed and applied several pipelines that combine bioinformatics methods, statistics, and machine learning models to identify patterns in human transcriptome and microbiome data for pulmonary disease prediction. On the host transcriptome side, I first evaluated the performance of existing TB disease and TB progression biomarkers, created a biomarker selection pipeline based on bulk RNA-seq gene expression, and then identified a 29-gene signature that can correctly predict TB progression up to 6 years before TB diagnosis. On the microbiome side, I developed Animalcules, an R package for microbiome data analysis, including diversity comparison and differential abundance analysis, which supports both a graphical user interface and command-line functions. I then applied Animalcules in two microbiome case studies: identifying TB-related and asthma-related microbes. Having worked on the host transcriptome and the microbiome separately, I then discussed a computational framework for identifying host-microbe interactions and its significant potential for studying pulmonary disease pathogenesis, diagnosis, and treatment.