Modeling gene regulatory networks through data integration
Modeling gene regulatory networks has become a problem of great interest in biology and medical research. Most common methods for learning regulatory dependencies rely on observations in the form of gene expression data. In this dissertation, computational models for gene regulation have been developed based on constrained regression by integrating comprehensive gene expression data for M. tuberculosis with genome-scale ChIP-Seq interaction data. The resulting models confirmed predictive power for expression in independent stress conditions and identified mechanisms driving hypoxic adaptation and lipid metabolism in M. tuberculosis. I then used the regulatory network model for M. tuberculosis to identify factors responding to stress conditions and drug treatments, revealing drug synergies and conditions that potentiate drug treatments. These results can guide and optimize design of drug treatments for this pathogen. I took the next step in this direction, by proposing a new probabilistic framework for learning modular structures in gene regulatory networks from gene expression and protein-DNA interaction data, combining the ideas of module networks and stochastic blockmodels. These models also capture combinatorial interactions between regulators. Comparisons with other network modeling methods that rely solely on expression data, showed the essentiality of integrating ChIP-Seq data in identifying direct regulatory links in M. tuberculosis. Moreover, this work demonstrates the theoretical advantages of integrating ChIP-Seq data for the class of widely-used module network models. The systems approach and statistical modeling presented in this dissertation can also be applied to problems in other organisms. A similar approach was taken to model the regulatory network controlling genes with circadian gene expression in Neurospora crassa, through integrating time-course expression data with ChIP-Seq data. The models explained combinatorial regulations leading to different phase differences in circadian rhythms. The Neurospora crassa network model also works as a tool to manipulate the phases of target genes.