Transcription factor binding distribution and properties in prokaryotes
The canonical model of transcriptional regulation in prokaryotes restricts binding sites to promoter regions and treats the binding sequence as the main determinant of binding. In this dissertation, I challenge these assumptions. As a member of the TB Systems Biology Consortium, I analyzed and validated ChIP-Seq and microarray experiments for over 100 transcription factors (TFs). To study the transcriptional functions of predicted binding sites, I integrated binding and expression data and assigned potential regulatory roles to 20% of the binding sites. Stronger binding sites were more often associated with regulation than weaker sites, suggesting a correlation between binding strength and regulatory impact. Seventy-six percent of the sites fell within annotated coding regions, and a significant proportion of these were assigned regulatory functions. To study the importance of binding sequences, I compared experimentally observed sites with computational motif predictions. Although a conserved binding motif was found for most TFs, only a fraction of the motif occurrences in the genome appeared bound in the experiments. Some low-affinity binding sites were occupied by the corresponding TF, while many high-affinity binding sites were not. Interestingly, I found identical nucleotide sequences (up to 15 residues long) bound in one area of the genome but unbound in another, pointing to DNA accessibility as an important factor for in vivo binding. To investigate the evolutionary conservation of binding-site occupancy, sequence, and transcriptional impact, I analyzed ChIP-Seq and expression experiments for five conserved TFs in two to four mycobacterial relatives. Regulon composition showed significantly less conservation than expected from the overall level of gene conservation across Mycobacteria.
Contrary to expectations, sequence conservation was not a good indicator of whether a computationally predicted motif was bound experimentally; in some cases, a fully conserved motif was bound in one relative but not in another. Conservation of genic binding sites was higher than expected under a random model, adding to the evidence that at least some genic sites are functional. Understanding the evolutionary history of binding sites allowed me to explain unusual site configurations, some of which indicated a role for DNA looping.