Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites

OpenBU


dc.contributor.author Reddy, Timothy E en_US
dc.contributor.author DeLisi, Charles en_US
dc.contributor.author Shakhnovich, Boris E en_US
dc.date.accessioned 2012-01-11T00:40:41Z
dc.date.available 2012-01-11T00:40:41Z
dc.date.issued 2007-5-11 en_US
dc.identifier.citation Reddy, Timothy E, Charles DeLisi, Boris E Shakhnovich. "Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites" PLoS Computational Biology 3(5): e90. (2007) en_US
dc.identifier.issn 1553-7358 en_US
dc.identifier.uri http://hdl.handle.net/2144/3040
dc.description.abstract Computational prediction of nucleotide binding specificity for transcription factors remains a fundamental and largely unsolved problem. Determination of binding positions is a prerequisite for research in gene regulation, a major mechanism controlling phenotypic diversity. Furthermore, accurate determination of binding specificities from high-throughput data sources is necessary to realize the full potential of systems biology. Unfortunately, a recent independent evaluation showed that more than half of the predictions from most widely used algorithms are false. We introduce a graph-theoretical framework that describes local sequence similarity as pairwise distances between nucleotides in promoter sequences, and we hypothesize that densely connected subgraphs are indicative of transcription factor (TF) binding sites. Using a well-established sampling algorithm coupled with simple clustering and scoring schemes, we identify sets of closely related nucleotides and test them for known TF binding activity. Using an independent benchmark, we find that our algorithm predicts yeast binding motifs considerably better than currently available techniques, without manual curation. Importantly, we reduce the proportion of false positive predictions in yeast to less than 30%. We also develop a framework to evaluate the statistical significance of our motif predictions. We show that our approach is robust to the choice of input promoters, and thus can be used to predict binding positions from noisy experimental data. We apply our method to identify binding sites using data from genome-scale ChIP–chip experiments. Results from these experiments are publicly available at http://cagt10.bu.edu/BSG. The graphical framework developed here may be useful when combining predictions from numerous computational and experimental measures. Finally, we discuss how our algorithm can be used to improve the sensitivity of computational predictions of transcription factor binding specificities.
Author Summary. A historically difficult problem in computational biology is the identification of transcription factor binding sites (TFBS) in the promoters of co-regulated genes. With increasing emphasis on research in transcriptional regulation, this problem is also particularly relevant to emerging results from recent high-throughput and systems biology experiments. Despite extensive research in the area, recent evaluations of previously published techniques show much room for improvement. In this paper, we introduce a fundamentally new approach to the identification of TFBS. First, we represent nucleotides in promoters as an undirected, weighted graph, which we call a binding site graph (BSG). Given this representation, we employ relatively simple graph clustering techniques to identify functional TFBS. We show that BSG predictions significantly outperform all previously evaluated methods in nearly every performance measure of a standardized assessment benchmark. We also find that this approach is more robust than traditional Gibbs sampling to the selection of input promoters, and thus more likely to perform well under noisy experimental conditions. Finally, BSGs are very good at predicting specificity-determining nucleotides. Using BSG predictions, we were able to confirm recent experimental results on the binding specificity of the E-box TFs CBF1 and PHO4 and to predict novel specificity-determining nucleotides for TYE7. en_US
dc.description.sponsorship US National Institutes of Health (A08 POGM66401A, J50 01–130021) en_US
dc.language.iso en en_US
dc.publisher Public Library of Science en_US
dc.title Binding Site Graphs: A New Graph Theoretical Framework for Prediction of Transcription Factor Binding Sites en_US
dc.type article en_US
dc.identifier.doi 10.1371/journal.pcbi.0030090 en_US
dc.identifier.pubmedid 17500587 en_US
dc.identifier.pmcid 1866359 en_US
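The abstract describes representing promoter nucleotides as a weighted graph and looking for densely connected subgraphs as candidate binding sites. The Python sketch below illustrates that idea under loudly stated assumptions; it is not the authors' BSG implementation. Each node is a position in a promoter, labelled by the w-mer starting there, and an edge joins two positions in different promoters when their windows share at least `min_matches` identical nucleotides, weighted by that count. Where the paper uses a sampling algorithm with clustering and scoring schemes, this sketch swaps in a much simpler greedy step: take the node with the largest weighted degree plus its neighbors as the candidate site set, then report a column-wise majority consensus. The names `build_graph`, `densest_neighborhood`, `consensus`, `w` and `min_matches` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch of a binding-site-graph-style analysis.
# NOT the authors' BSG implementation: the edge weights, threshold and the
# greedy clustering step below are illustrative assumptions.


def windows(seqs, w):
    """Map each node (promoter index, start position) to the w-mer starting there."""
    return {
        (i, p): s[p:p + w]
        for i, s in enumerate(seqs)
        for p in range(len(s) - w + 1)
    }


def matches(a, b):
    """Count positions at which two equal-length strings carry the same nucleotide."""
    return sum(x == y for x, y in zip(a, b))


def build_graph(seqs, w, min_matches):
    """Build an undirected weighted graph as a dict of dicts.

    Two nodes from *different* promoters are joined when their windows share
    at least `min_matches` identical nucleotides; the edge weight is that count.
    """
    nodes = windows(seqs, w)
    adj = {n: {} for n in nodes}
    for u, v in combinations(nodes, 2):
        if u[0] == v[0]:
            continue  # skip pairs from the same promoter
        m = matches(nodes[u], nodes[v])
        if m >= min_matches:
            adj[u][v] = adj[v][u] = m
    return nodes, adj


def densest_neighborhood(adj):
    """Greedy stand-in for dense-subgraph detection: the node with the largest
    weighted degree plus all of its neighbors (ties go to the first node seen)."""
    seed = max(adj, key=lambda n: sum(adj[n].values()))
    return [seed] + sorted(adj[seed])


def consensus(words):
    """Column-wise majority nucleotide across equal-length words."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*words))


if __name__ == "__main__":
    # Toy promoters with a planted CACGTG E-box; the third copy has one mismatch.
    promoters = ["TCACGTGA", "ACACGTGT", "GCATGTGC"]
    nodes, adj = build_graph(promoters, w=6, min_matches=5)
    cluster = densest_neighborhood(adj)
    print(cluster, consensus([nodes[n] for n in cluster]))
    # prints: [(0, 1), (1, 1), (2, 1)] CACGTG
```

Restricting edges to pairs from different promoters means a candidate site must be supported across promoters rather than by repeats within one. The greedy neighborhood step is deliberately crude: in longer sequences, windows shifted by one position relative to a motif also connect strongly to each other and can compete with the true sites, which is one reason a practical method needs more careful clustering and scoring.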
