Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate


dc.contributor.author Xia, Yu en_US
dc.contributor.author Franzosa, Eric A. en_US
dc.contributor.author Gerstein, Mark B. en_US
dc.date.accessioned 2012-01-11T21:11:32Z
dc.date.available 2012-01-11T21:11:32Z
dc.date.issued 2009-6-12 en_US
dc.identifier.citation Xia, Yu, Eric A. Franzosa, Mark B. Gerstein. "Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate" PLoS Computational Biology 5(6): e1000413. (2009) en_US
dc.identifier.issn 1553-7358 en_US
dc.description.abstract Rates of evolution differ widely among proteins, but the causes and consequences of such differences remain under debate. With the advent of high-throughput functional genomics, it is now possible to rigorously assess the genomic correlates of protein evolutionary rate. However, dissecting the correlations among evolutionary rate and these genomic features remains a major challenge. Here, we use an integrated probabilistic modeling approach to study genomic correlates of protein evolutionary rate in Saccharomyces cerevisiae. We measure and rank degrees of association between (i) an approximate measure of protein evolutionary rate with high genome coverage, and (ii) a diverse list of protein properties (sequence, structural, functional, network, and phenotypic). We observe, among many statistically significant correlations, that slowly evolving proteins tend to be regulated by more transcription factors, deficient in predicted structural disorder, involved in characteristic biological functions (such as translation), biased in amino acid composition, and are generally more abundant, more essential, and enriched for interaction partners. Many of these results are in agreement with recent studies. In addition, we assess information contribution of different subsets of these protein properties in the task of predicting slowly evolving proteins. We employ a logistic regression model on binned data that is able to account for intercorrelation, non-linearity, and heterogeneity within features. Our model considers features both individually and in natural ensembles ("meta-features") in order to assess joint information contribution and degree of contribution independence. Meta-features based on protein abundance and amino acid composition make strong, partially independent contributions to the task of predicting slowly evolving proteins; other meta-features make additional minor contributions. The combination of all meta-features yields predictions comparable to those based on paired species comparisons, and approaching the predictive limit of optimal lineage-insensitive features. Our integrated assessment framework can be readily extended to other correlational analyses at the genome scale.
Author Summary
Proteins encoded within a given genome are known to evolve at drastically different rates. Through recent large-scale studies, researchers have measured a wide variety of properties for all proteins in yeast. We are interested to know how these properties relate to one another and to what extent they explain evolutionary rate variation. Protein properties are a heterogeneous mix, a factor which complicates research in this area. For example, some properties (e.g., protein abundance) are numerical, while others (e.g., protein function) are descriptive; protein properties may also suffer from noise and hidden redundancies. We have addressed these issues within a flexible and robust statistical framework. We first ranked a large list of protein properties by the strength of their relationships with evolutionary rate; this confirms many known evolutionary relationships and also highlights several new ones. Similar protein properties were then grouped and applied to predict slowly evolving proteins. Some of these groups were as effective as paired species comparison in making correct predictions, although in both cases a great deal of evolutionary rate variation remained to be explained. Our work has helped to refine the set of protein properties that researchers should consider as they investigate the mechanisms underlying protein evolution. en_US
dc.description.sponsorship PhRMA Foundation; National Science Foundation (DGE-0654108); National Institutes of Health; Williams professorship fund en_US
dc.language.iso en en_US
dc.publisher Public Library of Science en_US
dc.rights Xia et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. en_US
dc.title Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate en_US
dc.type article en_US
dc.identifier.doi 10.1371/journal.pcbi.1000413 en_US
dc.identifier.pubmedid 19521505 en_US
dc.identifier.pmcid 2688033 en_US
