A graph-theoretical treatment of protein domain evolution
Shakhnovich, Boris E.
Understanding the mechanisms and driving forces behind molecular evolution is the defining challenge of computational biology. However, a comprehensive, quantitative theory of molecular evolution remains elusive. We evaluate a new graph-theoretic treatment of this problem. We start by defining a multi-dimensional protein domain universe graph (PDUG). The nodes in this graph are the atomic units of evolution: structures of recurring domains and sequences that fold into those structures. Each of the three dimensions in PDUG (structure, function, and phylogeny) represents a potential constraint from evolutionary pressure. We go on to characterize graph-theoretic properties such as phase transitions, power-law degree distributions, and correlations between the three dimensions. We compare the observed properties with those expected from random graphs. The comparison enables us to identify the likely contours of sets of co-evolved proteins. We further our understanding by assessing several computationally tractable models of evolution that recapitulate some fundamental characteristics of PDUG. We go on to define fitness characteristics derived from simple physical properties of structure and function that serve to clarify the uneven relationship between fold and sequence space topology. However, we also find that evolutionary history plays a crucial role, since structural fitness is only the potential for sequence entropy, while variable time of evolutionary search determines the fulfillment of that potential. Armed with our new understanding of protein fitness, we describe its progression over time. We establish that eukaryotic domains enjoy a faster exploration of sequence and function space than prokaryotic ones. We further note that biological phenomena such as thermophilic adaptation and duplication success may be explained in light of our newly found understanding of protein fitness.
Finally, we employ the newly developed PDUG paradigm to quantify the structure-function relationship. We show through modeling of divergent evolution that functions coalesce non-randomly as structural clusters grow. We find that the widely held hierarchical description of structure space has theoretical underpinnings in the natural clustering of the PDUG. We finish by calculating the theoretical lower limit of uncertainty inherent in the structure-function correlation of protein domains.
Thesis (Ph.D.)--Boston University. PLEASE NOTE: Boston University Libraries did not receive an Authorization To Manage form for this thesis or dissertation. It is therefore not openly accessible, though it may be available by request. If you are the author or principal advisor of this work and would like to request open access for it, please contact us at firstname.lastname@example.org. Thank you.