TY - THES
T1 - On the Evolution of Proteins from Peptides
A1 - Alva Kullanja, Vikram
Y1 - 2012/11/05
N2 - Though seemingly endless, the diversity of proteins in nature is in fact narrowly confined. Many proteins share recognizable similarity in sequence and structure, since they arose by amplification, recombination, and divergence from a basic complement of autonomously folding modules, referred to as domains, many of which date back to the time of the Last Universal Common Ancestor. Indeed, sequence comparison of modern proteins shows that they fall into only about 10,000 domain families, which can be further grouped into just about 3000 broader evolutionary superfamilies. Beyond this, superfamilies are assigned to one of about 1000 folds based on the topological arrangement of their secondary structural elements. The prevailing view holds that folds are analogous in character, the similarity between different superfamilies of one fold being the result of convergent evolution. However, the recent growth of molecular databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies, showing that not all of them arose independently. The first aim of this thesis was to determine how widespread such distant relationships are. To this end, I clustered domains representative of known fold types by their sequence similarity, a property that reflects common descent. The obtained cluster map shows that while some highly populated folds indeed appear to have evolved convergently, most domains of the same fold arose from an ancestral prototype, revealing that proteins are much less polyphyletic than previously assumed. Whereas it is widely accepted that modern proteins arose by combinatorial shuffling of a limited set of domains, the origin of this set itself is poorly understood. Even the simplest domains are too complex to have arisen de novo. 
If so, how did the first domains emerge? This question formed the second aim of this thesis. One theory for the origin of domains, the antecedent domain segment theory, proposes that they themselves arose from an even smaller pool of peptides with secondary structure propensity, which emerged as cofactors in the RNA world. Progressively more stable domains evolved from this set by amplification and by accretion, that is, by additive assemblage of simple structural elements. If this is true, many modern domains might still contain vestiges of the ancient peptides they arose from. To investigate this, I systematically compared domains of known structure using the state-of-the-art remote homology detection method HHsearch and identified 50 fragments that co-occur in domains with different folds, yet show significant similarities in sequence and structure. The occurrence of these homologous fragments in otherwise analogous structures provides compelling evidence for the antecedent domain segment theory. As an example, one of these 50 fragments, corresponding to a helix-strand-helix motif that gave rise divergently to three different folds, including the histone fold, is presented. In addition to showing that most domains of one fold arose from an ancestral form by divergence, this thesis reveals many instances of homology between superfamilies of different folds through the discovery of shared ancestral peptides. However, current protein classifications consider folds to be analogous and do not contain a hierarchical level to capture such inter-fold relationships. To solve this problem, this work proposes a classification level above the fold level, the metafold, which unites groups of folds for which a homologous relationship has been corroborated. The metafold level is an important step on the way to a classification of proteins by natural descent, which is the most informative basis for structural and functional inference. 
KW - Bioinformatics
KW - Molecular evolution
KW - Proteins
KW - Homology
KW - Classification
CY - Tübingen
PB - Universitätsbibliothek Tübingen
AD - Wilhelmstr. 32, 72074 Tübingen
UR - http://tobias-lib.uni-tuebingen.de/volltexte/2012/6499
ER -