EVOLUTIONARY RELATIONSHIPS AMONG THE SERPINS

The serpins are a widely distributed group of serine proteinase inhibitors found in plants, birds, mammals and viruses. Despite the great evolutionary divergence of these organisms, their serpins are highly conserved in both sequence and structure. Amino acid sequences were aligned by a combination of automatic algorithms and consideration of conserved structural elements in those serpins for which crystal structures exist. The program HOMED was used, which allowed the amino acid alignment to be converted simultaneously into the equivalently aligned nucleotide sequences.
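The conversion of an amino acid alignment into the equivalent nucleotide alignment can be illustrated with a minimal Python sketch. It is not the HOMED program, whose internals are not described here; the function name `thread_codons`, the gap character, and the toy sequences are all assumptions made for illustration. Each aligned residue is replaced by its codon, and each gap by three gap characters.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an ungapped coding sequence onto one gapped protein alignment row.

    Hypothetical helper: assumes the coding sequence has exactly one codon per
    residue, with no stop codon, introns or frameshifts.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * residues:
        raise ValueError("coding sequence length must be 3x the residue count")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)          # one protein gap becomes three nucleotide gaps
        else:
            out.append(cds[pos:pos + 3])  # copy the codon that encodes this residue
            pos += 3
    return "".join(out)


# Invented toy data: two short aligned peptides and their coding sequences.
protein_rows = {"seqA": "MK-LV", "seqB": "MKALV"}
cds_rows = {"seqA": "ATGAAACTGGTT", "seqB": "ATGAAAGCTCTGGTG"}

nucleotide_alignment = {
    name: thread_codons(protein_rows[name], cds_rows[name])
    for name in protein_rows
}
```

Because gaps are only ever inserted in multiples of three, the reading frame of every sequence is preserved in the resulting nucleotide alignment.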
The aligned amino acids were used as the basis for superposition of the four three-dimensional structures for which coordinates are available, and this superposition was compared with an optimal three-dimensional superposition to estimate the reliability of the sequence alignment. Phylogenetic relationships implied by the nucleotide sequence alignments were determined by the method of maximum parsimony. The proposed gene tree suggested that as much diversity exists between the plant serpin and the mammalian serpins as among the mammalian serpins themselves, and provided further evidence that the architecture of serpin molecules is highly constrained.
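To make the parsimony criterion concrete, the following Python sketch scores a fixed tree with Fitch's algorithm, a standard way of counting the minimum number of changes a rooted binary tree requires. It is offered only as an illustration under stated assumptions: the nested-tuple tree encoding, the function names and the four-taxon alignment are invented, gaps would simply be treated as another character state, and the search over candidate trees that a full maximum parsimony analysis performs is omitted.

```python
def fitch_site(tree, states):
    """Return (candidate ancestral states, minimum changes) for one column.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    `states` maps each leaf name to its character at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state: no extra change is needed here.
        return shared, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    names = list(alignment)
    length = len(alignment[names[0]])
    total = 0
    for i in range(length):
        column = {name: alignment[name][i] for name in names}
        total += fitch_site(tree, column)[1]
    return total


# Invented toy alignment and two candidate topologies.
toy_alignment = {"a": "AAGT", "b": "AAGC", "c": "ACTC", "d": "ACTT"}
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))

score_1 = parsimony_score(tree_1, toy_alignment)
score_2 = parsimony_score(tree_2, toy_alignment)
```

On this toy data `tree_1` requires 4 changes and `tree_2` requires 6, so the parsimony criterion prefers `tree_1`.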
The Phylogeny of the Serpin Superfamily:
In approaching a phylogenetic reconstruction of a protein family, a choice must be made between a number of approaches that:
use amino acid or nucleotide sequence;
treat each sequence as a whole (sequence-based) or each amino acid/nucleotide individually (character-based);
make different assumptions about how readily one amino acid type is substituted with another, and whether this differs depending on the position in the sequence;
measure the difference between sequences or amino acids/nucleotides differently;
produce trees according to different criteria, e.g. deriving a tree that achieves a minimum overall evolutionary score; and
involve very different degrees of computation time [40].

Figure 1
Figure 2
Figure 3
Figure 4

Diverse approaches have been applied to the serpin superfamily. A character-based method that attempts to derive the tree with the fewest “evolutionary steps” between branches and nodes, the “maximum parsimony” approach [41], has been applied to nucleotide data of the whole family [7] and to protein data within subfamilies [8]; the character-based “maximum likelihood” approach [42], which attempts to find the tree under which the observed sequences are most likely, has also been used [37]. The application of the “neighbor-joining” method, based on computed evolutionary differences between sequences, has been assessed as being in best agreement with other data such as serpin gene structure [37] and co-localization at chromosomal loci [8]. Nevertheless, the statistically significant phylogenetic groups determined by the various methods are compatible with one another, and differ primarily in the proportion of sequences that can be assigned to those groups. There are 35 known functional human serpin genes.
A summary evolutionary tree depicting the established phylogenetic relationships between subfamilies of serpin genes containing at least one human member is shown for the human sequences. The tree is based on a “neighbor-joining” approach [43], with the non-parametric bootstrap technique [44] used to eliminate relationships with poor support; consequently, several branches radiate from the base of the tree because their relationships are uncertain [8]. Where evidence of a common gene structure [36] or chromosomal position [13,14] suggests the order of sequence divergence, this is also shown. As the base of the tree indicates the lowest point at which groupings can be reliably determined, it does not reflect a fixed point in time. However, clades containing human serpin genes are notably restricted to sequences from vertebrate organisms; furthermore, non-vertebrate clades (which are not shown in this tree) lack vertebrate sequences. Hence, the deepest point depicted by the base of the tree post-dates the appearance of the vertebrate lineage, estimated to be approximately 800 MYA [45,46].
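The combination of neighbor-joining with non-parametric bootstrapping described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used for the published tree: it assumes uncorrected p-distances as the measure of difference, uses invented sequences and hypothetical function names, and reports a support fraction for each grouping rather than collapsing the poorly supported ones.

```python
import random
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def neighbor_joining_splits(alignment):
    """Build a neighbor-joining topology and return its internal splits.

    Each split is reported as the leaf set on the side that does not contain
    the alphabetically first taxon, so identical splits compare equal.
    """
    names = sorted(alignment)
    everyone = frozenset(names)
    nodes = {name: frozenset([name]) for name in names}  # node id -> leaves
    dist = {frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
            for pair in combinations(names, 2)}
    splits = set()
    next_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        total = {i: sum(dist[frozenset((i, k))] for k in nodes if k != i)
                 for i in nodes}
        # Join the pair that minimises the neighbor-joining Q criterion.
        i, j = min(combinations(sorted(nodes), 2),
                   key=lambda p: (n - 2) * dist[frozenset(p)]
                   - total[p[0]] - total[p[1]])
        new = f"_node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        for k in nodes:
            if k not in (i, j):
                dist[frozenset((new, k))] = 0.5 * (
                    dist[frozenset((i, k))] + dist[frozenset((j, k))]
                    - dist[frozenset((i, j))])
        joined = nodes.pop(i) | nodes.pop(j)
        nodes[new] = joined
        splits.add(everyone - joined if names[0] in joined else joined)
    return splits


def bootstrap_support(alignment, replicates=200, seed=0):
    """Fraction of column-resampled replicates that recover each split."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    reference = neighbor_joining_splits(alignment)
    counts = {split: 0 for split in reference}
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {name: "".join(seq[c] for c in cols)
                     for name, seq in alignment.items()}
        found = neighbor_joining_splits(resampled)
        for split in reference:
            counts[split] += split in found
    return {split: counts[split] / replicates for split in reference}


# Invented toy alignment; the taxon labels are placeholders only.
toy_alignment = {
    "human": "ACGTACGTAC",
    "mouse": "ACGTACGTTC",
    "chick": "ACGAACCTTC",
    "frog": "TCGAACCTTG",
    "plant": "TGGAAGCTTG",
}
support = bootstrap_support(toy_alignment, replicates=100, seed=1)
```

Groupings whose support falls below a chosen threshold would then be collapsed, which is why poorly resolved lineages appear to radiate directly from the base of a summary tree.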
Figure 5
Figure 6
Figure 7

Clades C (antithrombin), D (heparin cofactor II), G (C1-inhibitor), H (heat shock protein 47), and I (neuroserpin) comprise few sequences, which are mostly orthologues; hence, the evolutionary relationships between their members are consistent with common functions. The other clades mostly have members with markedly different properties. A dichotomy is apparent between clade B, which is composed predominantly of intracellular serpins, and the other vertebrate clades, which are predominantly populated by serpins with extracellular roles, although there are exceptions to this observation. Non-inhibitory serpins do not cluster together, and several clades contain serpins that are known to be activated by ligands.