Computational protocol: Evolution of the Vertebrate Paralemmin Gene Family: Ancient Origin of Gene Duplicates Suggests Distinct Functions

Protocol publication

[…] The human paralemmin family sequences were used as queries in tblastn searches of the Ensembl database (www.ensembl.org) in the following vertebrate genomes: human (Homo sapiens), mouse (Mus musculus), chicken (Gallus gallus), Western clawed frog (Silurana (Xenopus) tropicalis), zebrafish (Danio rerio), medaka (Oryzias latipes), fugu (Takifugu rubripes), three-spined stickleback (Gasterosteus aculeatus), green spotted pufferfish (Tetraodon nigroviridis) and sea lamprey (Petromyzon marinus). To root the phylogenetic trees appropriately, corresponding searches were made in the Ensembl genome databases for a tunicate (Ciona intestinalis) and a nematode (Caenorhabditis elegans) or fruitfly (Drosophila melanogaster), as well as for the lancelet (Branchiostoma floridae) in the National Center for Biotechnology Information (NCBI) databases at http://www.ncbi.nlm.nih.gov. In Ensembl, the protein predictions representing the best BLAST hits were collected and their chromosome locations were noted. For short, incomplete or divergent protein predictions, better predictions were manually curated from the corresponding genomic sequence with regard to consensus start and stop codons, splice donor and acceptor sites, and sequence similarity to other identified family members. Expressed sequence tags (ESTs) curated and aligned by the Ensembl database were also considered. The InterPro database of protein domain predictions (www.ebi.ac.uk/interpro) was used to identify conserved protein domains. Ensembl searches were initiated in database versions 55 (July 2009) and 56 (September 2009), and simultaneously in the pre.ensembl.org database for the sea lamprey genome. All sequences and database identifiers were verified against the most recent genome assembly versions as shown in Ensembl database version 66 (February 2012).
This information can be found in .
To identify putative paralemmin family members in the lancelet genome with greater certainty, a protein BLAST search was performed using the pattern-hit initiated BLAST algorithm (PHI-BLAST) in the NCBI non-redundant protein sequence database. The identified zebrafish palmdelphin-B (see ) was used as the query, and the conserved amino acid motif KX[KR]XXR[ED]XWL[ML], identified from a preliminary alignment of the identified vertebrate paralemmin homologs, was entered as the PHI pattern.
[...] The identified protein predictions from the database searches were used to produce amino acid alignments using the ClustalW tool with standard settings in Jalview 2.4. Green spotted pufferfish sequences were included in the phylogenetic analyses of the paralemmin gene family, but not of neighboring gene families, because this species is closely related to fugu. The final alignments were manually inspected in Jalview with regard to incomplete protein sequence predictions and poorly aligned sequence stretches. Details on the sequence curation and alignment editing process are available upon request.
Two bootstrapped phylogenetic methods were applied to the alignments: a neighbor joining (NJ) analysis and a phylogenetic maximum likelihood (PhyML) analysis. The NJ tree construction method (with 1000 bootstrap replicates) was applied with standard settings (Gonnet weight matrix, gap opening penalty 10.0 and gap extension penalty 0.20) in ClustalX 2.0.
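The conserved motif used as the PHI pattern above can also be checked locally against candidate sequences. The following is a minimal sketch, assuming that X in the motif as printed stands for any residue and that square brackets list the residues allowed at a position. The function names and the toy sequence are hypothetical and are not part of the published protocol; PHI-BLAST itself expects a PROSITE-style pattern, so this only illustrates what the motif matches.

```python
import re

# Motif exactly as written in the text. Assumption: "X" means any residue and
# "[..]" lists the alternatives allowed at one position.
MOTIF = "KX[KR]XXR[ED]XWL[ML]"


def motif_to_regex(motif):
    """Translate the compact motif notation into a compiled regular expression."""
    parts = []
    in_class = False
    for ch in motif:
        if ch == "[":
            in_class = True
            parts.append(ch)
        elif ch == "]":
            in_class = False
            parts.append(ch)
        elif ch == "X" and not in_class:
            # Wildcard position: accept any uppercase one-letter residue code.
            parts.append("[A-Z]")
        else:
            parts.append(ch)
    return re.compile("".join(parts))


def find_motif(sequence, motif=MOTIF):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group(0)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence made up for illustration; it is not a real paralemmin sequence.
    toy = "MAAKAKEERDLWLMGG"
    print(motif_to_regex(MOTIF).pattern)
    print(find_motif(toy))
```

A sequence without a hit returns an empty list, which makes the function convenient for filtering a set of protein predictions before alignment.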
The PhyML method was applied using the web application of the PhyML 3.0 algorithm available at http://www.atgc-montpellier.fr/phyml/ with the following settings: amino acid frequencies (equilibrium frequencies), proportion of invariable sites and gamma-shape parameters were estimated from the datasets; the number of substitution rate categories was set to 8; BIONJ was chosen to create the starting tree, and both the NNI and SPR tree improvement methods were used to estimate the best topology; both tree topology and branch length optimization were chosen. For branch support, a bootstrap analysis with 100 replicates was chosen. The best amino acid substitution models for the PhyML analyses were estimated from the alignments using ProtTest 1.4. Models were tested with no add-ons and assuming eight gamma rate categories; the optimization strategy was set to slow, and the BIONJ strategy was selected for the random input tree. The JTT model was assumed for all PhyML analyses based on the ProtTest results.
The paralemmin gene family tree was not rooted, since no complete invertebrate paralemmin sequence could be identified. For the neighboring gene families, identified nematode sequences were used as outgroups where available; otherwise, identified lancelet sequences were used. If no invertebrate protein prediction could be found, the trees were left unrooted. […]
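The authors ran PhyML through the web application, so no command line is given in the text. For readers who prefer a local run, the sketch below shows one way the listed settings might map onto the PhyML 3.0 command-line options. The executable name `phyml`, the input file name and the helper function are assumptions for illustration, and the mapping should be checked against the documentation of the installed PhyML version before use.

```python
import shlex


def phyml_command(alignment_path, model="JTT", rate_categories=8, bootstraps=100):
    """Build an argument list approximating the settings described in the text.

    Assumption: a local PhyML 3.0 binary named "phyml" and a PHYLIP-format
    amino acid alignment; the published analyses used the web application.
    """
    return [
        "phyml",
        "-i", alignment_path,        # input alignment (PHYLIP format)
        "-d", "aa",                  # amino acid data
        "-m", model,                 # substitution model (JTT per ProtTest)
        "-f", "e",                   # equilibrium frequencies estimated from the data
        "-v", "e",                   # proportion of invariable sites estimated
        "-a", "e",                   # gamma shape parameter estimated
        "-c", str(rate_categories),  # number of substitution rate categories
        "-s", "BEST",                # best of NNI and SPR topology searches
        "-o", "tlr",                 # optimize topology, branch lengths, rate parameters
        "-b", str(bootstraps),       # number of bootstrap replicates
    ]


if __name__ == "__main__":
    # "paralemmin.phy" is a hypothetical file name.
    cmd = phyml_command("paralemmin.phy")
    print(shlex.join(cmd))
    # To execute it: import subprocess; subprocess.run(cmd, check=True)
```

No starting tree is passed, because PhyML builds a BIONJ starting tree by default when no user tree is supplied.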

Pipeline specifications

Software tools: TBLASTN, InterPro, DELTA-BLAST, BLASTP, Clustal W, Jalview, PhyML, ProtTest
Applications: Phylogenetics, Protein sequence analysis, Amino acid sequence alignment
Organisms: Petromyzon marinus, Branchiostoma floridae, Homo sapiens
Diseases: Neoplasms