Computational protocol: Molecular, phylogenetic and developmental analyses of Sall proteins in bilaterians

Protocol publication

[…] To identify candidate Sall proteins, we searched several databases (Additional file : Table S1). Potential snail Sall sequences from Biomphalaria glabrata and Crepidula fornicata were derived from RNA-seq datasets generated in our lab [] and uploaded to Geneious version 6.1.2 []. Potential sequences from the snail Lottia gigantea were retrieved from the JGI genome portal [] using the tblastn and blastp alignment algorithms []. Potential sall orthologs for the xenoturbellid Xenoturbella, the acoels Convolutriloba and Isodiametra, the annelid Dinophilus, the brachiopods Terebratalia and Novocrania, the nemertean Lineus, the priapulid Priapulus, the platyhelminth Prostheceraeus, the nemertodermatid Meara, and the bryozoan Membranipora were searched for in RNA-seq datasets (Additional file : Table S1). To include more representatives of other bilaterian clades (arthropods, nematodes, mollusks, tunicates, echinoderms, hemichordates, vertebrates) and of non-bilaterian clades (cnidarians, ctenophores, placozoans, poriferans) for a wider analysis of the phylogeny and structure of Sall proteins, additional searches were performed in the NCBI databases [] using keyword searches (Spalt, Spalt-like, sall, sal-like), tblastn and blastp. In addition, the Sal-box-containing zinc-finger proteins Schnurri, PRDII-BF1 and HIVEP1 were retrieved from the NCBI databases using a keyword search (Schnurri) and tblastn and blastp searches with the Sal-box motif as the query. Nucleotide sequences were translated into protein sequences in MacVector version 12.7 [], assuming standard codon usage. [...] Full-length sequences of the available Sal-box-containing proteins (Schnurri, PRDII-BF1, HIVEP1) (Additional file : Table S1) were aligned with the potential Sall orthologs (Additional file : Fig. S1) using ClustalX version 2.1 []. The alignment was then refined by eye and trimmed in MacVector, retaining the homologous regions and excluding ambiguously aligned sites and gaps.
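As a rough illustration of two steps described above (translation under the standard genetic code, and removal of gapped alignment columns), the following minimal Python sketch may help. It is not the MacVector procedure used in the study: the function names `translate` and `drop_gapped_columns` are hypothetical, the alignment is assumed to be a dictionary of equal-length gapped strings, and the manual exclusion of ambiguously aligned sites is not reproduced.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nucleotides, to_stop=True):
    """Translate a nucleotide string in frame 1 using the standard code.

    Codons containing ambiguous bases become 'X'; a trailing partial
    codon is ignored. With to_stop=True, translation ends at the first
    stop codon, which is not included in the result.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def drop_gapped_columns(alignment, gap="-"):
    """Return a copy of the alignment keeping only gap-free columns.

    `alignment` maps sequence names to aligned strings of equal length
    (an assumed in-memory representation, not a file format).
    """
    names = list(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must have equal length")
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if gap not in col]
    return {name: "".join(col[i] for col in kept)
            for i, name in enumerate(names)}
```

For example, `translate("ATGGCCTAA")` yields `"MA"`, and an alignment in which any sequence has a gap at a given position loses that position in every sequence.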
To determine whether the newly identified candidate Sall proteins were indeed Sall proteins rather than other Sal-box-containing proteins, we performed a phylogenetic analysis of zinc-finger domains three and four (ZF3 and ZF4), the only two zinc-finger domains present in all Sal-box-containing proteins (Fig. ; Additional file : Fig. S1), using all the sequences retrieved in this study. Once orthology was established, a second phylogenetic analysis was performed on zinc-finger domains two, three and five (ZF2, ZF3 and ZF5) (Additional file : Fig. S2), using all the Sall sequences retrieved in this study (Additional file : Table S1). Both datasets were subjected to coalescent-based Bayesian inference (BI) phylogenetic analyses in BEAST 1.8.3 []. The JTT + G model [] was selected as the best-fit model of protein evolution using ProtTest []. We assumed a strict molecular clock and used the Yule speciation model as the tree prior. Analyses were run for 3,000,000 generations, sampling trees and model parameters every 300 generations. Convergence was assessed by visual inspection of the log file in Tracer [], and a burn-in of 300,000 generations (10%) was set accordingly. We used TreeAnnotator (distributed as part of the BEAST package) to recover the maximum clade credibility (MCC) tree from the post-burn-in sample of trees. The robustness of the inferred clades was evaluated using Bayesian posterior probabilities (BPPs). Candidate sequences were identified as orthologs when they grouped with sequences of known identity in a clade with high statistical support (BPP > 0.95). […]
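The sampling arithmetic and the support criterion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BEAST or TreeAnnotator implementation: each sampled tree is represented as a set of clades (frozensets of taxon names), the initial state of the chain is ignored when counting samples, and all function names are hypothetical. The sketch computes clade frequencies in the post-burn-in sample, which is what a BPP summarizes; it does not select an MCC tree.

```python
from collections import Counter


def n_burn_in_samples(burn_in_generations, sample_every):
    """Number of logged samples that fall inside the burn-in period."""
    return burn_in_generations // sample_every


def clade_posteriors(sampled_trees, n_discard):
    """Fraction of post-burn-in trees that contain each clade.

    `sampled_trees` is a list in sampling order; each tree is a set of
    frozensets, so a clade is counted at most once per tree.
    """
    kept = sampled_trees[n_discard:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: count / len(kept) for clade, count in counts.items()}


def supported_clades(posteriors, threshold=0.95):
    """Clades whose posterior probability exceeds the threshold (BPP > 0.95)."""
    return {clade for clade, p in posteriors.items() if p > threshold}
```

With the settings reported above, 3,000,000 generations sampled every 300 generations give 10,000 samples, of which `n_burn_in_samples(300_000, 300)` = 1,000 are discarded. Because the comparison is strict, a clade found in exactly 95% of the retained trees would not pass the threshold.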

Pipeline specifications

Software tools: Geneious, TBLASTN, MacVector, Clustal W, BEAST, ProtTest
Databases: JGI Genome Portal
Applications: Phylogenetics, RNA-seq analysis, Amino acid sequence alignment
Organisms: Drosophila melanogaster, Caenorhabditis elegans, Schmidtea mediterranea, Lottia gigantea
Chemicals: Zinc