Computational protocol: A Broad Genomic Survey Reveals Multiple Origins and Frequent Losses in the Evolution of Respiratory Hemerythrins and Hemocyanins

Protocol publication

[…] All potential hemerythrin, hemocyanin, and tyrosinase sequences were identified by HMMER searches against the Protein, Genome, and EST databases at the NCBI (National Center for Biotechnology Information) and against databases of completed genome/transcriptome projects that are publicly available or being conducted in our laboratories (sequences available in supplementary file S1, Supplementary Material online), with the default parameters and an inclusive E-value of 0.05. The retrieved sequences were aligned using the MAFFT L-INS-i algorithm and then manually inspected to remove hits fulfilling any of the following conditions: 1) incomplete sequences with >99% sequence identity to a complete sequence from the same taxon; 2) sequences that showed extremely long branches in the preliminary maximum likelihood trees; and 3) incorrect gene model predictions. The final alignment was carried out using the MAFFT G-INS-i algorithm (for global homology). Maximum likelihood (ML) phylogenetic trees were estimated with RAxML, and the best tree from 100 replicates was selected. Bootstrap support was calculated from 1,000 replicates. Bayesian inference analyses were performed with PhyloBayes, using two parallel runs for 500,000 generations and sampling every 100 generations. Bayesian posterior probabilities (BPP) were used to assess the statistical support of each bipartition. The domain architecture of all retrieved sequences was inferred by performing a Pfam scan with the gathering threshold as the cut-off value. The domain information was used to assess the reliability of each sequence in the initial dataset, to help define protein families according to their architectural coherence, and to assess the level of functional and structural diversification of hemerythrins, hemocyanins, and tyrosinases across the eukaryote lineages. […]
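The first filtering condition (incomplete sequences with >99% identity to a complete sequence from the same taxon) and the two MAFFT strategies can be illustrated with a minimal Python sketch. This is not the authors' procedure: the protocol describes the inspection as manual and does not say how identity was computed. The sketch assumes identity is measured over alignment columns where neither sequence has a gap. The `AlignedSeq` record, its `taxon` and `complete` fields, and all function names are hypothetical. The MAFFT flags shown are the long-form equivalents of L-INS-i and G-INS-i given in the MAFFT documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AlignedSeq:
    # All fields are hypothetical names chosen for this illustration.
    name: str       # sequence identifier
    taxon: str      # taxon label used to group sequences
    aligned: str    # one row of the alignment, gaps written as '-'
    complete: bool  # True if the sequence is considered full-length


def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither row has a gap.

    Assumption: the source does not define how identity was measured.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def drop_redundant_fragments(
    seqs: List[AlignedSeq], threshold: float = 99.0
) -> List[AlignedSeq]:
    """Drop incomplete sequences that are more than `threshold` percent
    identical to a complete sequence from the same taxon."""
    complete = [s for s in seqs if s.complete]
    kept = []
    for s in seqs:
        redundant = (not s.complete) and any(
            c.taxon == s.taxon
            and percent_identity(s.aligned, c.aligned) > threshold
            for c in complete
        )
        if not redundant:
            kept.append(s)
    return kept


def mafft_command(strategy: str, fasta_path: str) -> List[str]:
    """Build an argument list for the named MAFFT strategy.

    L-INS-i corresponds to --localpair --maxiterate 1000 and
    G-INS-i to --globalpair --maxiterate 1000.
    """
    flags = {"L-INS-i": "--localpair", "G-INS-i": "--globalpair"}
    return ["mafft", flags[strategy], "--maxiterate", "1000", fasta_path]
```

In this sketch, `mafft_command("L-INS-i", path)` would produce the preliminary alignment command and `mafft_command("G-INS-i", path)` the final one. MAFFT writes the alignment to standard output, so the caller would need to redirect it to a file. The other two filtering conditions (long branches and incorrect gene models) rely on tree inspection and curation and are not covered here.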

Pipeline specifications

Software tools: HMMER, MAFFT, RAxML, PhyloBayes
Application: Phylogenetics