Computational protocol: Back to Water: Signature of Adaptive Evolution in Cetacean Mitochondrial tRNAs


Protocol publication

[…] For the present study, we sequenced the complete mitochondrial genome of a specimen of Z. cavirostris. Total DNA was extracted from approximately 0.5 g of striated muscle tissue taken from a female specimen of Z. cavirostris. The tissue had been stored at -80°C since 2007 at the Mediterranean marine mammal tissue bank (MMMTB, www.marinemammals.eu) of the University of Padova (specimen ID 135). MMMTB is a non-profit public organisation that preserves, for scientific purposes, the tissues of cetacean specimens that stranded and died naturally along the Italian coasts. MMMTB promotes the study and conservation of Cetacea, is officially supported by the Italian Ministry of Environment, and is CITES accredited. The scientific study of tissues obtained from MMMTB does not require the approval of an ethics committee. The extraction was performed using a salting-out protocol []. The quality of the DNA was assessed by electrophoresis on a 1% agarose gel. The mitochondrial DNA was amplified and sequenced using a mixture of mammalian universal primers [] and primers specifically designed on the available sequences from the family Ziphiidae. The PCR products were sequenced directly with the same primers used for amplification. Sequencing was performed by BMR Genomics (http://www.bmr-genomics.it/; Padua, Italy). Both strands of each PCR product were sequenced to achieve the standard accuracy required for this type of sequencing. The mtDNA consensus sequence was assembled using the SeqMan II program from the Lasergene software package (DNAStar, Madison, WI). Coverage of the whole consensus sequence was at least 2X and in most cases 3X to 4X. The genome was annotated following the strategy briefly described below [,]. Initially, the mtDNA sequence was translated into putative proteins using the Transeq program available on the EBI website.
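The translation step can be illustrated with a short Python sketch of a six-frame translation, the kind of output Transeq produces. This is not the EMBOSS tool itself: it assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), which the text does not name explicitly, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Assumption: NCBI translation table 2 (vertebrate mitochondrial code).
# Amino acids are listed for codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
VERT_MITO_AA = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
                "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped, unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Translate the three forward (+1..+3) and three reverse (-1..-3) reading frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `translate("ATGTGAAGA")` gives `"MW*"`: under table 2, TGA encodes Trp and AGA is a stop codon, which is why a mitochondrial code must be chosen instead of the standard one.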
The identity of these polypeptides was established using the BLAST program [,]. Gene boundaries were determined as follows. The 5' end of each protein-coding gene (PCG) was defined as the first legitimate in-frame start codon (ATN, GTG, TTG, GTT) in the open reading frame (ORF) that was not located within an upstream gene encoded on the same strand. The only exceptions were atp6 and nad4, which have previously been shown to overlap with their upstream genes (atp8 and nad4L, respectively) in many mtDNAs []. The 3' end of each PCG was defined by the first in-frame stop codon encountered. When this stop codon was located within the sequence of a downstream gene encoded on the same strand, a truncated stop codon (T or TA) adjacent to the beginning of the downstream gene was designated as the termination codon. Such a truncated codon is thought to be completed by polyadenylation, producing a complete TAA stop codon after transcript processing. Finally, pairwise comparisons with orthologous proteins were performed using the ClustalW program [] to refine the limits of the PCGs. Regardless of the actual initiation codon, formyl-Met was assumed to be the first amino acid of all proteins, as previously demonstrated in other mitochondrial genomes [,]. Transfer RNA genes were identified using the tRNAscan-SE program [] or recognised manually as sequences having the appropriate anticodon and capable of folding into the typical cloverleaf secondary structure of tRNAs []. These predictions were further validated by comparison with published orthologous counterparts, based on multiple alignments and structural information []. The boundaries of the ribosomal rrnS and rrnL genes were defined by the adjacent upstream and downstream tRNA genes (i.e., trnF and trnV for rrnS; trnV and trnL2 for rrnL). [...]
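The PCG boundary rules above can be sketched as two small Python functions: one finds the first in-frame start codon downstream of the upstream gene, the other finds the first in-frame stop codon and falls back to a truncated T or TA stop when the stop would run into the next gene. This is a simplified illustration under stated assumptions: sequences are uppercase and single-stranded, the stop-codon set is that of the vertebrate mitochondrial code (an assumption), the atp6/nad4 overlap exceptions are left to the caller (by passing an earlier `search_from`), and all names are hypothetical.

```python
from typing import Optional, Tuple

# Start codons listed in the text: ATN (ATA, ATC, ATG, ATT), GTG, TTG, GTT.
START_CODONS = {"ATA", "ATC", "ATG", "ATT", "GTG", "TTG", "GTT"}
# Assumption: stop codons of the vertebrate mitochondrial code (NCBI table 2).
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def find_start(seq: str, frame: int, search_from: int) -> Optional[int]:
    """Return the first in-frame start codon at or after search_from.

    frame is the ORF's position modulo 3; search_from is taken to be the end of
    the upstream gene on the same strand, so starts inside that gene are skipped.
    """
    first = search_from + (frame - search_from) % 3
    for i in range(first, len(seq) - 2, 3):
        if seq[i:i + 3] in START_CODONS:
            return i
    return None


def find_end(seq: str, start: int,
             downstream_start: Optional[int] = None) -> Optional[Tuple[int, str]]:
    """Return (exclusive end, stop codon) for the ORF beginning at start.

    If the first stop codon would overlap the downstream gene, accept a truncated
    stop (T or TA) ending exactly where that gene begins; otherwise return None
    so the boundary can be checked manually.
    """
    limit = len(seq) if downstream_start is None else downstream_start
    for i in range(start, len(seq) - 2, 3):
        if i + 3 > limit:
            # The next codon runs into the downstream gene.
            partial = seq[i:limit]
            return (limit, partial) if partial in ("T", "TA") else None
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return i + 3, codon
    return None
```

For instance, `find_end("ATGAAACCCTAAAGG", 0, downstream_start=11)` returns `(11, "TA")`. The in-frame TAA would overlap the downstream gene, so the gene is closed by the truncated TA, which polyadenylation would complete to TAA.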
Initially, each of the 13 sets of orthologous protein-coding genes derived from the 94T-set was aligned using the pipeline implemented in the TranslatorX server []. This web tool aligns the DNA sequences using the multiple alignment of the corresponding amino acid sequences as a backbone. The MAFFT program was used to produce the alignments [,]. Subsequently, the Gblocks program (with its most stringent parameters) was used to select the most conserved positions of the alignments []. Finally, the 13 Gblocks-processed nucleotide alignments were concatenated into a single multiple alignment (94T.13PCG.set). The sequences of the orthologous tRNAs obtained from the 94T-set were manually aligned, taking into account the secondary structures predicted with tRNAscan-SE or available in the literature (see ) []. The same strategy was applied to produce the multiple alignments needed to investigate the intraspecific variation of every tRNA in the cetacean species for which multiple mtDNA sequences are available. For the 94T-trnXs alignment, it was not possible to model the substitution process in the most variable portions of the DHU and TΨC loops of some tRNAs, whereas it was always possible to do so within the Cetacea clade. Irrespective of the strategy used to obtain them, all multiple alignments were subsequently imported into MEGA 5.2.2 [] for further bioinformatic analyses. [...] Maximum likelihood phylogenetic analyses [] were performed using RAxML 7.4.2 [] through the graphical user interface raxmlGUI 1.3.1 []. A nonparametric bootstrap test [] with 1,000 replicates was performed to assess the robustness of the topologies. Phylogenetic analyses were performed on the nucleotide and amino acid datasets exhibiting the highest phylogenetic signal.
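The core idea behind a TranslatorX-style alignment, threading each unaligned coding sequence onto its aligned protein so that every residue becomes a codon and every gap becomes three gaps, can be sketched in Python, together with the final concatenation of per-gene alignments. This is a minimal illustration, not the TranslatorX, MAFFT or Gblocks code; it assumes each CDS has had its stop codon removed and that every taxon is present in every gene alignment, and the function names are hypothetical.

```python
from typing import Dict, List


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an unaligned CDS onto its aligned protein sequence.

    Each residue is replaced by its codon and each gap '-' by '---'.
    Assumes len(cds) == 3 * (number of residues), i.e. the stop codon is removed.
    """
    residues = aligned_protein.replace("-", "")
    if len(cds) != 3 * len(residues):
        raise ValueError("CDS length does not match the protein length")
    out: List[str] = []
    pos = 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Concatenate per-gene alignments (taxon -> aligned sequence) in the given order.

    Assumes every taxon appears in every per-gene alignment.
    """
    taxa = list(alignments[0].keys())
    return {taxon: "".join(aln[taxon] for aln in alignments) for taxon in taxa}
```

For example, `thread_codons("M-K", "ATGAAA")` yields `"ATG---AAA"`. Because gaps are inserted only in multiples of three, the nucleotide alignment stays in frame, which is what makes the later codon-position subsets meaningful.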
For the DNA datasets, the GTR evolutionary model [] was applied, and rate heterogeneity across sites was modelled with the CAT approximation []. For the amino acid datasets, the MTMAM substitution matrix [] was used in combination with the CAT approximation. Partitioning schemes were applied to test their effect on the tree topologies. Phylogenetic analyses were performed, with and without partitions, on the second-codon-position, first-plus-second-codon-position, and amino acid subsets of 94T.13PCG.set, which exhibited the highest signal. All of the resulting trees were identical to the topology depicted in . In this topology, most of the nodes received bootstrap support. The tree in was generated from the amino acid dataset. The topology revealed that many amino acids changed along the branch leading to the root of Cetacea. The mysticete Caperea marginata and, more markedly, the odontocetes Kogia breviceps, P. macrocephalus, Platanista minor, Lipotes vexillifer, Pontoporia blainvillei, Inia geoffrensis, and Monodon monoceros showed branches that were decidedly longer than those of other cetacean species. No further details on the phylogeny of Cetacea are presented here, for the following reason. The phylogeny of Cetacea is a very active field of study, and several papers have been published on this topic [,,,–]. The overall phylogenetic relationships among the major lineages were consistently recovered in these studies and are depicted in . However, the vast majority of the published trees disagree at one or more points. In the present paper, the topology of was used as a reference tree to map the evolution of CSBPSs. Alternative phylogenetic relationships were considered to test whether they would substantially change our results (data not shown). These topologies produced, at most, marginal variations restricted to single nodes and did not alter the overall evolutionary pathway of the CSBPSs.
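Extracting the codon-position subsets analysed here (position 2 alone, or positions 1+2) from an in-frame nucleotide alignment can be sketched as follows. This is an illustrative Python helper, not part of RAxML or raxmlGUI; it assumes the alignment is stored as a taxon-to-sequence dictionary whose length is a multiple of three, and the function name is hypothetical.

```python
from typing import Dict, Iterable


def codon_position_subset(alignment: Dict[str, str],
                          positions: Iterable[int]) -> Dict[str, str]:
    """Keep only the requested codon positions (any of 1, 2, 3) of an in-frame alignment."""
    keep = sorted(set(positions))
    if not keep or any(p not in (1, 2, 3) for p in keep):
        raise ValueError("positions must be drawn from 1, 2, 3")
    subset = {}
    for taxon, seq in alignment.items():
        if len(seq) % 3:
            raise ValueError(f"{taxon}: alignment length is not a multiple of 3")
        # Walk codon by codon and keep the selected positions in their original order.
        subset[taxon] = "".join(seq[i + p - 1]
                                for i in range(0, len(seq), 3)
                                for p in keep)
    return subset
```

With `{"x": "ATGAAA"}`, requesting `[2]` returns `{"x": "TA"}` and requesting `[1, 2]` returns `{"x": "ATAA"}`. Third positions, which saturate fastest, are simply dropped.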
Thus, they are not described in detail in the present paper. [...] The CSBPSs occurring in the multiple alignments of orthologous tRNAs were tracked along the reference tree using the maximum likelihood method available in MEGA 5.2.2 [] and the maximum parsimony approach implemented in the Mesquite program []. In the parsimony analysis, nucleotide changes were treated as unordered characters. Mismatches occurring at the boundaries between the arms and loops of the DHU and TΨC domains were not considered, because in some cases the arm length varied without disrupting the secondary structure (). […]
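Maximum parsimony mapping of an unordered character can be illustrated with Fitch's algorithm, the classic parsimony method for unordered characters on binary trees. This Python sketch is not Mesquite's implementation. It assumes a rooted, fully bifurcating tree with one nucleotide state per tip, and the `Node` class and function names are hypothetical. Ties are broken alphabetically, so only one of possibly several most-parsimonious reconstructions is reported.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Node:
    name: Optional[str] = None                      # taxon name for tips
    children: List["Node"] = field(default_factory=list)
    fitch_set: Set[str] = field(default_factory=set)
    state: Optional[str] = None


def downpass(node: Node, states: Dict[str, str]) -> int:
    """Fitch post-order pass for one unordered character; returns the number of changes."""
    if not node.children:
        node.fitch_set = {states[node.name]}
        return 0
    changes = sum(downpass(child, states) for child in node.children)
    left, right = (child.fitch_set for child in node.children)  # binary tree assumed
    common = left & right
    if common:
        node.fitch_set = common
    else:
        node.fitch_set = left | right
        changes += 1
    return changes


def uppass(node: Node, parent_state: Optional[str] = None) -> None:
    """Pre-order pass assigning one most-parsimonious state to every node."""
    if parent_state is not None and parent_state in node.fitch_set:
        node.state = parent_state
    else:
        node.state = min(node.fitch_set)  # arbitrary but deterministic tie-break
    for child in node.children:
        uppass(child, node.state)


def changes_on_branches(node: Node) -> List[Tuple[str, str, str]]:
    """List (descendant label, ancestral state, descendant state) for branches with a change."""
    out = []
    for child in node.children:
        if child.state != node.state:
            out.append((child.name or "internal", node.state, child.state))
        out.extend(changes_on_branches(child))
    return out
```

On the tree ((A,B),(C,D)) with A and B carrying G and C and D carrying A, `downpass` returns 1. After `uppass`, `changes_on_branches` places the single A-to-G change on the branch leading to (A,B).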

Pipeline specifications