Computational protocol: Mitochondrial genome evolution in Alismatales: Size reduction and extensive loss of ribosomal protein genes

Protocol publication

[…] Specimens of Zostera marina L. (specimen voucher: A. Cuenca C2544, Denmark, Humlebæk Strand 6 Sept. 2009 (C)) and Stratiotes aloides L. (specimen voucher: O. Seberg et al. C2459, Denmark, Køge Bugt 10 Sept. 2008 (C)) were collected in the wild, and no specific permits were required for the collection of the material. The species are not protected by Danish law, and they were collected in public areas where no permits are needed.

The complete mitogenomes were assembled using a combination of 454 and Illumina sequencing data. The 454 data, generated from mitochondria-enriched DNA extractions, were reused from Cuenca et al. []. For Illumina sequencing, DNA was extracted from silica gel preserved material using a standard CTAB method []. Short-insert, paired-end (PE) libraries with an average insert size of 500 bp were constructed and run in 1/16 of a lane on an Illumina HiSeq 2000 (Illumina, San Diego, CA). Libraries and sequencing data were produced at the Danish National High-Throughput DNA Sequencing Centre. Raw reads were trimmed for quality, adapters and unidentified nucleotides using AdapterRemoval [].

The 454 reads were initially assembled in Newbler v. 2.3 (454 Life Sciences Corp, CT, USA) using default settings. Contigs were ordered using the program bb.454contignet, which reads the assembly information from Newbler and depicts the contigs as a web, indicating contig connections []. Contigs were then extended and assembled as described by Cuenca et al. []. To identify reads of adjacent contigs and to determine the borders of duplications, contigs were extended by blasting approximately the last 75 nt of each contig border against a database of all raw 454 sequence reads. These BLAST analyses were done using the BLASTN program [] in stand-alone BLAST ver. 2.2.21. Consensus sequences of each contig were used as seed sequences and extended using both 454 reads and Illumina reads in the Short Sequence Assembly by K-mer search and 3' read Extension program, SSAKE ver. 3.5 [], with parameters -m 15 -o 2 -r 0.6 -p 0 -t 0 -v 1. Finally, all 454 and Illumina reads were mapped to the assembled mitogenomes using Geneious ver. 7.1 (Biomatters Ltd.) to verify the assemblies, evaluate coverage and correct potential homopolymer length errors, which are attributable in particular to the 454 reads.

Sequences of protein coding genes and rRNA genes were identified by BLASTN using a local database of gene sequences extracted from 27 angiosperm species, including Butomus umbellatus (NC021399 []) and Spirodela polyrhiza (NC0178840 []) from the Alismatales (). In addition to complete protein coding genes with intact reading frames, fragments of known genes >100 bp were annotated. A sequence was recognized as a pseudogene if it had a length comparable to known functional genes but could not be translated into an amino acid sequence, even allowing for potential RNA editing. The tRNA genes were identified using tRNAscan-SE 1.21 []. Annotation was performed manually in Geneious ver. 7.1–9.0 (Biomatters Ltd.). The assembled and annotated genomes are deposited in GenBank under accession numbers KX808393 (Stratiotes aloides) and KX808392 (Zostera marina).

To verify deviating gene sequences in the mitogenome of Zostera marina, we produced Illumina sequences for another species of Zostera, Z. noltii Hornem. (specimen voucher: Seberg et al. C2453, Denmark, N of Munkholmbroen 10 Sept. 2008 (C)). DNA extraction and sequencing were performed as above, except that the library was run on an Illumina HiSeq 2500. Complete assembly of this mitogenome was not attempted, but all genes except the tRNA genes were extracted after mapping the Illumina reads to the Zostera marina mitogenome using Geneious ver. 8.0 (Biomatters Ltd.). Reads mapping to identified complete or partial genes were used for de novo assembly of individual loci. Where necessary to obtain complete gene sequences, one or more rounds of Map to Reference, as implemented in Geneious ver. 8.0 (Biomatters Ltd.), were used to extend the sequence of individual loci. A local database of genes from other completely assembled mitogenomes () was used as reference sequences to search for genes not present in Z. marina. Identified genes and flanking sequences of Z. noltii are deposited in GenBank under accession numbers KX808258-KX808296.

[...] To identify regions of potential plastid origin in the complete mitogenomes of Zostera and Stratiotes, we performed a BLASTN search against a database of 23 angiosperm plastid genomes, including genomes from the genera Elodea (Hydrocharitaceae), Lemna and Spirodela (Araceae) from the Alismatales (). Plastid genome sequences of Stratiotes and Zostera are not available. Only mitochondrial sequences longer than 100 bp and with a similarity score higher than 80% were considered. If the hits included protein coding gene sequence, we used the mitochondrial sequence matching the plastid protein coding gene to perform a new BLASTN search against all sequences in GenBank and against local databases of plastid gene sequences created from the data provided by Ross et al. [] (DOI: 10.6084/m9.figshare.1407422.v1). We also included the protein coding sequences in phylogenetic analyses using the matrices from Ross et al. []. After adding the new sequence copies found in the mitogenomes, we realigned the sequences using the MUSCLE [] plugin in Geneious ver. 8.1. Subsequent phylogenetic analyses were performed using RAxML ver. 7.2.8 [] with 100 replicates of rapid bootstrapping and a GTR+GAMMA+I model, as implemented in Geneious ver. 8.1.

To identify regions of potential nuclear origin in the mitogenomes of Zostera and Stratiotes, we searched for repetitive elements using the Repbase Update repetitive element database []. For Zostera we also performed a BLASTN search (maximum E-value = 1e-50) of the complete mitogenome against a local database created in Geneious ver. 9.0 that included all contigs from the Zostera marina nuclear genome (GenBank no. LFYR00000000 []). BLASTN hits longer than 250 bp with a pairwise similarity >80% were inspected for sequence features that could indicate the direction of potential transfers. For example, finding a sequence in the mitogenome that includes (part of) a gene normally located in the nuclear genome would indicate transfer from the nuclear genome to the mitogenome.

We searched for dispersed repeated sequences in the complete mitogenomes of Zostera and Stratiotes by blasting the sequences against themselves using BLASTN. The results were filtered to retain only matches longer than 100 bp with pairwise similarity >80%.

To evaluate the conservation of gene order in the Alismatales, we searched for clusters of genes shared between the complete mitogenomes of Spirodela, Butomus, Zostera and Stratiotes. Gene clusters are defined as two or more adjacent genes in the same direction shared by at least two taxa. This was done by simple visual inspection of the annotated mitogenome sequences.

To get a rough estimate of mitogenome DNA similarity between pairs of Alismatales species, we ran pairwise BLASTN analyses on the four complete mitogenome sequences. Because BLAST is heuristic, blasting a sequence A against B does not necessarily give the same result as blasting B against A. Accordingly, all analyses were done in both directions and average values were calculated as in Guo et al. []. […]
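The hit filtering and the two-direction averaging described above can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes BLASTN results in the standard 12-column tabular format, and it assumes "similarity" is measured as the fraction of the query genome covered by retained hits, which may differ from the measure used by Guo et al. The function names and the default thresholds (taken from the repeat and plastid searches, not stated for the pairwise comparison) are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]


def parse_hits(lines: Iterable[str], min_len: int = 100,
               min_identity: float = 80.0) -> List[Interval]:
    """Return query intervals of tabular BLAST hits passing both thresholds.

    Columns assumed: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (tab separated).
    """
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        pident = float(fields[2])
        length = int(fields[3])
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        # Strict comparisons: "longer than" 100 bp, similarity ">80%".
        if length > min_len and pident > min_identity:
            intervals.append((qstart, qend))
    return intervals


def covered_bases(intervals: Iterable[Interval]) -> int:
    """Count query positions covered by at least one interval (1-based, inclusive)."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: merge
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def mean_shared_fraction(a_vs_b: Iterable[str], len_a: int,
                         b_vs_a: Iterable[str], len_b: int) -> float:
    """Average the covered fraction from the A-vs-B and B-vs-A searches."""
    frac_ab = covered_bases(parse_hits(a_vs_b)) / len_a
    frac_ba = covered_bases(parse_hits(b_vs_a)) / len_b
    return (frac_ab + frac_ba) / 2
```

In use, `a_vs_b` and `b_vs_a` would be the lines of the two tabular result files, and `len_a` and `len_b` the lengths of the two mitogenomes. Overlapping hits are merged before counting so that repeated regions are not counted twice.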

Pipeline specifications