Computational protocol: Multiple origins of endosymbionts in Chlorellaceae with no reductive effects on the plastid or mitochondrial genomes

Protocol publication

[…] The organellar genomes of C. heliozoae, C. variabilis Syngen and M. conductrix were assembled from the Illumina sequence reads with Velvet version 1.2.03, using different pairwise combinations of k-mer length (61, 71, 81, 91) and expected coverage (50, 100, 200, 500, 1000), as described previously. Scaffolding was turned off and the minimum coverage parameter was set to 10% of the expected coverage.

To verify the species identities of the three endosymbionts, the sequence of the intact nuclear rRNA region (18S rRNA, ITS1, 5.8S rRNA, ITS2, and 26S rRNA) was extracted from one of the assemblies for each species and used as a blast query against the NCBI nucleotide database to find the closest sequence match. The nuclear rRNA regions assembled from our sequenced samples (C. heliozoae SAG 3.83, C. variabilis Syngen 2-3, and M. conductrix Pbi) were each 99.97% identical to accession numbers FM205850 from Chlorella SAG 3.83 (3851/3852 sites), AB206550 from C. variabilis Syngen 2-3 (3947/3848 sites), and AB506070 from M. reisseri (6457/6459 sites), respectively. The near 100% identity of our assembled sequences to sequences from the same species in GenBank provides strong verification of the organismal identity of our sampled species.

For each assembly, plastid and mitochondrial contigs were detected by blastn searches, using known organellar gene sequences from related Chlorellaceae species as queries. The final consensus sequence for each species was constructed by aligning the mitochondrial and plastid contigs from the best draft assemblies (those that maximized the average length of plastid or mitochondrial contigs). Circular genomes were confirmed by aligning the overlapping terminal regions of the contigs, and circularity was further supported by read pairs that spanned both ends of the assembly.
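The assembly parameter sweep described above can be sketched as a small command generator. This is an illustrative sketch, not the authors' script: it assumes the full 4 x 5 grid of k-mer and expected coverage values was run, that the "minimum coverage parameter" corresponds to velvetg's -cov_cutoff option, and that the reads are interleaved paired-end FASTQ. The function name, the output directory naming scheme, and the reads file name are hypothetical.

```python
from itertools import product

# Values quoted in the protocol text.
KMERS = (61, 71, 81, 91)
EXPECTED_COVERAGES = (50, 100, 200, 500, 1000)


def velvet_commands(reads_fastq, out_prefix="asm"):
    """Build (velveth, velvetg) argument lists for each parameter pair.

    Assumptions (not stated in the source): interleaved paired-end FASTQ
    input, and -cov_cutoff as the minimum coverage parameter.
    """
    commands = []
    for kmer, exp_cov in product(KMERS, EXPECTED_COVERAGES):
        out_dir = f"{out_prefix}_k{kmer}_cov{exp_cov}"  # hypothetical naming
        cov_cutoff = exp_cov / 10  # 10% of expected coverage
        velveth = ["velveth", out_dir, str(kmer),
                   "-shortPaired", "-fastq", reads_fastq]
        velvetg = ["velvetg", out_dir,
                   "-exp_cov", str(exp_cov),
                   "-cov_cutoff", f"{cov_cutoff:g}",
                   "-scaffolding", "no"]  # scaffolding turned off
        commands.append((velveth, velvetg))
    return commands


if __name__ == "__main__":
    for velveth, velvetg in velvet_commands("reads.fastq"):
        print(" ".join(velveth))
        print(" ".join(velvetg))
```

Each argument list could be passed to subprocess.run to launch the assembly. Note that Velvet has to be compiled with a maximum k-mer length of at least 91 for the larger k-mer values to be accepted.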
Using this strategy, a single complete circular chromosome was assembled for the plastome and mitogenome of each species.

To evaluate the depth of coverage of the genome assemblies, read pairs were mapped onto the respective consensus sequences with Bowtie 2.0. The resulting plots show an average mitochondrial depth of coverage of approximately 5000x for C. heliozoae, 4500x for C. variabilis Syngen, and 500x for M. conductrix (Figure ), and an average plastid depth of coverage of roughly 8000x for C. heliozoae, 3500x for C. variabilis Syngen, and 300x for M. conductrix (Figure ). These depths are substantially higher than would be expected for the nuclear genome from these small sequence data sets, so any organellar sequence copies in the nuclear genome will not contribute to the constructed organellar genome sequences. In addition, the organellar coverage plots contain no regions of substantially lower coverage, which argues against any erroneous incorporation of nuclear DNA into the organellar genome assemblies.

Mitochondrial protein-coding genes were annotated by blast searches against the non-redundant database of the National Center for Biotechnology Information. Plastome protein-coding genes were initially annotated using DOGMA with a 60% cutoff and a blast e-value of 1e−5, followed by manual adjustment as necessary. Ribosomal RNAs were identified by blastn searches and transfer RNAs were identified with tRNAscan-SE. To identify potentially novel genes, blastn and blastx searches were also applied to all noncoding regions, but no additional genes were identified. Homologs of mitochondrial and plastid introns in Chlorellaceae were identified by a blastn search with an e-value cutoff of 1e−20. Homologous introns were aligned in MEGA version 7 using the MUSCLE algorithm, and uncorrected p-distances were then calculated in MEGA using the distance function, with deletions removed in a pairwise manner. [...] 
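The uncorrected p-distance with pairwise deletion mentioned above is the proportion of differing sites among the alignment columns where both sequences have a residue. The following is a minimal sketch of that calculation, not MEGA's implementation. The function name is hypothetical, and treating "-" and "?" as the gap and missing-data symbols is an assumption.

```python
def p_distance(seq_a, seq_b, skip_chars="-?"):
    """Uncorrected p-distance between two aligned sequences.

    Pairwise deletion: a column is ignored only if one of these two
    sequences has a gap or missing-data symbol there (assumed "-" or "?").
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue  # drop this column for this pair only
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


if __name__ == "__main__":
    # Five comparable columns, one mismatch (G vs C).
    print(p_distance("ACGT-A", "ACCTTA"))
```

Under complete deletion, by contrast, a column with a gap in any sequence of the alignment would be dropped for every pair, which discards more sites when introns vary in length.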
Both plastid and mitochondrial phylogenies were generated in this study. In addition to the three newly sequenced species, organellar genomes from 24–25 representative chlorophytes and six streptophytes (Table ) were collected from GenBank. Individual protein-coding genes were extracted and then manually checked for misannotation. Exons from all 74 plastid genes and 32 mitochondrial genes that were present in more than half of the taxa were aligned by codon using MUSCLE version 3.8.31 and manually adjusted in BioEdit version 7.2.0 where necessary. Introns were not analyzed phylogenetically because of their highly sporadic distribution among species. The plastid and mitochondrial protein gene data sets were concatenated separately with FASconCAT version 1.0, generating 124,056 and 38,679 aligned sites, respectively. Ambiguously aligned regions in the concatenated alignments were excluded using Gblocks version 0.91b with relaxed parameters (t = c, b2 = 16, b4 = 5, b5 = half), retaining 49,863 sites (40%) of the original plastid alignment and 19,965 sites (51%) of the original mitochondrial alignment. The best-fitting nucleotide substitution model was determined to be GTR + G + I by jModelTest 2.1.10. Phylogenies were inferred from the plastid and mitochondrial data sets using the Maximum Likelihood (ML) approach in PhyML version 3.0. ML trees were estimated with the GTR + G + I model, and branch support was estimated by bootstrap analysis with 100 replicates. […]
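The gene filtering and concatenation step described above can be illustrated with a short sketch. It is not FASconCAT itself: it only shows the idea of keeping genes present in more than half of the taxa and joining the per-gene alignments into one supermatrix. The function name and data layout are hypothetical, and padding absent genes with "-" is an assumption (the actual tool may use a different placeholder character).

```python
def concatenate_alignments(gene_alignments, taxa, missing_char="-"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Genes are kept only if present in more than half of the taxa.
    Taxa lacking a kept gene are padded with missing_char (an assumption).
    """
    parts = {taxon: [] for taxon in taxa}
    kept_genes = []

    for gene, alignment in gene_alignments.items():
        present = [t for t in taxa if t in alignment]
        if len(present) <= len(taxa) / 2:
            continue  # gene found in half of the taxa or fewer

        lengths = {len(alignment[t]) for t in present}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not equally long")
        length = lengths.pop()

        kept_genes.append(gene)
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, missing_char * length))

    supermatrix = {taxon: "".join(seqs) for taxon, seqs in parts.items()}
    return kept_genes, supermatrix


if __name__ == "__main__":
    # Toy data with made-up taxon labels and sequences.
    genes = {
        "rbcL": {"A": "ATG", "B": "ATG", "C": "ATA"},
        "psbA": {"A": "CCC", "B": "CCT"},
        "ycf1": {"A": "GGG"},
    }
    kept, matrix = concatenate_alignments(genes, ["A", "B", "C"])
    print(kept)
    for taxon, seq in matrix.items():
        print(taxon, seq)
```

Because each gene is aligned by codon before concatenation, every block length is a multiple of three in real data, so codon positions stay in frame across the supermatrix.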

Pipeline specifications