Computational protocol: Predominant and Substoichiometric Isomers of the Plastid Genome Coexist within Juniperus Plants and Have Shifted Multiple Times during Cupressophyte Evolution


Protocol publication

[…] The C. japonica plastid genome (GenBank Acc. No. NC_010548) was used as a reference to order the J. bermudiana sequenced genome fragments in Geneious version R6-1 (http://www.geneious.com/, last accessed March 11, 2014). All potential rearrangements found in comparison with Cryptomeria were confirmed by additional Sanger sequencing. The other three plastid genomes were assembled by running Velvet version 1.2.03 using different pairwise combinations of k-mer values (51, 61, 71, 81, and 91) and expected coverage values (50, 100, 200, 500, 1000, and 2000). Scaffolding was turned off and the minimum coverage value was set to 10% of expected coverage. For each species, a single full-length contig was recovered in at least three independent runs, and the alignment consensus of three independent runs was taken as the final consensus sequence. To validate the genome assemblies, Illumina reads were mapped onto the consensus sequences with Bowtie version 2.0.0 beta 5 as described previously. Discrepancies were corrected using the sequence present in the majority of mapped reads.

Protein-coding, ribosomal RNA, and transfer RNA genes in the four juniper plastomes were initially annotated using the DOGMA webserver. DOGMA annotations were manually checked by BLAST searches with orthologous sequences from other Cupressaceae plastomes and, in some cases, by sequence alignment using MUSCLE version 3.8.31. [...] To compare plastome organization within cupressophytes, genomes from 8 representative species were aligned using Mauve version 2.3.1. For this analysis, the start point of each genome was arbitrarily set as the ycf2 start codon.

For Southern blotting, approximately 1 µg of J. monosperma, J. virginiana, and J. scopulorum DNA was digested with restriction enzymes EcoRI and HindIII, separated on a 0.5% agarose gel, and transferred to a nylon membrane following established procedures.
Approximately 800 ng of PCR-derived probes were labeled with Digoxigenin (DIG) using the DIG High Prime DNA Labeling and Detection Starter Kit II following the manufacturer’s protocol (Roche, Mannheim, Germany). The membrane was prehybridized in ULTRAhyb hybridization solution (Life Technologies, Carlsbad, CA) for 4 h at 42 °C and then hybridized in ULTRAhyb solution containing the DIG-labeled probe overnight at 42 °C. The membrane was washed twice in 2× saline-sodium citrate (SSC) + 0.1% sodium dodecyl sulphate (SDS) for 5 min at room temperature and then twice in 0.5× SSC + 0.1% SDS for 15 min at 65 °C. Hybridized probes were detected with ready-to-use chemiluminescent substrate (CSPD) according to the DIG High Prime DNA Labeling and Detection Starter Kit II protocol. Subsequently, the membrane was exposed to photographic film for 10 min prior to development.

Primers for the variable cycle PCR analysis were designed in genes flanking the Juniperus IR: rps4 (5′-CCTGGTAAAGTTTTGABACG-3′), psbK (5′-CAAATGAAAAGCGGCATCG-3′), chlB (5′-GTTCCAATATGAGCAGGACCAG-3′), and trnL-UAA (5′-GTTTCCATACCAAGGCTC-3′). PCR was performed with a C1000 thermal cycler (Bio-Rad) using four primer combinations (rps4 + chlB, rps4 + trnL-UAA, psbK + chlB, and psbK + trnL-UAA) and GoTaq Flexi DNA Polymerase with supplied reagents (Promega). Each reaction was 10 µl in volume and included 20 ng DNA. Reactions were amplified for 5, 10, 15, 20, 25, 30, or 35 cycles of denaturation (95 °C for 30 s), annealing (55 °C for 1 min), and elongation (72 °C for 2 min). All reactions also included an initial denaturation step (95 °C for 2 min) and a final elongation step (72 °C for 5 min).

To quantify the relative frequency of the two isomeric genomic forms, Illumina paired-end reads were mapped to the genome using Bowtie 2 with default parameters. To avoid any mapping ambiguity, reads were required to unambiguously map to the nonrepetitive flanking sequences on either side of the repeats.
This was possible because the average insert size of the sequencing libraries was approximately 800 bp, which easily spanned the approximately 250 bp repeats. A custom Perl script was used to count repeat-spanning read pairs, enabling us to quantify the frequency of the repeat in each possible genomic arrangement. Isomer frequencies were calculated by dividing the number of read pairs that support the alternative conformation by the total number of read pairs that support either conformation.

To ensure that the results of the read-pair mapping analysis were not the result of cross-contamination of the Illumina data sets, raw Illumina sequence reads were aligned to three genomic regions that exhibited variability among the Juniperus species. The variable genomic regions were identified by manual inspection of a genomic alignment generated by MAFFT with default parameters. All Illumina sequence reads that could be mapped to these genomic regions were extracted and then aligned to the variable regions by MAFFT with default parameters.

[...] Plastomes from 12 cupressophyte and 9 Pinaceae species (supplementary table S1, Supplementary Material online) were downloaded from GenBank or generated in this study. To ensure annotation consistency among genomes, we performed an all-against-all BlastN search of all protein-coding genes from all species to identify missing genes or genes with incorrect start or stop codon annotations. Genomes with potentially missing or misannotated genes were manually checked, and their annotations were corrected if the gene could be identified or reannotated to improve consistency among species. For petL of Picea morrisonicola and ndhB of Cryptomeria, T. flousiana, and Taxus, we used an upstream start codon to improve sequence similarity to orthologous genes. For petD of Keteleeria, we used an upstream stop codon.
We identified numerous unannotated genes in Pinus thunbergii (ccsA, cemA, petL, petN, psbZ, ycf1, ycf2, ycf3, and ycf4), Podocarpus (psaI, psaJ, psaM, rpl20, rpl33, and rps18), Pseudotsuga (psbA), and Taxus (psbZ, rps2, and rps12).

All 83 cupressophyte protein-coding genes were extracted from the corrected annotations and then individually aligned with MUSCLE version 3.8.31 using default settings. Alignments were filtered using Gblocks version 0.91b in DNA mode with relaxed parameters (b4=5 b5=h). Filtered alignments were concatenated in SequenceMatrix version 1.7.8, producing a final alignment of 68,497 bp. Maximum likelihood phylogenetic trees were constructed with the GTR+G substitution model in RAxML version 7.2.8. Tree robustness was assessed by nonparametric bootstrapping with 1,000 replicates. […]
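
The concatenation step was performed in SequenceMatrix, which is a graphical tool, so the protocol gives no script for it. As a rough illustration only, the same operation can be sketched in Python. The function name, the input layout (one dictionary of aligned sequences per gene), and the choice to pad missing taxa with gap characters are all assumptions made for this example, not details taken from the protocol.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments into a single supermatrix.

    gene_alignments: list of (gene_name, {taxon: aligned_sequence}) pairs
    (a hypothetical input layout chosen for this sketch).
    Taxa absent from a gene are padded with '-' over that gene's length.
    Returns (supermatrix, partitions), where partitions lists the
    1-based inclusive column range occupied by each gene.
    """
    # Collect every taxon seen in any gene alignment.
    taxa = sorted({taxon for _, aln in gene_alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments:
        # All sequences in one gene alignment must have the same length.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not of equal length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


if __name__ == "__main__":
    # Toy data: taxon B lacks the second gene and is padded with gaps.
    genes = [
        ("atpA", {"A": "ATG", "B": "AT-"}),
        ("rbcL", {"A": "CCGG"}),
    ]
    matrix, parts = concatenate_alignments(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(parts)
```

The partition ranges are returned alongside the matrix because downstream phylogenetic tools often accept per-gene column ranges. The protocol does not say whether a partitioned model was used.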
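
The isomer-frequency calculation described earlier in this protocol (read pairs supporting the alternative conformation divided by read pairs supporting either conformation) can be sketched as follows. The authors used a custom Perl script that is not reproduced here; this Python version is only an illustration of the arithmetic. The flank labels (`A` to `D`), the function names, and the assignment of flank joins to each conformation are hypothetical placeholders that would need to be defined from the actual genome coordinates.

```python
from collections import Counter


def count_conformations(pair_flanks, reference_joins, alternative_joins):
    """Count read pairs supporting each conformation.

    pair_flanks: iterable of (flank_of_mate1, flank_of_mate2) labels, one
    per read pair whose mates both map uniquely to nonrepetitive flanks.
    reference_joins / alternative_joins: sets of frozensets naming which
    two flanks are joined across a repeat in each arrangement
    (hypothetical labels supplied by the user).
    Pairs that match neither arrangement are ignored.
    """
    counts = Counter()
    for flanks in pair_flanks:
        join = frozenset(flanks)
        if join in reference_joins:
            counts["reference"] += 1
        elif join in alternative_joins:
            counts["alternative"] += 1
    return counts


def isomer_frequency(alternative_pairs, reference_pairs):
    """Alternative-supporting pairs divided by all informative pairs."""
    total = alternative_pairs + reference_pairs
    if total == 0:
        raise ValueError("no repeat-spanning read pairs were counted")
    return alternative_pairs / total


if __name__ == "__main__":
    # Toy example with made-up counts, not data from the study.
    reference = {frozenset({"A", "B"}), frozenset({"C", "D"})}
    alternative = {frozenset({"A", "C"}), frozenset({"B", "D"})}
    pairs = [("A", "B")] * 97 + [("A", "C")] * 3 + [("A", "A")]
    counts = count_conformations(pairs, reference, alternative)
    print(isomer_frequency(counts["alternative"], counts["reference"]))
```

Requiring both mates to fall in unique flanking sequence mirrors the protocol's rule that reads must map unambiguously on either side of the repeat. A pair whose mates land in the same flank carries no information about the arrangement and is skipped.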

Pipeline specifications