Protocol publication

[…] For all strains except Cosmarium and Cylindrocystis, total cellular DNA was extracted as described () and A + T-rich organellar DNA was separated from nuclear DNA by CsCl-bisbenzimide isopycnic centrifugation (). Total cellular DNA from Cosmarium and Cylindrocystis was isolated using the EZNA HP Plant Mini Kit of Omega Bio-Tek (Norcross, GA, USA).

For Illumina sequencing of the Closterium, Cosmarium, and Cylindrocystis chloroplast genomes, libraries of 700-bp fragments were constructed using the TruSeq DNA Sample Prep Kit (Illumina, San Diego, CA, USA), and paired-end reads were generated on the Illumina HiSeq 2000 (100-bp reads) or MiSeq (300-bp reads) sequencing platforms by the Innovation Centre of McGill University and Génome Québec and the “Plateforme d’Analyses Génomiques de l’Université Laval,” respectively. Reads were assembled using Ray v2.3.1 (), and contigs were visualized, linked and edited using CONSED v22 (). Contigs of chloroplast origin were identified by BLAST searches against a local database of organelle genomes. Regions spanning gaps in the cpDNA assemblies were amplified by polymerase chain reaction (PCR) with primers specific to the flanking sequences. Purified PCR products were sequenced using Sanger chemistry with the PRISM BigDye Terminator Ready Reaction Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) on ABI model 373 or 377 DNA sequencers (Applied Biosystems).

For 454 sequencing of the Entransia, Netrium, Roya and Spirogyra chloroplast genomes, shotgun libraries (700-bp fragments) of A + T-rich DNA fractions were constructed using the GS-FLX Titanium Rapid Library Preparation Kit of Roche 454 Life Sciences (Branford, CT, USA). Library construction and 454 GS-FLX Titanium pyrosequencing were carried out by the “Plateforme d’Analyses Génomiques de l’Université Laval.” Reads were assembled using Newbler v2.5 () with default parameters, and contigs were visualized, linked and edited using CONSED v22 ().
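The BLAST-based identification of chloroplast contigs can be pictured with a small filter over BLAST tabular output. This is a minimal sketch, assuming the searches were run with `-outfmt 6` (12 default columns); the e-value and alignment-length cutoffs are hypothetical and are not the settings used in the study.

```python
from collections import defaultdict

# Hypothetical cutoffs, chosen only for illustration
EVALUE_CUTOFF = 1e-10
MIN_ALN_LENGTH = 100  # bp

# -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
#                    qstart qend sstart send evalue bitscore


def chloroplast_contigs(blast_lines, evalue_cutoff=EVALUE_CUTOFF, min_len=MIN_ALN_LENGTH):
    """Return {contig_id: summed aligned bp} for contigs with qualifying hits.

    Summed HSP lengths are only a rough score: overlapping HSPs are counted twice.
    """
    aligned = defaultdict(int)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        fields = line.split("\t")
        qseqid, length, evalue = fields[0], int(fields[3]), float(fields[10])
        if evalue <= evalue_cutoff and length >= min_len:
            aligned[qseqid] += length
    return dict(aligned)


if __name__ == "__main__":
    demo = [
        "contig_1\tcpDNA_ref\t98.5\t1500\t20\t2\t1\t1500\t100\t1600\t0.0\t2700",
        "contig_2\tcpDNA_ref\t85.0\t60\t9\t0\t1\t60\t10\t69\t1e-3\t50",
        "contig_1\tcpDNA_ref\t97.0\t400\t12\t0\t1600\t2000\t2000\t2400\t1e-150\t700",
    ]
    print(chloroplast_contigs(demo))  # {'contig_1': 1900}
```

Contigs that pass such a filter would then be candidates for linking in CONSED and for PCR-based gap closure.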
Identification of cpDNA contigs and gap filling were performed as described above for the Illumina sequence assemblies.

For Sanger sequencing of the Klebsormidium and Coleochaete chloroplast genomes, random clone libraries were prepared from 1500- to 2000-bp fragments derived from A + T-rich DNA fractions using the pSMART-HCKan plasmid (Lucigen Corporation, Middleton, WI, USA). Positive clones were selected by hybridizing each plasmid library with the original DNA used for cloning. DNA templates were amplified using the Illustra TempliPhi Amplification Kit (GE Healthcare, Baie d’Urfé, Canada) and sequenced with the PRISM BigDye Terminator Ready Reaction Cycle Sequencing Kit (Applied Biosystems) on ABI model 373 or 377 DNA sequencers (Applied Biosystems), using the SR2 and SL1 primers as well as oligonucleotides complementary to internal regions of the plasmid DNA inserts. The resulting sequences were edited and assembled using Sequencher v5.1 (Gene Codes Corporation, Ann Arbor, MI, USA). Genomic regions not represented in the sequence assemblies or plasmid clones were sequenced directly from PCR-amplified fragments using primers specific to the flanking contigs.

Genes and ORFs were identified on the final assemblies using a custom-built suite of bioinformatics tools as described previously (). tRNA genes were located using tRNAscan-SE v1.3.1 (). Intron boundaries were determined by modeling intron secondary structures (; ) and by comparing intron-containing genes with intronless homologs. Circular genome maps were drawn with OGDraw v1.2 (). Genome-scale sequence comparisons of the two Roya species and of the two Klebsormidium species were carried out with LAST v7.1.4 ().
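The ORF-detection step can be illustrated with a simple six-frame scan. This is not the authors' custom-built suite; it is a minimal sketch under stated simplifications: ATG is the only start codon, TAA/TAG/TGA are the stops (as in the bacterial/plastid code), the `MIN_CODONS` cutoff is hypothetical, and ORFs spanning the origin of a circular genome are not detected.

```python
STOPS = {"TAA", "TAG", "TGA"}
MIN_CODONS = 100  # hypothetical cutoff, in codons (stop codon excluded)

_COMP = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def _orfs_one_strand(seq, min_codons):
    """Yield 0-based half-open (start, end) ORF coordinates, stop codon included."""
    n = len(seq)
    for frame in range(3):
        start = None
        for i in range(frame, n - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None  # close the ORF and look for the next ATG


def find_orfs(seq, min_codons=MIN_CODONS):
    """Return sorted (start, end, strand) tuples in forward-strand coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(s, e, "+") for s, e in _orfs_one_strand(seq, min_codons)]
    for s, e in _orfs_one_strand(revcomp(seq), min_codons):
        orfs.append((n - e, n - s, "-"))  # map reverse-strand hits back to forward coordinates
    return sorted(orfs)
```

For example, `find_orfs("ATG" + "GCT" * 5 + "TAA", min_codons=5)` reports one six-codon ORF on the forward strand. A real annotation pipeline would also consider alternative start codons and compare candidates against known plastid genes.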
For all compared genomes, the G + C contents of a set of 88 protein-coding genes were determined at each of the three codon positions using DAMBE v5 ().

To estimate the proportion of small repeated sequences, repeats ≥ 30 bp were retrieved using REPFIND of the REPuter v2.74 program () with the options -f -p -l -allmax and were then masked on the genome sequences using RepeatMasker running under the cross_match search engine. The G + C contents of the repeated and unique sequences were calculated from RepeatMasker outputs generated with the -xsmall option (with this option, repeat regions are returned in lowercase and non-repetitive regions in uppercase in the masked file). [...]

The chloroplast genomes of 28 streptophyte taxa were used to generate the analyzed amino acid (PCG-AA) and nucleotide (PCG12) data sets. Both data sets were assembled from the following 88 protein-coding genes: accD, atpA, B, E, F, H, I, ccsA, cemA, chlB, I, L, N, clpP, ftsH, infA, ndhA, B, C, D, E, F, G, H, I, J, K, odpB, petA, B, D, G, L, N, psaA, B, C, I, J, M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, rpl2, 14, 16, 20, 21, 22, 23, 32, 33, 36, rpoA, B, C1, C2, rps2, 3, 4, 7, 8, 11, 12, 14, 15, 16, 18, 19, ycf1, 3, 4, 12, 62, 66.

The PCG-AA data set was prepared as follows: the deduced amino acid sequences from the 88 individual genes were aligned using MUSCLE v3.7 (), the ambiguously aligned regions in each alignment were removed using TrimAl v1.3 () with the options block = 6, gt = 0.7, st = 0.005 and sw = 3, and the protein alignments were concatenated using Phyutility v2.2.6 (). Phylogenies were inferred from the PCG-AA data set using maximum likelihood (ML) and Bayesian methods. ML analyses were carried out using RAxML v8.2.3 () and the GTR + Γ4 model of sequence evolution; in these analyses, the data set was partitioned by gene, with the model applied to each partition. Confidence of branch points was estimated by fast-bootstrap analysis (-f a) with 100 replicates.
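The two G + C calculations above (per codon position, and for repeated versus unique sequence) reduce to simple base counting. This is a minimal sketch, not DAMBE's or the authors' procedure; it assumes in-frame coding sequences whose lengths are multiples of 3, pools all genes together, counts only A/C/G/T, and relies on the -xsmall convention that repeats are lowercase and unique sequence is uppercase.

```python
def gc_fraction(bases):
    """G + C fraction over unambiguous bases; 0.0 if there are none."""
    acgt = [b for b in bases.upper() if b in "ACGT"]
    if not acgt:
        return 0.0
    return sum(b in "GC" for b in acgt) / len(acgt)


def gc_by_codon_position(cds_list):
    """Return (GC1, GC2, GC3) pooled over all in-frame coding sequences."""
    positions = ([], [], [])
    for cds in cds_list:
        if len(cds) % 3:
            raise ValueError("CDS length is not a multiple of 3")
        for pos in range(3):
            positions[pos].append(cds[pos::3])  # every third base from this offset
    return tuple(gc_fraction("".join(p)) for p in positions)


def gc_repeat_vs_unique(masked_seq):
    """Split an -xsmall masked sequence into (repeat GC, unique GC)."""
    repeats = "".join(c for c in masked_seq if c.islower())
    unique = "".join(c for c in masked_seq if c.isupper())
    return gc_fraction(repeats), gc_fraction(unique)


if __name__ == "__main__":
    print(gc_by_codon_position(["GCATTT"]))       # (0.5, 0.5, 0.0)
    print(gc_repeat_vs_unique("ATATgcgcATAT"))  # (1.0, 0.0)
```

Counting from the soft-masked file has the advantage that a single pass yields both the repeat and unique G + C values without a separate coordinate table.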
Bayesian analyses were performed with PhyloBayes v4.1 () using the site-heterogeneous CATGTR + Γ4 model (). Five independent chains were run for 2,000 cycles, and consensus topologies were calculated from the saved trees using the BPCOMP program of PhyloBayes after a burn-in of 500 cycles. Under these conditions, the largest discrepancy observed across all bipartitions in the consensus topologies (maxdiff) was 0.0007, indicating that the chains had converged.

The PCG12 nucleotide data set (first and second codon positions) was prepared as follows. The multiple sequence alignment of each protein was converted into a codon alignment, poorly aligned and divergent regions in each codon alignment were excluded using Gblocks v0.91b () with the -t = c, -b3 = 5, -b4 = 5 and -b5 = half options, and the individual gene alignments were concatenated using Phyutility v2.2.6 (). The third codon positions of the resulting PCG123 alignment were then excluded using Mesquite v3.04 () to produce the PCG12 data set. ML analysis of the PCG12 data set was carried out using RAxML v8.2.3 () and the GTR + Γ4 model of sequence evolution. This data set was partitioned into gene groups, with the model applied to each partition. Confidence of branch points was estimated by fast-bootstrap analysis (-f a) with 100 replicates.

dN, dS and dN/dS trees were inferred from a tufA codon alignment prepared as described above, using PAML v4.8a () and the F3X4 codon frequency model implemented in codeml. Positive selection was tested across the tufA sequences using the PARRIS module implemented in Datamonkey (). [...]

Syntenic regions in pairwise genome comparisons were identified using a custom-built program, and the number of gene reversals between the compared genomes was estimated with GRIMM v2.01 ().
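The path from protein alignments to the PCG12 data set (codon alignment, concatenation, removal of third positions) can be sketched as follows. This is an illustration only: it assumes '-' gaps, that ungapped residues map one-to-one onto consecutive codons of the matching CDS (a trailing stop codon is dropped), and that taxa missing a gene are padded with '?'. Gblocks filtering is not reproduced, and the function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def to_codon_alignment(aligned_protein, cds):
    """Replace each aligned residue with its codon and each gap with '---'."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    residues = len(aligned_protein.replace("-", ""))
    if len(codons) == residues + 1 and codons[-1].upper() in STOPS:
        codons = codons[:-1]  # drop the terminal stop codon
    if len(codons) != residues:
        raise ValueError("protein and CDS lengths disagree")
    it = iter(codons)
    return "".join("---" if aa == "-" else next(it) for aa in aligned_protein)


def concatenate(gene_alignments):
    """Concatenate {gene: {taxon: aligned_seq}} into a supermatrix.

    Returns ({taxon: seq}, [(gene, start, end)]) with 1-based inclusive
    partition coordinates, e.g. for a per-gene partitioned ML analysis.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    supermatrix = {t: [] for t in taxa}
    partitions, start = [], 1
    for gene, aln in gene_alignments.items():
        width = len(next(iter(aln.values())))
        for t in taxa:
            supermatrix[t].append(aln.get(t, "?" * width))  # pad missing genes
        partitions.append((gene, start, start + width - 1))
        start += width
    return {t: "".join(parts) for t, parts in supermatrix.items()}, partitions


def first_second_positions(codon_aln):
    """Keep codon positions 1 and 2, i.e. drop every third column (PCG123 -> PCG12)."""
    return "".join(codon_aln[i:i + 2] for i in range(0, len(codon_aln), 3))
```

For example, `to_codon_alignment("M-A", "ATGGCTTAA")` gives `"ATG---GCT"`, and `first_second_positions` turns that into `"AT--GC"`. Dropping third positions column-wise only works because the codon alignment keeps every codon in frame.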
The same custom-built program was used to convert each gene order into all possible pairs of signed genes (i.e., taking gene polarity into account); the gene pairs conserved in three or more genomes were visualized using Mesquite v3.04 (). Gains and/or losses of genomic characters (standard genes, introns and gene pairs) were mapped on the streptophyte tree topology inferred in this study using MacClade v4.08 () and the Dollo principle of parsimony.

An ML tree based on gene adjacency was inferred using the phylogeny reconstruction option of the MLGO web server () and a gene order matrix containing all standard genes (including all copies of duplicated genes). Confidence of branch points was estimated with 1000 bootstrap replicates. A gene reversal tree with the same topology as the MLGO tree was also computed; branch lengths were estimated using the -t option of MGR v2.03 () and a gene order matrix of the 89 genes shared by all compared genomes. Because MGR cannot handle duplicated genes, only one copy of the IR and of each duplicated gene was included in this analysis. […]
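The conversion of gene orders into signed gene pairs can be illustrated with a short sketch. It is not the authors' custom-built program: it assumes that "pairs" means adjacent genes in a circular genome, that polarity is written as a leading '-' (e.g. `-psbA`), and that a pair (a, b) read on the opposite strand is (-b, -a), so both readings are stored under one canonical form.

```python
from collections import Counter


def flip(gene):
    """Reverse the polarity of a signed gene label such as 'psbA' or '-psbA'."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def canonical(a, b):
    """Pick one representative for (a, b) and its opposite-strand reading (-b, -a)."""
    options = [(a, b), (flip(b), flip(a))]
    # prefer the reading with fewer minus signs, then the lexicographically smaller one
    return min(options, key=lambda p: (sum(g.startswith("-") for g in p), p))


def signed_pairs(order, circular=True):
    """Return the set of canonical signed adjacencies in one gene order."""
    n = len(order)
    last = n if circular else n - 1  # circular genomes also join last and first gene
    return {canonical(order[i], order[(i + 1) % n]) for i in range(last)}


def conserved_pairs(genomes, min_genomes=3):
    """Count each signed pair across genomes; keep those found in >= min_genomes."""
    counts = Counter(p for order in genomes.values() for p in signed_pairs(order))
    return {p: c for p, c in counts.items() if c >= min_genomes}
```

With `genomes = {"g1": ["a", "b", "c"], "g2": ["a", "b", "d"], "g3": ["-b", "-a", "e"]}`, only the pair `("a", "b")` is shared by all three genomes, because `-b, -a` is the same adjacency read on the other strand. A presence/absence matrix of such pairs is the kind of character set that can then be mapped onto a tree under Dollo parsimony.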

Pipeline specifications

Software tools Consed, Newbler, Sequencher, tRNAscan-SE, OGDRAW, DAMBE, REPuter, RepeatMasker, MUSCLE, trimAl, Phyutility, RAxML, PhyloBayes, Gblocks, Mesquite, PAML, Datamonkey, GRIMM, MacClade, MLGO, MGR
Applications Genome annotation, Phylogenetics, Population genetic analysis, Nucleotide sequence alignment, Genome data visualization
Organisms Entransia fimbriata, Coleochaete scutata, Cylindrocystis brebissonii, Netrium digitus, Roya obtusa, Spirogyra maxima, Cosmarium botrytis, Closterium baillyanum, Klebsormidium flaccidum, Chaetosphaeridium globosum