Computational protocol: Chloroplast Genome Variation in Upland and Lowland Switchgrass

Protocol publication

[…] Initial annotation of the Panicum virgatum L. cp genome was performed using DOGMA (Dual Organellar GenoMe Annotator, http://dogma.ccbb.utexas.edu/). DOGMA takes a FASTA-formatted input file and identifies putative protein-coding genes by performing BLASTX searches against a custom database of published cp genomes. The input nucleotide sequence was queried in all six reading frames against amino acid sequences for all genes in the DOGMA database. Putative start and stop codons for each protein-coding gene, as well as intron and exon boundaries for intron-containing genes, were then checked manually. DOGMA identified both tRNAs and rRNAs through BLASTN searches against cp nucleotide databases, and these were verified by the user. Manual annotation was performed using Artemis. [...]

Mononucleotide microsatellite markers were predicted using MISA. A goodness-of-fit test was performed for mononucleotide repeats classified by region or by coding capacity, based on the expectation of a random distribution proportional to the relative sizes of each region. The inverted repeat region was counted only once. [...]

Whole-genome comparisons were performed between Lin1 and Lin2 with MUMmer. Primers were designed flanking insertions to score length polymorphisms between Kanlow and Summer, or to score specific SNP variants using allele-specific flanking primers. PCR products were separated at 80 V (constant voltage) in a 2% (w/v) agarose TAE gel.

A total of 101.3 million Illumina GAIIx 56-bp reads were produced from cDNA libraries of P. virgatum cv. ‘Kanlow’ crown and rhizome tissue prior to a killing frost. Another 106.5 million Illumina GAII 36-bp reads, annotated as coming from a variety of upland ecotypes, were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive. These reads were aligned to the Lin1 cp reference sequence using Burrows-Wheeler Aligner and SAMtools for SNP evaluation. Alignment and reporting conditions were set to allow a maximum of 1 mismatch per read.

No specific permits were required for the described field studies. [...]

A set of 61 protein-coding genes included in the analysis of several other cp genomes was extracted from the switchgrass cp genomes using DOGMA. The same 61 protein-coding genes were extracted from 13 other sequenced genomes (see ) and amino acid sequences were aligned using MUSCLE. After manual adjustments, nucleotide sequences of these genes were aligned by constraining them to the aligned amino acids. Phylogenetic analyses using maximum parsimony (MP) and maximum likelihood (ML) were performed with MEGA5. All gap regions were excluded during analysis to avoid alignment ambiguities. The MP tree was obtained using the Close-Neighbor-Interchange algorithm with search level 1, in which the initial trees were obtained by random addition of sequences (10 replicates). Non-parametric bootstrap analyses were performed with 500 replicates. Maximum likelihood analysis was conducted based on the Tamura-Nei model using a heuristic search for initial trees. Bootstrapping was performed as for MP, with 500 replicates. All three codon positions were included for both MP and ML analyses. […]
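To illustrate what querying a nucleotide sequence "in all six reading frames" means in the annotation step, the following is a minimal Python sketch of six-frame translation. It is not DOGMA's or BLASTX's implementation; the function names are hypothetical, and the use of the standard genetic code is an assumption made for illustration.

```python
# Illustrative six-frame translation (not the DOGMA/BLASTX code).
# Assumption: standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translations(seq):
    """Map frame labels (+1, +2, +3, -1, -2, -3) to protein strings."""
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGAAATAG").items():
        print(label, protein)
```

Each of the six protein strings would then be compared against the amino acid database; that search step is outside the scope of this sketch.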
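The mononucleotide microsatellite prediction can be pictured with a small regular-expression scan. This is only a sketch of the idea, not MISA itself; the function name and the default minimum run length of 10 are assumptions, since the excerpt does not state the threshold that was used.

```python
import re


def find_mononucleotide_repeats(seq, min_len=10):
    """Return (start, end, base, length) for each single-base run.

    Coordinates are 1-based and inclusive. `min_len` is a hypothetical
    default; the protocol excerpt does not give the actual threshold.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        length = match.end() - match.start()
        hits.append((match.start() + 1, match.end(), match.group(1), length))
    return hits


if __name__ == "__main__":
    example = "ACGT" + "A" * 10 + "CG" + "T" * 12
    for hit in find_mononucleotide_repeats(example):
        print(hit)
```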
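The goodness-of-fit test on mononucleotide repeats compares observed counts per region with counts expected if repeats were spread in proportion to region size. A minimal sketch of that calculation as a Pearson chi-square statistic follows; the choice of chi-square, the function name, and all numbers in the demo are assumptions for illustration, not values or methods confirmed by the protocol.

```python
def chi_square_goodness_of_fit(observed, region_sizes):
    """Chi-square statistic for counts vs. size-proportional expectation.

    observed:     dict of region name -> observed repeat count
    region_sizes: dict of region name -> region length in bp
                  (the inverted repeat would be entered once)
    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    if set(observed) != set(region_sizes):
        raise ValueError("observed and region_sizes must share the same keys")
    total_count = sum(observed.values())
    total_size = sum(region_sizes.values())
    expected = {}
    statistic = 0.0
    for region, size in region_sizes.items():
        # Expected count is proportional to the region's share of the genome.
        expected[region] = total_count * size / total_size
        statistic += (observed[region] - expected[region]) ** 2 / expected[region]
    return statistic, len(observed) - 1, expected


if __name__ == "__main__":
    # Hypothetical placeholder numbers, not data from the study.
    sizes = {"LSC": 80000, "SSC": 12000, "IR": 22000}
    counts = {"LSC": 40, "SSC": 8, "IR": 2}
    stat, dof, exp = chi_square_goodness_of_fit(counts, sizes)
    print(f"chi-square = {stat:.3f} with {dof} degrees of freedom")
```

The statistic would be compared against the chi-square distribution with the returned degrees of freedom to obtain a p-value.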
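Constraining nucleotide sequences to an amino acid alignment means placing each codon under its aligned residue and inserting three gap characters wherever the protein alignment has a gap. The sketch below shows this threading step for a single sequence; the function name is hypothetical, and it assumes the coding sequence is in frame and matches the protein residue for residue (optionally with a trailing stop codon).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def thread_codons(aligned_protein, cds, gap="-"):
    """Thread an unaligned CDS onto its aligned protein sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    # Tolerate a terminal stop codon that is absent from the protein.
    if len(codons) == n_residues + 1 and codons[-1].upper() in STOP_CODONS:
        codons = codons[:-1]
    if len(codons) != n_residues:
        raise ValueError("CDS and protein lengths do not agree")
    codon_iter = iter(codons)
    # Each protein gap becomes three nucleotide gaps; each residue, one codon.
    return "".join(gap * 3 if aa == gap else next(codon_iter)
                   for aa in aligned_protein)


if __name__ == "__main__":
    print(thread_codons("M-K", "ATGAAA"))
```

Applying this to every gene in every taxon yields a nucleotide alignment whose gaps always fall between codons.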
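Two of the phylogenetic settings, excluding all gap regions and non-parametric bootstrapping, can be sketched on a toy alignment. This is an illustration of the general techniques, not MEGA5's code; the function names are hypothetical, and treating "gap regions" as any column containing a gap in at least one sequence is an assumption.

```python
import random


def drop_gap_columns(alignment, gap="-"):
    """Remove every column in which at least one sequence has a gap."""
    if not alignment:
        raise ValueError("alignment is empty")
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences must have equal length")
    keep = [i for i in range(length)
            if all(alignment[name][i] != gap for name in names)]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


if __name__ == "__main__":
    toy = {"taxonA": "ATG-CA", "taxonB": "ATGTCA", "taxonC": "A-GTCA"}
    ungapped = drop_gap_columns(toy)
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    # The protocol used 500 replicates; three are drawn here for brevity.
    for _ in range(3):
        print(bootstrap_replicate(ungapped, rng))
```

A tree would be inferred from each replicate, and support for a clade is the fraction of replicate trees that contain it.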

Pipeline specifications

Software tools: DOGMA, BLASTX, BLASTN, MISA, MUMmer, BWA, SAMtools, MUSCLE, MEGA
Databases: SRA
Applications: Genome annotation, Phylogenetics, WGS analysis, Nucleotide sequence alignment
Organisms: Panicum virgatum, Oryza sativa