Computational protocol: Evolution of the Insertion-Deletion Mutation Rate Across the Tree of Life

Protocol publication

[…] To examine the effect of genetic drift on mutation-rate evolution, it is necessary to derive accurate estimates of the mutation rate and genetic diversity across phylogenetically diverse organisms. Whole-genome sequencing (WGS) has greatly improved our ability to estimate such parameters. Highly accurate measurements of $u_{bs}$ and $u_{id}$ can be obtained through WGS of mutation-accumulation (MA) lines, in which repeated single-organism bottlenecking minimizes the efficiency of selection, allowing for the accumulation of all but the most deleterious mutations. Along with data from prior MA studies, this study contains MA data from four new MA experiments.

For new bacterial MA species, ∼100 independent MA lines were initiated from a single founder colony. The new strains used were as follows: Agrobacterium tumefaciens str. C58, Staphylococcus epidermidis ATCC 12228, and Vibrio cholerae 2740-80.

Depending on the speed of growth, a single colony from each MA line was isolated and transferred to a fresh plate every 1–3 d over the course of the experiment. The bottlenecking process ensures that mutations accumulate in an effectively neutral fashion. After each transfer, the original plate was retained as a backup plate at 4°. If the destination plate was contaminated, or if a single colony could not be picked, a single colony was transferred from the last 4° backup plate.

To estimate the number of generations that elapsed between transfers, every 2 wk an entire colony from each of five randomly selected MA lines was transferred to 1× phosphate-buffered saline (PBS). These were vortexed, serially diluted, and replated. Cell density was calculated from viable cell counts under both the growth conditions used throughout the bottlenecking process and the storage conditions at 4°. The total number of generations for each MA line was calculated as the average number of cell divisions per transfer multiplied by the total number of transfers.
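
The generation estimate described above can be sketched in a few lines of Python. This is an illustration, not the authors' script: it assumes that each colony is founded by a single cell and grows by binary fission, so that a colony of N viable cells corresponds to log2(N) divisions. The function names and the colony counts are hypothetical.

```python
import math


def divisions_per_transfer(viable_cells_in_colony):
    """Cell divisions needed to reach the colony size.

    Assumption (not stated in the text): a colony starts from one cell
    and doubles each generation, so N cells correspond to log2(N) divisions.
    """
    return math.log2(viable_cells_in_colony)


def total_generations(colony_counts, n_transfers):
    """Average divisions per transfer multiplied by the number of transfers."""
    mean_divisions = sum(
        divisions_per_transfer(c) for c in colony_counts
    ) / len(colony_counts)
    return mean_divisions * n_transfers


# Hypothetical viable-cell counts for five sampled colonies (2**25 cells each)
sampled_colonies = [2**25] * 5
print(total_generations(sampled_colonies, n_transfers=200))  # 5000.0
```
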
If backup plates were used, the average number of cell divisions at 4° was used in place of the average number of cell divisions per bottleneck at standard growth temperatures.

The average numbers of cell divisions across the MA experiments are as follows (Dataset S1): A. tumefaciens, 5819; Bacillus subtilis, 5078; E. coli, 4246; Mesoplasma florum, 2351; S. epidermidis, 7170; and V. cholerae, 6453. The average number of generations used for reanalysis of the C. elegans MA study was 250 (Dataset S2).

DNA was extracted from MA lines using the Wizard DNA extraction kit (Promega) or lysis media (CTAB or SDS) followed by phenol/chloroform extraction, to Illumina library standards. Then, 101-bp paired-end Illumina sequencing (Illumina HiSeq platform) was applied to randomly selected MA lines of A. tumefaciens, S. epidermidis, and V. cholerae. Each MA line was sequenced to a coverage depth of ∼100×, with an average library fragment size (distance between paired-end reads) of ∼175 bp.

The paired-end reads for each MA line were individually mapped against the reference genome (assembly and annotation available from the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov) using two separate alignment algorithms: BWA v0.7.4 and NOVOALIGN v2.08.02 (available at www.novocraft.com). The resulting pileup files were converted to SAM format using SAMTOOLS v0.1.18. Using in-house Perl scripts, the alignment information was further parsed to generate forward and reverse mapping information at each site, resulting in a configuration of eight numbers for each line (A, a, C, c, G, g, T, and t), corresponding to the number of reads of each base and strand mapped at each genomic position in the reference sequence. A separate file was also generated to display sites that had indel calls from the two alignment algorithms.
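
The in-house Perl scripts are not reproduced in this excerpt. As a rough illustration of the eight-number configuration, the Python sketch below tallies strand-specific base counts from the read-base column of a single samtools pileup line. It assumes the standard pileup encoding (`.` and `,` for forward and reverse matches to the reference, upper and lower case for forward and reverse mismatches) and assumes that upper case denotes the forward strand in the A/a, C/c, G/g, T/t notation. The function name is hypothetical.

```python
def strand_counts(ref_base, read_bases):
    """Count reads per base and strand at one pileup position.

    Assumption: upper-case keys are forward-strand reads and lower-case
    keys are reverse-strand reads, following the pileup convention.
    """
    counts = dict.fromkeys("AaCcGgTt", 0)
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            # Read-start marker: skip it and the mapping-quality character
            i += 2
            continue
        if ch in "+-":
            # Indel annotation: sign, length digits, then that many bases
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if ch == ".":
            key = ref            # forward-strand match to the reference
        elif ch == ",":
            key = ref.lower()    # reverse-strand match to the reference
        else:
            key = ch             # mismatch, or a symbol such as '$' or '*'
        if key in counts:
            counts[key] += 1
        i += 1
    return counts


# Hypothetical pileup column at a site whose reference base is A
print(strand_counts("A", "..,,^I.T$t+2AG,-1c."))
```

Keeping the two strands separate lets a downstream caller require support from both read orientations before accepting a variant, which is one common way to guard against strand-specific sequencing artifacts.
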
Mutation calling was performed using a consensus method.

A random subset of the base-substitution mutations called using these methods has been previously validated in E. coli and B. subtilis MA lines using fluorescent sequencing technology at the Indiana Molecular Biology Institute at Indiana University (Dataset S3).

To verify indel mutations, we designed 38 primer sets to PCR amplify 300–500 bp regions surrounding the putative indel mutation in the B. subtilis MA lines (Dataset S4). All 29 short indels (< 10 bp) were directly confirmed using standard fluorescent sequencing technology. Two out of nine large indels (> 10 bp) were confirmed through sizing of the PCR product by gel electrophoresis. The remaining seven large indels did not amplify. In all cases, the indel was also confirmed to be absent in one other line without the mutation.

To calculate the base-substitution mutation rate per cell division for each line, we used the following equation:

$$u_{bs} = \frac{m}{nT},$$

where $u_{bs}$ is the base-substitution mutation rate (per nucleotide site per generation), $m$ is the number of observed base substitutions, $n$ is the number of nucleotide sites analyzed, and $T$ is the number of generations that occurred in the mutation-accumulation study. The SE for an individual line is calculated using:

$$SE_{\bar{x}} = \sqrt{\frac{u_{bs}}{nT}}.$$

The total SE of the base-substitution mutation rate is given by the SD of the mutation rates across all lines ($s$) divided by the square root of the number of lines analyzed ($N$):

$$SE_{pooled} = \frac{s}{\sqrt{N}}.$$

The same calculation was used to calculate the indel mutation rate, with $u_{bs}$ replaced by $u_{id}$. […]
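
The rate and standard-error formulas above can be written out as a short Python sketch. The function names and input values are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed for s, since the text does not say which form was used.

```python
import math
import statistics


def mutation_rate(m, n, T):
    """Mutations per nucleotide site per generation: m / (n * T)."""
    return m / (n * T)


def line_se(u, n, T):
    """Standard error for a single MA line: sqrt(u / (n * T))."""
    return math.sqrt(u / (n * T))


def pooled_se(rates):
    """SD of per-line rates divided by sqrt(number of lines).

    Assumption: the sample SD is used; swap in statistics.pstdev
    if the population SD is intended.
    """
    return statistics.stdev(rates) / math.sqrt(len(rates))


# Hypothetical line: 10 substitutions, 5 Mb analyzed, 5000 generations
u = mutation_rate(m=10, n=5_000_000, T=5000)
print(u, line_se(u, n=5_000_000, T=5000))

# Hypothetical per-line rates for three MA lines
print(pooled_se([3.5e-10, 4.0e-10, 4.5e-10]))
```

The same functions apply to indels by passing the number of observed indels as `m`.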

Pipeline specifications

Software tools: BWA, NovoAlign, SAMtools
Application: WGS analysis
Organisms: Bacteria
Diseases: Genetic Diseases, Inborn