Computational protocol: Obscured phylogeny and possible recombinational dormancy in Escherichia coli


Protocol publication

[…] At the outset of the project, we used a subset of E. coli genomes (strains K-12, CFT073, UTI89, O157 Sakai, and EDL933 [,]) to select segments. We first identified conserved backbone regions that were at least 25 kb long and uninterrupted by O-islands. Two 25 kb regions in two different quadrants of the chromosome were selected for further analysis: 1,084,426-1,109,426 (Segment 1) and 2,368,611-2,393,611 (Segment 2) (positions refer to nucleotide sites in the O157 Sakai chromosome) []. For the purposes of this study, the genes in these segments met a functional definition of backbone: chromosomal loci common to all E. coli genomes that had been sequenced when we had to choose a data set for analysis. However, some of these open reading frames may be absent from subsequently sequenced strains. We then performed long-range PCR across three overlapping sections of each 25 kb segment in a set of pilot ECOR strains (Additional File , Table S1) to confirm that these segments were likely to be intact and uninterrupted across the species. Based on uniform restriction patterns in these segments in the pilot strains, Segments 1 and 2 were sequenced (nucleotide positions 1,084,356-1,110,604 and 2,368,707-2,393,879, respectively) in eight ECOR strains, two each from groups A, B1, B2, and D (Additional File , Table S1). Orthologous sequences from 13 published E. coli strains (including four of the initial five strains) and from E. albertii (outgroup) (Additional File , Table S1) were retrieved from the NCBI database using BLASTn [] and aligned to Segments 1 and 2 of the ECOR strains. We analyzed only the nucleotides of Segments 1 and 2 that were represented in all 21 strains: for each strain, these shared sequences were concatenated into one contig per segment (Segment 1 = 23,237 bp, Segment 2 = 23,394 bp), and the contigs were then aligned using ClustalW [].
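The two selection steps above (finding backbone stretches of at least 25 kb that no O-island interrupts, then keeping only the nucleotides present in every strain) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline, which relied on BLASTn, ClustalW, and manual curation. The function names `backbone_windows` and `core_columns` are hypothetical; island coordinates are assumed to be supplied by the user as 1-based inclusive intervals, and the alignment is assumed to be a dict of equal-length gapped strings.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers; not part of the original protocol.
Interval = Tuple[int, int]  # 1-based, inclusive chromosome coordinates (assumed)


def backbone_windows(chrom_len: int, islands: List[Interval],
                     min_len: int = 25_000) -> List[Interval]:
    """Return stretches of the chromosome that no island overlaps
    and that are at least min_len bp long."""
    windows: List[Interval] = []
    start = 1  # first position not yet known to lie inside an island
    for isl_start, isl_end in sorted(islands):
        if isl_start > start:
            gap = (start, isl_start - 1)
            if gap[1] - gap[0] + 1 >= min_len:
                windows.append(gap)
        # max() handles overlapping or nested islands
        start = max(start, isl_end + 1)
    if chrom_len - start + 1 >= min_len:
        windows.append((start, chrom_len))
    return windows


def core_columns(alignment: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Keep only alignment columns in which every strain has a nucleotide,
    then concatenate the retained columns into one contig per strain."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    keep = [i for i in range(n_cols)
            if all(seq[i] != gap for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}
```

For example, on a 100 kb toy chromosome with islands at 30,000-31,000 and 40,000-41,000, `backbone_windows` returns the two flanking stretches and drops the 9 kb gap between the islands as too short.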
Validation studies used Segments 3 (3,633,818-3,658,818) and 4 (4,754,067-4,779,067), with the same alignment techniques as for Segments 1 and 2. Primers were designed to amplify overlapping ~500 bp amplicons spanning Segments 1 and 2 in the eight ECOR strains (Additional File , Table S1). DNA was prepared by phenol-chloroform extraction and ethanol precipitation, and each amplicon was Sanger sequenced. The sequenced amplicons for each strain were assembled into contigs using SeqMan Pro (Lasergene v.3, DNASTAR software suite). Regions that failed to amplify, as well as multi-nucleotide insertions and deletions, were excluded from the final concatenated assembly. Single-nucleotide indels and SNPs occurring in only one strain were verified by inspecting the original trace data. Sequences from amplicons that were successfully sequenced in every strain and had orthologous sequence in the published genomes were concatenated using Lasergene's EditSeq program and aligned with ClustalW in Molecular Evolutionary Genetics Analysis (MEGA) software v.4.0 []. All analyzed sequences, as aligned in SeaView (version 4.2.11) [], are provided in Table S3 (see Additional File ). We used E. albertii as the outgroup in all analyses because, unlike Salmonella, it belongs to the genus Escherichia; it also shares considerably more Segment 2 orthologous sequence with E. coli than E. fergusonii does, and it has evolved less rapidly, which reduces the risk of long-branch attraction []. The ClustalW alignment of all strains except E. albertii (see Additional File , Figure S2) was screened for sequence acquired by recombination using GENECONV [] with the command-line parameter gscale = 1. Regions identified as affected by recombination were replaced with gap characters ("---").
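Two bookkeeping steps in the paragraph above, flagging single-strain variants for trace verification and blanking out recombinant regions reported by GENECONV, can be sketched as follows. This is an illustrative Python sketch under assumptions, not the authors' workflow: `singleton_sites` and `mask_recombinant` are hypothetical names, GENECONV output parsing is not shown, and flagged regions are assumed to be given as 1-based inclusive alignment columns.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical helpers; not part of the original protocol.


def singleton_sites(alignment: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return (1-based column, strain) pairs where one strain carries a state
    (nucleotide or single-base gap) that no other strain shares.
    These are the sites whose trace data would be re-checked by eye."""
    names = list(alignment)
    n_cols = len(alignment[names[0]])
    hits: List[Tuple[int, str]] = []
    for i in range(n_cols):
        column = [alignment[name][i] for name in names]
        counts = Counter(column)
        for name, state in zip(names, column):
            # len(counts) > 1 skips invariant columns (and single-sequence input)
            if counts[state] == 1 and len(counts) > 1:
                hits.append((i + 1, name))
    return hits


def mask_recombinant(seq: str, regions: List[Tuple[int, int]],
                     gap: str = "-") -> str:
    """Replace each flagged region (assumed 1-based, inclusive alignment
    columns) with gap characters, keeping the alignment length unchanged."""
    chars = list(seq)
    for start, end in regions:
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = gap
    return "".join(chars)
```

Masking with gaps rather than deleting columns keeps every sequence the same length, so the alignment stays valid for downstream programs; how those gapped sites are then treated depends on the deletion option chosen in the tree-building software.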
A significance threshold of α = 0.05 was used. We constructed phylogenies using Neighbor-Joining (NJ), Minimum Evolution (ME), and Maximum Parsimony (MP) analyses in MEGA v.4.0 []. The NJ and ME analyses used the Kimura 2-parameter substitution model, and all trees were built with the complete-deletion option for gaps. Bootstrapping was performed with 1,000 replicates. Split Decomposition (SD) network analysis was performed using SplitsTree v.4.10 []. […]
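MEGA performs these calculations internally. The Python sketch below only illustrates what the complete-deletion option, the Kimura 2-parameter (K2P) distance, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) with P and Q the proportions of transition and transversion differences, and one column-resampling bootstrap replicate compute. It stops at the pairwise distance matrix and does not reproduce NJ, ME, or MP tree building. All function names are hypothetical, and treating ambiguous bases like gaps under complete deletion is an assumption.

```python
import math
import random
from typing import Dict, Tuple

# Hypothetical helpers; an illustration, not MEGA's implementation.
PURINES = set("AG")
PYRIMIDINES = set("CT")
VALID = PURINES | PYRIMIDINES


def complete_deletion(alignment: Dict[str, str]) -> Dict[str, str]:
    """Drop every column that contains a gap or ambiguous base in any
    sequence (assumption: ambiguous bases are treated like gaps)."""
    n_cols = len(next(iter(alignment.values())))
    keep = [i for i in range(n_cols)
            if all(seq[i].upper() in VALID for seq in alignment.values())]
    return {k: "".join(s[i] for i in keep) for k, s in alignment.items()}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two gap-free aligned sequences."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p, q = transitions / n, transversions / n
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise K2P distances after complete deletion."""
    aln = complete_deletion(alignment)
    names = sorted(aln)
    return {(x, y): k2p_distance(aln[x], aln[y])
            for i, x in enumerate(names) for y in names[i + 1:]}


def bootstrap_replicate(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """One bootstrap pseudo-replicate: resample columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {k: "".join(s[i] for i in cols) for k, s in alignment.items()}
```

Repeating `bootstrap_replicate` 1,000 times, rebuilding a tree from each replicate, and counting how often each clade reappears gives the bootstrap support values; the tree-building step is deliberately left out here.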

Pipeline specifications

Software tools: BLASTN, Clustal W, DNASTAR Molecular Biology Suite, MEGA, SeaView, MEGA-V, SplitsTree
Applications: Phylogenetics, GWAS
Organisms: Escherichia coli