Protocol publication

[…] and several isolates (Juruá_18/11, Juruá_20/10, Envira_10/1, and Envira_8/11) obtained during ETEC outbreaks that caused severe diarrheal illness in two small villages, Juruá and Envira, in the Amazonia region of Brazil in 1998 (see the supplemental material). A total of 208 new ETEC isolates were included in this study.

Genomic DNA was isolated from bacterial stocks grown overnight in LB using the GenElute genomic kit (Sigma-Aldrich, St. Louis, MO). The genome sequence of each isolate was generated at the Institute for Genome Sciences, Genome Resource Center, on an Illumina HiSeq2000 instrument using paired-end libraries with 300-bp inserts. The draft genomes were assembled using Celera Assembler. The final assemblies were filtered to contain contigs of ≥500 bp. The average coverage of the genomes sequenced in this study was >200×. Information regarding the size of the assembled genomes, number of contigs, and GenBank numbers for each of the genomes sequenced in this study is available in the supplemental material.

The ETEC genomes sequenced in this study were compared with a diverse collection of E. coli and Shigella genomes. Briefly, single nucleotide polymorphisms (SNPs) were detected relative to the completed genome sequence of the laboratory isolate E. coli K-12 W3110 with a direct mapping of sequence based on nucmer alignments. SNPs present in all genomes analyzed were concatenated. A maximum-likelihood phylogeny with 100 bootstrap replicates was generated using RAxML v8.0.16, using the ASC_GTRGAMMA substitution model, and visualized using FigTree v1.3.1.

The level of similarity of protein-encoding genes was compared across all 208 genomes in this study using a large-scale BLAST score ratio (LS-BSR) analysis. Genes were predicted for each genome sequence using Prodigal with default settings.
Predicted genes from all genomes were then concatenated into a single file. The genes were clustered based on similarity with USEARCH, using a nucleotide identity threshold of 90%. Following the clustering, a file was generated that contained a centroid sequence for each cluster. The consensus sequences were translated and compared to each genome using tBLASTn as described above. The maximum tBLASTn bit score value obtained for each cluster was used as the denominator to generate a ratio for the cluster compared to each genome.

The presence or absence […]
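The ratio described in the excerpt can be illustrated with a minimal Python sketch. It is not the LS-BSR implementation itself. It assumes the tBLASTn hits have already been parsed into (cluster, genome, bit score) tuples, and it takes the denominator to be the highest bit score seen for a cluster across the supplied hits. The function names, the example identifiers, and the convention of assigning 0.0 to genomes with no hit are all hypothetical.

```python
from collections import defaultdict


def best_bit_scores(hits):
    """Keep the highest bit score per (cluster, genome) pair.

    `hits` is an iterable of (cluster_id, genome_id, bit_score) tuples,
    for example parsed from tabular tBLASTn output (parsing not shown).
    """
    best = defaultdict(dict)
    for cluster_id, genome_id, bit_score in hits:
        previous = best[cluster_id].get(genome_id, 0.0)
        best[cluster_id][genome_id] = max(previous, float(bit_score))
    return best


def blast_score_ratios(hits, genomes):
    """Return {cluster_id: {genome_id: ratio}} with ratios between 0 and 1."""
    best = best_bit_scores(hits)
    ratios = {}
    for cluster_id, per_genome in best.items():
        # Denominator: the maximum bit score obtained for this cluster
        # (assumed here to be the maximum over the supplied hits).
        reference = max(per_genome.values())
        ratios[cluster_id] = {
            # Genomes without a hit get 0.0 (an assumed convention).
            genome: (per_genome.get(genome, 0.0) / reference
                     if reference > 0 else 0.0)
            for genome in genomes
        }
    return ratios


# Made-up values, for illustration only.
example_hits = [
    ("cluster_1", "genome_A", 200.0),
    ("cluster_1", "genome_B", 100.0),
    ("cluster_2", "genome_A", 50.0),
]
example_ratios = blast_score_ratios(example_hits, ["genome_A", "genome_B"])
```

A ratio near 1 indicates that the genome carries a close match to the cluster's representative sequence, and a ratio near 0 indicates that the gene is absent or highly divergent in that genome.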

Pipeline specifications

Software tools: Prodigal, USEARCH, TBLASTN