Computational protocol: Genome dynamics and diversity of Shigella species, the etiologic agents of bacillary dysentery

[…] Whole-genome shotgun libraries for all strains were constructed as described previously, and ABI3730 automated sequencers were used for sequence collection. For each genome, we generated over 48 000 paired-end shotgun reads, giving an estimated 8- to 9-fold coverage. The initial genome assembly was performed with the phred/phrap programs using the Q20 criterion. Because each genome contained large numbers of IS elements, contigs obtained from phrap were split at each dubious IS locus to avoid mis-assembly, and their relationships were rebuilt manually in Consed based on the location information of paired-end reads. During the finishing phase, approximately 4500–6000 additional sequencing reads were generated per genome by primer walking of large clones or by sequencing PCR amplicons. To verify the final assembly, we designed overlapping primer pairs covering the whole genome sequence and used genomic DNA as the template for PCR amplification. Genome annotation was performed as described previously, and GenomeComp was used for genomic comparison with default parameters. Each pairwise comparison figure was exported from GenomeComp with a 1000 bp filter setting and scale settings of 3000 and 300 for chromosomes and virulence plasmids, respectively. The KEGG database was used for metabolic pathway analysis. […]
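
The two numerical criteria above (fold coverage and Q20) can be illustrated with a minimal Python sketch. It is not part of the original protocol. The coverage formula (number of reads × mean read length / genome size) is the standard estimate, and the Phred relation Q = -10·log10(P) is the standard definition of a quality score. The mean read length (800 bp) and genome size (4.8 Mb) are illustrative assumptions that are not stated in the text, and the function names are hypothetical.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to a per-base error probability.

    Uses the standard definition Q = -10 * log10(P), so Q20 means
    one expected error per 100 base calls.
    """
    return 10 ** (-q / 10)


def estimated_coverage(n_reads: int, mean_read_length_bp: float,
                       genome_size_bp: float) -> float:
    """Estimate fold coverage as total sequenced bases over genome size."""
    return n_reads * mean_read_length_bp / genome_size_bp


if __name__ == "__main__":
    # Read count taken from the text ("over 48 000" reads per genome).
    n_reads = 48_000
    # ASSUMPTIONS (not given in the protocol): mean usable read length
    # and total genome size, chosen only for illustration.
    mean_read_length_bp = 800
    genome_size_bp = 4_800_000

    print("Q20 error probability:", phred_to_error_probability(20))
    print("Estimated fold coverage:",
          estimated_coverage(n_reads, mean_read_length_bp, genome_size_bp))
```

Under these assumed values the estimate falls within the 8- to 9-fold range reported in the text; different read lengths or genome sizes would shift it accordingly.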

Pipeline specifications

Software tools: Consed, GenomeComp
Databases: KEGG
Application: WGS analysis
Organisms: Escherichia coli str. K-12 substr. MG1655; Homo sapiens; Escherichia coli
Diseases: Dysentery, Bacillary; Encephalitis, Arbovirus