Computational protocol: Locus of Adhesion and Autoaggregation (LAA), a pathogenicity island present in emerging Shiga Toxin–producing Escherichia coli strains

Protocol publication

[…] All genome sequences analyzed were downloaded from GenBank at the National Center for Biotechnology Information (NCBI) on 20 September 2016. Accession numbers and the sources of the sequences are listed in Supplementary Table. Contigs of draft genomes were ordered and aligned against the complete genome of E. coli K-12 substr. MG1655 using progressiveMauve. The contigs of each strain were then concatenated into one contiguous sequence, and the genetic context of the hes gene was analyzed using several bioinformatic tools: direct repeat (DR) sequences, insertion sequence (IS) elements and tRNA loci were identified using REPuter, ISfinder and tRNAscan-SE, respectively, and ORFs and G + C content were determined using Unipro UGENE and the Geneious software package (v10.0.9; Biomatters Ltd). DNA regions with PAI features were used to perform BLASTn searches against the Pathogenicity Island Database v2.0. A local BLASTn search was also performed in the Geneious software package to determine the distribution of LAA modules and their insertion sites in the genomes analyzed. [...]
(1) Whole genome SNP analysis: Genome sequences, both draft and complete, were uploaded to the CSI Phylogeny 1.4 server, which identifies SNPs from whole genome sequencing data, filters and validates the SNP positions, and then infers a phylogeny based on concatenated SNP profiles. This analysis was performed using the default input parameters and E. coli K-12 MG1655 as the reference genome. As a result, 167,167 SNPs were identified in 3,008,649 positions found in all analyzed genomes. The output file in Newick format was downloaded and used to visualize the phylogenetic tree in FigTree v.1.4.2. In silico PCR was performed to determine the phylogroup based on the presence/absence of the genes chuA, yjaA, arpA and trpA and the segment TspE4.C2, as proposed by Clermont et al.
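The in silico PCR step mentioned above can be illustrated with a minimal Python sketch. It assumes exact primer matching on both strands, which is a simplification of what dedicated tools do. The function names, the `max_len` cut-off and the toy primers and template are all hypothetical; they are not the Clermont primers and are not taken from the protocol.

```python
# Minimal in silico PCR sketch: exact primer matching on both strands.
# All sequences and names here are illustrative, not from the protocol.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> list:
    """Return every start index of needle in haystack, overlaps included."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def in_silico_pcr(template: str, fwd: str, rev: str, max_len: int = 3000) -> list:
    """Return predicted amplicons (primer to primer) no longer than max_len."""
    fwd = fwd.upper()
    # Sequence that the reverse primer's binding site shows on the searched strand.
    rev_site = reverse_complement(rev)
    amplicons = []
    for strand in (template.upper(), reverse_complement(template)):
        for i in find_all(strand, fwd):
            for j in find_all(strand, rev_site):
                end = j + len(rev_site)
                # The reverse site must lie downstream of the forward primer.
                if j >= i + len(fwd) and end - i <= max_len:
                    amplicons.append(strand[i:end])
    return amplicons


if __name__ == "__main__":
    # Toy template: flank + forward primer + spacer + reverse site + flank.
    toy_template = "CCCC" + "GATTACA" + "G" * 9 + "TTCAGGA" + "CCCC"
    products = in_silico_pcr(toy_template, fwd="GATTACA", rev="TCCTGAA")
    print("target present" if products else "target absent")
```

A presence/absence call for each marker then reduces to checking whether the returned list is empty. Real primers usually tolerate a few mismatches, so an exact-match search like this one may under-report targets.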
(2) Genetic relationships of the LAA pathogenicity island among LEE-negative STEC strains: A total of 42 genomic sequences of STEC strains carrying the four modules of LAA were uploaded to the CSI Phylogeny 1.4 server, and SNP identification was carried out with default input parameters and LAAB2F1 as the reference sequence. As a result, 677 SNPs were identified in 49,427 positions found in all sequences. Tree construction was performed as described above. [...] The presence, absence and variation of LAA-encoded genes were assessed by BLASTn searches performed in the Geneious software package with LAAB2F1 as the reference sequence. By default, a gene was considered absent when its coverage and/or identity was below 60%. Comparisons between genomes and complete LAA sequences were performed and visualized using progressiveMauve and EasyFig v2.1, respectively. A heat map showing the presence, absence and identity of LAA-encoded genes was drawn using the gplots package in R. […]
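The 60% presence/absence rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure, which used BLASTn inside Geneious and the gplots package in R. The function names, the strain labels, the gene names other than hes and every number in the demo are made up for the example.

```python
# Sketch of the presence/absence call: a gene counts as absent when its
# coverage and/or identity is below 60%. Demo values are invented.
from typing import Dict, List, Optional, Tuple

THRESHOLD = 60.0  # percent, as stated in the protocol text

# A hit is (percent identity, percent coverage); None means no BLASTn hit.
Hit = Optional[Tuple[float, float]]


def score_gene(hit: Hit) -> float:
    """Return percent identity if the gene is called present, else 0.0."""
    if hit is None:
        return 0.0
    identity, coverage = hit
    if identity < THRESHOLD or coverage < THRESHOLD:
        return 0.0
    return identity


def build_matrix(hits: Dict[str, Dict[str, Hit]], genes: List[str]) -> Dict[str, List[float]]:
    """Build one row of gene scores per strain, in the given gene order."""
    return {
        strain: [score_gene(gene_hits.get(gene)) for gene in genes]
        for strain, gene_hits in hits.items()
    }


if __name__ == "__main__":
    genes = ["hes", "gene_x", "gene_y"]  # gene_x and gene_y are placeholders
    demo_hits = {
        "strain_1": {"hes": (99.5, 100.0), "gene_x": (98.0, 45.0)},
        "strain_2": {"hes": (97.2, 88.0), "gene_y": (55.0, 100.0)},
    }
    for strain, row in build_matrix(demo_hits, genes).items():
        print(strain, row)
```

Each row holds the percent identity for genes called present and 0.0 for genes called absent, which is the kind of numeric matrix a heat-map function takes as input.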

Pipeline specifications

Software tools: Mauve, REPuter, tRNAscan-SE, Geneious, BLASTN, CSI Phylogeny, FigTree, Easyfig, gplots
Databases: ISfinder, PAIDB
Applications: Genome annotation, Phylogenetics, Nucleotide sequence alignment, Genome data visualization
Organisms: Escherichia coli, Homo sapiens
Diseases: Hemolytic-Uremic Syndrome