Computational protocol: Route of infection alters virulence of neonatal septicemia Escherichia coli clinical isolates

Protocol publication

[…] To identify and compare the gene content of SCB34 and RS218, we first performed whole-genome sequencing of these two isolates on an Illumina MiSeq using a 250-bp paired-end library. The paired-end reads were assembled de novo using the A5 assembly pipeline, and annotation was performed using RAST and the National Center for Biotechnology Information (NCBI) Prokaryotic Genomes Annotation Pipeline, respectively. We then compiled a database of SCB34, RS218 and the annotated genomes from 33 phylogenetically diverse strains representative of all E. coli strains deposited in GenBank. Clusters of putative orthologous proteins were generated for all strains examined using CD-HIT. The sequence identity threshold was 80% across 80% of the total protein length; all other CD-HIT parameters remained at their default values. From the CD-HIT output, a database was generated that contained 22,084 clusters, each including between 1 and 35 genomes. Using Python scripts, the orthologous protein cluster results from CD-HIT were organized into genome-versus-protein-cluster tables in which the presence or absence of an ortholog in a given genome is indicated by a 1 or 0, respectively. Heatmaps were created from the comparison tables using the gplots package in R (version 2.17.0 [http://CRAN.R-project.org/package=gplots]), employing hierarchical clustering of rows and columns to construct the dendrograms. The list of the 22,438 clusters that were included in the CD-HIT output, and their respective accession numbers, are detailed in . Cluster sequences identified with CD-HIT were manually curated by searching the NCBI Microbial Genomes database using the Basic Local Alignment Search Tool (BLAST) and querying all representative genomes, optimizing for highly similar sequences (Megablast). […]
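The step that turns CD-HIT clusters into a genome-versus-cluster presence/absence table can be illustrated with a minimal Python sketch. This is not the authors' script, which is not reproduced in the protocol. It assumes the standard CD-HIT `.clstr` text layout, and it assumes (hypothetically) that every protein identifier is prefixed with its genome name and a `|` separator, for example `SCB34|p1`. The function names `parse_clstr` and `presence_absence` are illustrative only.

```python
import re
from collections import defaultdict

# Captures the sequence identifier in a .clstr member line such as
# "0\t350aa, >SCB34|p1... *" (CD-HIT appends "..." after the identifier).
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def default_genome_of(seq_id):
    """ASSUMPTION: identifiers look like '<genome>|<protein>'."""
    return seq_id.split("|", 1)[0]


def parse_clstr(lines, genome_of=default_genome_of):
    """Return {cluster name: set of genomes with at least one member}."""
    clusters = defaultdict(set)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            continue
        match = MEMBER_RE.search(line)
        if match and current is not None:
            clusters[current].add(genome_of(match.group(1)))
    return dict(clusters)


def presence_absence(clusters, genomes):
    """Return (cluster names, {genome: [1 or 0 per cluster]})."""
    names = list(clusters)
    table = {
        genome: [1 if genome in clusters[name] else 0 for name in names]
        for genome in genomes
    }
    return names, table


if __name__ == "__main__":
    # Tiny hand-made example in .clstr layout (not real data).
    example = [
        ">Cluster 0",
        "0\t350aa, >SCB34|p1... *",
        "1\t348aa, >RS218|p7... at 95.40%",
        ">Cluster 1",
        "0\t120aa, >RS218|p9... *",
    ]
    names, table = presence_absence(parse_clstr(example), ["SCB34", "RS218"])
    print(names)
    for genome, row in table.items():
        print(genome, row)
```

Each row of the resulting table corresponds to one genome and each column to one cluster, which is the shape a heatmap function with row and column clustering would expect as input. If the identifiers in a real dataset do not carry the genome name, a different `genome_of` function (for example, a lookup from accession to genome) would need to be supplied.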

Pipeline specifications

Software tools: CD-HIT, gplots, BLASTN
Application: WGS analysis
Organisms: Escherichia coli, Homo sapiens, Mus musculus