Computational protocol: A set of powerful negative selection systems for unmodified Enterobacteriaceae

Protocol publication

[…] Type II toxin genes present in MG1655 were taken from () and from the NCBI RefSeq annotation of the MG1655 genome (NC_000913). Finished E. coli genome sequences were downloaded from NCBI RefSeq (Supplementary Figure S2). Each non-K12 genome was used as the database in a search for each toxin gene with blastn (toxin gene sequences as queries) and tblastn (toxin protein sequences as queries), both from BLAST 2.2.28+ and run with default parameters. Examination of the blastn and tblastn results showed that a cutoff of a blastn alignment over 80% of the toxin gene length captured all genes that simultaneously had >88% DNA identity and >84% protein identity over at least 80% of the toxin gene (or protein) sequence. This 80% blastn alignment cutoff also included four additional gene/genome combinations that had potential frameshift mutations (yafQ and yafO in SMS_3_5; cbtA in O55_H7 RM12579 and P12b) but in which the entire gene sequence was otherwise present. We therefore used a blastn alignment over 80% of the toxin gene length as the cutoff to call each toxin present or absent in a given genome. […]
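The presence/absence call described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script. It assumes blastn was run with a tabular output format (`-outfmt "6 qseqid sseqid pident length qstart qend qlen"`), which the protocol does not state. It also assumes that coverage is measured on the single best HSP per toxin gene and that an alignment of exactly 80% counts as present. The function names are hypothetical.

```python
import csv
import io

# Assumed column order of the tabular BLAST output (not stated in the protocol):
#   blastn -outfmt "6 qseqid sseqid pident length qstart qend qlen"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "qstart", "qend", "qlen"]

# Fraction of the toxin gene length that the blastn alignment must span
# (the 80% cutoff given in the protocol text).
COVERAGE_CUTOFF = 0.80


def best_query_coverage(tabular_text):
    """Map each toxin gene to the largest fraction of its length spanned by one HSP."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        qstart, qend, qlen = int(hit["qstart"]), int(hit["qend"]), int(hit["qlen"])
        # Query coordinates are 1-based and inclusive.
        coverage = (abs(qend - qstart) + 1) / qlen
        gene = hit["qseqid"]
        best[gene] = max(best.get(gene, 0.0), coverage)
    return best


def call_presence(tabular_text, toxin_genes, cutoff=COVERAGE_CUTOFF):
    """Call each toxin gene present (True) or absent (False) in one genome.

    Assumption: coverage equal to the cutoff counts as present. Genes with
    no hit at all are called absent.
    """
    best = best_query_coverage(tabular_text)
    return {gene: best.get(gene, 0.0) >= cutoff for gene in toxin_genes}
```

To use it, read the blastn output for one genome into a string and pass it to `call_presence` with the list of toxin gene names. Repeating this for each genome gives a presence/absence matrix. Because only the alignment span is tested, genes with potential frameshifts but an otherwise complete sequence are still called present, as in the protocol.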

Pipeline specifications

Software tools: BLASTN, TBLASTN
Application: Amino acid sequence alignment
Organisms: Escherichia coli
Diseases: Salmonella Infections; Skin Diseases, Bacterial