Computational protocol: The lytic Myoviridae of Enterobacteriaceae form tight recombining assemblages separated by discontinuities in genome average nucleotide identity and lateral gene flow

Protocol publication

[…] Lytic Myoviridae bacteriophage genomes were obtained in 2016 from NCBI and EBI (accession numbers in Table S1, available in the online version of this article). ANI (BLAST-based algorithm ANIb, 500 bp fragment length) was computed using JSpecies. For every genome pair, JSpecies split the query genome into non-overlapping 500 bp fragments, ran a BLAST search of each fragment against the target genome and averaged the identities of the resulting hits. ANIb coverage was obtained with a simple shell script from the JSpecies BLAST results and validated against the coverage values reported by JSpecies itself, which are displayed interactively but are hard to tabulate for all genome pairs.
The pairwise homoplasy index was used as the test for intragenic recombination and was run on the PGAP-defined, MAFFT-aligned core genes of ANI-delineated bacteriophage groups. PGAP takes as input the complete set of predicted proteins and genes of all genomes in a sample and performs an all-versus-all blastp search. The genes found in all genomes [using the gene family rule of 50 % amino acid identity (AAI) and at least 50 % coverage] make up the core genome.
The neighbour-joining dendrogram of T4-like bacteriophage gene presence–absence was produced by PGAP. Phylogenies based on (PGAP-defined) conserved genes of related ANI groups were reconstructed with PhyML as implemented in SeaView. The phylogenetic congruence of core genes within ANI-delineated groups, and of shared genes between different groups, was estimated using the congruence among distance matrices (CADM) test of the R package ape. Multiple-gene handling, to produce gene distance matrices (model TN93) and to reconstruct phylogenetic trees for all core or shared genes simultaneously, was done using the R package apex. The consensus network was reconstructed from neighbour-joining gene trees with SplitsTree4.
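The ANIb procedure described above (fragmenting the query, taking the BLAST identity of each fragment, then averaging) can be sketched in a few lines of Python. This is only an illustration and not the JSpecies code or the authors' shell script. The `Hit` record, the function names, the handling of the trailing short fragment, the 30 % identity / 70 % aligned-length filter and the coverage definition (aligned bases of retained fragments over query length) are all assumptions made for the example; the BLAST search itself is assumed to have been run separately.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


def fragment(seq: str, size: int = 500) -> List[str]:
    """Cut a query genome into consecutive, non-overlapping fragments.

    The last fragment may be shorter than `size` (an assumption; the
    source does not say how the tail of the genome is handled).
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


@dataclass
class Hit:
    """Best BLAST hit of one query fragment (hypothetical record layout)."""
    fragment_length: int  # length of the query fragment, in bp
    identity: float       # percent identity of the hit
    aligned_length: int   # alignment length, in bp


def anib(hits: Iterable[Hit], query_length: int,
         min_identity: float = 30.0,
         min_aligned_fraction: float = 0.7) -> Tuple[float, float]:
    """Return (ANIb in %, coverage in %) from one best hit per fragment.

    The two thresholds are assumed defaults for illustration, not values
    stated in the source.
    """
    kept = [h for h in hits
            if h.identity >= min_identity
            and h.aligned_length >= min_aligned_fraction * h.fragment_length]
    if not kept:
        return 0.0, 0.0
    # ANIb: plain mean of the identities of the retained fragments.
    ani = sum(h.identity for h in kept) / len(kept)
    # Coverage: aligned bases (capped at fragment length) over query length.
    aligned = sum(min(h.aligned_length, h.fragment_length) for h in kept)
    return ani, 100.0 * aligned / query_length


if __name__ == "__main__":
    genome = "ACGT" * 300                      # toy 1200 bp query
    lengths = [len(f) for f in fragment(genome)]
    toy_hits = [Hit(lengths[0], 98.0, 500),
                Hit(lengths[1], 96.0, 400),
                Hit(lengths[2], 90.0, 100)]    # too short, filtered out
    print(anib(toy_hits, len(genome)))
```

Because ANIb is computed from fragments of the query only, the value for the pair (A, B) generally differs from the value for (B, A), so both directions would need to be tabulated for every genome pair.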
The clonal frame in ANI groups was estimated using Gubbins run on the core genome produced by Roary, a novel pipeline that is drastically faster than PGAP for core genome reconstruction because it filters out almost identical sequences before the blastp stage. The GFF files for Roary were produced from bacteriophage nucleotide sequences using Prokka, a customizable local annotation pipeline. […]
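The order of the three tools in this step (Prokka annotation, Roary core genome, Gubbins clonal frame) can be made explicit with a small Python sketch that only builds the command lines and does not run them. The source gives no command-line options, so everything here is an assumption for illustration: the directory layout, the function names, the `--kingdom Viruses` setting for Prokka, the `-e -n` (MAFFT core alignment) options for Roary and the thread count.

```python
from pathlib import Path
from typing import List, Sequence


def prokka_cmd(fasta: Path, outdir: Path) -> List[str]:
    """Annotate one phage genome; Prokka writes <outdir>/<prefix>.gff."""
    return ["prokka", "--kingdom", "Viruses",        # assumed setting
            "--outdir", str(outdir), "--prefix", fasta.stem, str(fasta)]


def roary_cmd(gffs: Sequence[Path], outdir: Path, threads: int = 4) -> List[str]:
    """Build the core genome; -e -n requests a MAFFT core gene alignment."""
    return ["roary", "-e", "-n", "-p", str(threads),
            "-f", str(outdir), *[str(g) for g in gffs]]


def gubbins_cmd(alignment: Path, prefix: str) -> List[str]:
    """Estimate recombinant regions and the clonal frame from the alignment."""
    return ["run_gubbins.py", "--prefix", prefix, str(alignment)]


def plan(fastas: Sequence[Path], workdir: Path) -> List[List[str]]:
    """Return the commands for one ANI group, in execution order."""
    commands: List[List[str]] = []
    gffs: List[Path] = []
    for fasta in fastas:
        outdir = workdir / "prokka" / fasta.stem
        commands.append(prokka_cmd(fasta, outdir))
        gffs.append(outdir / f"{fasta.stem}.gff")
    roary_out = workdir / "roary"
    commands.append(roary_cmd(gffs, roary_out))
    # Roary names its core gene alignment core_gene_alignment.aln.
    commands.append(gubbins_cmd(roary_out / "core_gene_alignment.aln",
                                "clonal_frame"))
    return commands


if __name__ == "__main__":
    # Hypothetical input file names.
    for cmd in plan([Path("phageA.fasta"), Path("phageB.fasta")], Path("work")):
        print(" ".join(cmd))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed; keeping command construction separate from execution makes the plan easy to inspect before anything is run.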

Pipeline specifications

Software tools: JSpeciesWS, PGAP, MAFFT, BLASTP, PhyML, SeaView, APE, SplitsTree, Gubbins, Roary, Prokka
Applications: Genome annotation, Phylogenetics, Nucleotide sequence alignment
Organisms: Escherichia coli, Filamentous phage, Viruses, Bacteria
Chemicals: Nucleotides