Computational protocol: Uncovering Listeria monocytogenes hypervirulence by harnessing its biodiversity

[…] Genomic DNA of the 69 strains was extracted using the Promega Wizard genomic DNA purification kit. Genomes were sequenced on an Illumina HiSeq 2000 system using a 2 × 100 nt paired-end strategy. Reads were quality-trimmed and adapter-clipped with AlienTrimmer. De novo assembly of the final read set was performed with the CLCbio assembler (see section), with a minimum contig size of 500 nt. Contigs were reordered with the Mauve Contig Mover, using completely sequenced genomes as references: F2365 for lineage I and EGDe for lineage II. Genome sequences were submitted to the MicroScope/MaGe platform (Genoscope, Evry, France) for gene prediction and functional annotation of gene products. Genes missing from the annotations of the 35 genomes retrieved from GenBank (see section, last accessed in February 2013) were added using the MicroScope/MaGe platform, so that gene definitions were consistent with those of the newly sequenced genomes. […]

Some of the genomes are closely related, whereas others are highly divergent. We therefore tested for phylogenetic inertia. Phylogenetic inertia of the clinical frequency of clones was measured on the phylogeny of both lineages combined and of each lineage separately, using the lambda method of the ‘phylosig’ function in the R package ‘phytools’. Phylogenetic inertia was high for the complete dataset (lineages I + II; Pagel’s lambda = 0.9999, p = 0.0005).

To identify the gene families most strongly associated with clones frequently involved in clinical infections, we compared the presence/absence patterns of gene families among clones with the log10 of the clones’ clinical frequencies, taking their phylogenetic relationships into account. This analysis used generalized estimating equations (GEE), computed in R with the ‘ape’ package.
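The phylogenetic-inertia test can be illustrated with a minimal pure-Python sketch of Pagel’s lambda. It rests on several assumptions not stated in the protocol: the tree is supplied as a tip covariance matrix C (shared root-to-node path lengths); the trait evolves by Brownian motion on the lambda-transformed tree, in which the off-diagonal entries of C are multiplied by λ; λ is estimated by a simple grid search over [0, 1], standing in for numerical optimization; and significance comes from a likelihood-ratio test against λ = 0 with one degree of freedom. The 4-clone tree and trait values are made up and are not the study’s data.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L (i.e. 'whiten' b)."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lambda_transform(C, lam):
    """Pagel's lambda: scale off-diagonal (shared-path) covariances by lam."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)] for i in range(n)]


def bm_loglik(C, y):
    """Brownian-motion log-likelihood with root state and rate at their ML values."""
    n = len(y)
    L = cholesky(C)
    z1 = forward_solve(L, [1.0] * n)
    zy = forward_solve(L, y)
    a_hat = dot(z1, zy) / dot(z1, z1)             # GLS estimate of the root state
    r = [b - a_hat * c for b, c in zip(zy, z1)]   # whitened residuals
    sigma2 = dot(r, r) / n                        # ML estimate of the BM rate
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi * sigma2) + logdet + n)


def pagel_lambda(C, y, grid_size=201):
    """Grid-search ML estimate of lambda in [0, 1] and LR test against lambda = 0."""
    best_lam, best_ll = 0.0, -math.inf
    for k in range(grid_size):
        lam = k / (grid_size - 1)
        ll = bm_loglik(lambda_transform(C, lam), y)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    ll0 = bm_loglik(lambda_transform(C, 0.0), y)
    lr = max(0.0, 2.0 * (best_ll - ll0))
    p_value = math.erfc(math.sqrt(lr / 2.0))      # upper tail of chi-square, 1 df
    return best_lam, best_ll, ll0, p_value


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1) as a shared-path covariance matrix.
C = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
y = [1.0, 1.1, 3.0, 3.2]  # made-up log10 clinical frequencies
lam, ll, ll0, p = pagel_lambda(C, y)
```

Because closely related clones A/B and C/D have similar values here, the fitted λ sits near the upper end of the range. With a full-size phylogeny, one Cholesky factorization per grid point becomes slow in pure Python; in practice the R call would be along the lines of `phylosig(tree, x, method = "lambda", test = TRUE)`.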
GEE regression parameter estimates were calculated for each gene family; each estimate reflects the association of that family with clones of high clinical frequency. Candidate genes for further experimental analysis were selected by identifying, among gene families with high GEE estimates, those specific to the clones of interest, and by taking into account the functional annotations of these gene families together with data from the literature. […]
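The per-family GEE estimates and the specificity filter can be sketched as follows. This is a simplified stand-in, not the authors’ code. It assumes a Gaussian family with identity link, log10 clinical frequency as the response and presence/absence as the predictor (the original may instead model presence/absence as a binary response, which would need a binomial GEE), and a fixed phylogenetic correlation matrix R derived from the tip covariances, with the whole tree treated as a single cluster. Under those assumptions the GEE estimating equations reduce to generalized least squares, solved here after whitening with a Cholesky factor. The function names, the `min_estimate` threshold, the toy data, and the reading of "specific" as "present in exactly the clones of interest" are all hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L (i.e. 'whiten' b)."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def correlation_from_vcv(C):
    """Phylogenetic correlation matrix R_ij = C_ij / sqrt(C_ii * C_jj)."""
    n = len(C)
    return [[C[i][j] / math.sqrt(C[i][i] * C[j][j]) for j in range(n)] for i in range(n)]


def gee_gaussian_slope(R_chol, presence, y):
    """Slope of y ~ presence under fixed correlation R (Gaussian family, identity link).

    With R fixed and one cluster, the GEE equations X^T R^-1 (y - X b) = 0 are
    the GLS normal equations. Returns None if presence is constant across clones.
    """
    ones = forward_solve(R_chol, [1.0] * len(y))
    x = forward_solve(R_chol, [float(v) for v in presence])
    z = forward_solve(R_chol, y)
    s11, s1x, sxx = dot(ones, ones), dot(ones, x), dot(x, x)
    s1z, sxz = dot(ones, z), dot(x, z)
    det = s11 * sxx - s1x * s1x
    if det <= 1e-12 * s11 * sxx:
        return None
    return (s11 * sxz - s1x * s1z) / det  # Cramer's rule for the 2x2 system


def select_candidates(families, clones, log10_freq, C, clones_of_interest, min_estimate):
    """Families with estimate >= min_estimate present in exactly the clones of interest."""
    R_chol = cholesky(correlation_from_vcv(C))
    target = set(clones_of_interest)
    hits = []
    for family, present_in in families.items():
        presence = [1 if c in present_in else 0 for c in clones]
        est = gee_gaussian_slope(R_chol, presence, log10_freq)
        if est is not None and est >= min_estimate and set(present_in) == target:
            hits.append((family, est))
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical data on a 4-clone tree ((A:1,B:1):1,(C:1,D:1):1).
clones = ["A", "B", "C", "D"]
C = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
log10_freq = [1.5, 1.4, 0.2, 0.1]
families = {"fam1": {"A", "B"}, "fam2": {"A", "C"}, "fam3": {"A", "B", "C", "D"}}
candidates = select_candidates(families, clones, log10_freq, C, ["A", "B"], min_estimate=0.5)
```

In this toy example only `fam1` survives: `fam2` has a small estimate, and `fam3` is present in every clone, so its slope is not identifiable. Functional annotation and the literature review described above would then be applied to the surviving families by hand.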

Pipeline specifications

Software tools: AlienTrimmer, CLC Assembly Cell, Mauve, Phytools, APE
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Listeria monocytogenes, Mus musculus, Homo sapiens, Bacteria
Diseases: Listeriosis