Computational protocol: Genomic evolution of Staphylococcus aureus isolates colonizing the nares and progressing to bacteremia

Protocol publication

[…] To identify SNPs between pairs of isolates, we used Bowtie2 (v2.2.6) to map paired-end reads to the S. aureus reference sequence USA300_FPR3757 (ATCC-1556). The USA300_FPR3757 genome contains three annotated plasmids (pUSA01, pUSA02, and pUSA03). Duplicates were marked and excluded from further analysis using the Picard2 RemoveDuplicates tool (http://broadinstitute.github.io/picard). SNPs were identified with FreeBayes v1.0.2.29, with all genomes from a single subject supplied as a list (-L) and compared to the reference sequence (USA300_FPR3757), and were filtered with SnpSift on mapping quality and read depth (MQM > 20 and DP > 40). For each subject, the SnpSift calls for reference (isRef) and variant (isVar) were made in a stepwise manner to determine how mutations accumulated over the subject’s history. SNPs were annotated against the USA300_FPR3757 genome and classified into functional categories using TIGRFAM and AureoWiki.

The A5 de novo assembly pipeline was used to construct draft genome assemblies (contigs) from the raw paired-end sequence reads (see GenBank accession numbers PHUU00000000.1-PHWY00000000.1 for published draft genome assemblies). Virulence factors were identified in each set of contigs using the S. aureus VirulenceFinder version 1.5 database. Multilocus sequence typing (MLST) assignments were made by searching each set of contigs for sequences of the housekeeping genes (arcC, aroE, glpF, gmk, pta, tpi, yqiL) and comparing these to a database of S. aureus MLST types. Mykrobe predictor was used to predict antibiotic resistance profiles from the A5 pipeline results.

To infer the evolutionary relationships among the S. aureus isolates, sequence reads were mapped to a USA300 reference genome (USA300_FPR3757).
For each isolate, informative base calls were made only at positions with coverage of >10 reads and >90% concordance in base calls across all aligned reads; positions not meeting these criteria were called as “N” and excluded from subsequent analyses. Read mapping produced a genome-wide multiple sequence alignment (>2.8 million positions) of all included genomic sequences, from which an approximately-maximum-likelihood phylogenetic tree was generated using FastTree v2.1.10 and displayed with Dendroscope v3.5.9. Because no full-length, polished genomic sequences of USA100 were available, we corroborated the results by generating trees using as references the concatenated contigs of a USA100 isolate that was sequenced in our laboratory, along with two other full-length reference sequences, NC002953 MSSA (MSSA 476) and NC002951 COL (Staphylococcus aureus subsp. aureus COL). […]
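
The per-position base-calling rule described above (coverage of >10 reads and >90% concordance, otherwise “N”) can be illustrated with a minimal sketch. This is not the authors' code. The function names, the representation of a pileup as a plain string of read bases, and the choice to count only A/C/G/T calls toward coverage are all assumptions made for illustration.

```python
from collections import Counter

# Thresholds taken from the protocol text; both comparisons are strict (">").
MIN_DEPTH = 10            # coverage must exceed 10 reads
MIN_CONCORDANCE_PCT = 90  # the majority base must exceed 90% of reads


def call_base(pileup: str) -> str:
    """Return the consensus base at one reference position, or 'N'.

    `pileup` is a hypothetical representation: one character per aligned
    read covering the position. Characters other than A/C/G/T are ignored
    (an assumption, not something the protocol states).
    """
    bases = [b for b in pileup.upper() if b in "ACGT"]
    depth = len(bases)
    if depth <= MIN_DEPTH:
        return "N"
    base, count = Counter(bases).most_common(1)[0]
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    if count * 100 <= MIN_CONCORDANCE_PCT * depth:
        return "N"
    return base


def call_consensus(pileups) -> str:
    """Build one alignment row by calling every position independently."""
    return "".join(call_base(p) for p in pileups)


if __name__ == "__main__":
    example = ["A" * 12, "ACGT", "G" * 19 + "T"]
    print(call_consensus(example))
```

Positions returned as “N” would then be dropped or masked before tree building, so that low-coverage or mixed sites do not contribute to the alignment.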

Pipeline specifications

Software tools: Bowtie2, Picard, FreeBayes, SnpSift, Mykrobe predictor, FastTree, Dendroscope
Application: Phylogenetics
Organisms: Staphylococcus aureus, Homo sapiens