Computational protocol: Complete genome of Staphylococcus aureus Tager 104 provides evidence of its relation to modern systemic hospital-acquired strains

Protocol publication

[…] Genomic DNA was extracted from Tager 104 using the E.Z.N.A. Bacterial DNA Kit (Omega Bio-tek, Norcross, GA) and constructed into a bar-coded library using the Nextera DNA sample preparation kit (Illumina, San Diego, CA). Sequencing was performed on an Illumina MiSeq sequencer to produce 2 × 150 bp paired-end reads, and the trimmed reads were assembled de novo using CLC Bio v. 4.6.1, as described previously []. To scaffold these contigs, a sub-library of Tager 104 was constructed for PacBio SMRT sequencing. Two sequencing reactions were performed, and the CLC Bio contigs were scaffolded using the Celera Assembler pipeline in the SMRT Analysis 1.3 suite [].
To overcome the difficulty of closing the genome from this initial assembly, the PacBio reads were instead assembled de novo using the Hierarchical Genome Assembly Process (HGAP) algorithm in SMRT Analysis v. 2.0 [], which produced 8 contigs. To close the genome, two Lucigen NxSeq 20 kb mate-pair libraries were constructed and sequenced on an Illumina HiSeq system. The PacBio HGAP scaffolds and the Lucigen NxSeq paired-end reads from the mate-pair libraries were provided to SSAKE-based Scaffolding of Pre-Assembled Contigs after Extension (SSPACE) [] for de novo assembly to create the closed draft genome. Gap regions in this genome were closed using a combination of the GapFiller algorithm (part of the SSPACE suite) and Basic Local Alignment Search Tool (BLAST) searches against the initial CLC Bio contigs to identify those that bridge gap regions. This complete, circular genome was submitted to the Rapid Annotation using Subsystem Technology (RAST) server [–].
To confirm the construction of Tager 104 using an independent method, new libraries were constructed using the Nextera DNA kit and sequenced using Illumina MiSeq 2 × 250 bp reactions. These results were combined using the St. Petersburg genome assembler (SPAdes) algorithm [] for genome closure. […]
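The gap-closure step described above starts from knowing where the gaps are. The following is a minimal sketch, not the authors' procedure, that assumes gaps in the scaffolded draft are represented as runs of N characters (the usual convention for scaffolder output). It locates each gap and extracts the flanking sequence that could then be searched against the initial contigs. The function names and the `flank` size are hypothetical choices made for illustration.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs.

    Assumption: gaps are encoded as runs of 'N' or 'n'.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


def gap_flanks(seq: str, gap: Tuple[int, int], flank: int = 200) -> Tuple[str, str]:
    """Return the sequence immediately left and right of a gap.

    These flanks are the candidate queries for a search against the
    initial contigs; `flank` is an illustrative value, not from the source.
    """
    start, end = gap
    return seq[max(0, start - flank):start], seq[end:end + flank]
```

A contig whose alignment covers both flanks of the same gap is a candidate for bridging that gap. The search itself, and any validation of the bridge, would be done with the aligner of choice and is outside this sketch.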
To test the contribution of repeats to the shortcomings in genome assembly, long repeats (>500 bp) were identified by mapping the Tager 104 genome to itself with Nucmer [] and selecting regions with unique locations and appropriate size. In addition, interspersed repeats and RNA sequences were identified using the RepeatMasker algorithm (www.repeatmasker.org). The coordinates of unique repeats were recorded and provided to Circos version 0.64 (www.circos.ca).
To determine the contribution of the Illumina MiSeq contigs (constructed using CLC Bio), the PacBio RS sequencing reads, and the scaffolds constructed from the combination of the two, the results from each assembly were mapped to the Tager 104 genome using Nucmer. The locations of each unique mapping were provided to Circos for visualization. […]
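The repeat-selection step above can be sketched in a few lines. This is a minimal illustration, not the authors' script. It assumes the Nucmer self-alignment has been exported as tab-delimited coordinates in the nine-column layout produced by MUMmer's `show-coords -T -H` (start and end in the reference, start and end in the query, two alignment lengths, percent identity, and the two sequence names). The function names and the Circos chromosome label `tager104` are hypothetical.

```python
from typing import Iterable, List, Tuple

# (start_a, end_a, start_b, end_b), 1-based inclusive, start <= end
Repeat = Tuple[int, int, int, int]


def long_self_repeats(coords_lines: Iterable[str], min_len: int = 500) -> List[Repeat]:
    """Select unique repeat pairs longer than min_len from self-alignment coords."""
    seen = set()
    repeats: List[Repeat] = []
    for line in coords_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a coordinate row in the assumed layout
        try:
            s1, e1, s2, e2 = (int(x) for x in fields[:4])
        except ValueError:
            continue  # header or malformed row
        # Reverse-strand hits report start > end; normalise both intervals.
        a = (min(s1, e1), max(s1, e1))
        b = (min(s2, e2), max(s2, e2))
        if a == b:
            continue  # a region aligned to itself is not a repeat
        if a[1] - a[0] + 1 <= min_len or b[1] - b[0] + 1 <= min_len:
            continue  # keep only repeats strictly longer than min_len
        # A self-alignment reports each pair twice (A->B and B->A).
        key = (min(a, b), max(a, b))
        if key in seen:
            continue
        seen.add(key)
        repeats.append(key[0] + key[1])
    return repeats


def to_circos_links(repeats: Iterable[Repeat], chrom: str = "tager104") -> List[str]:
    """Format repeat pairs as one-line Circos link records."""
    return [f"{chrom} {a} {b} {chrom} {c} {d}" for a, b, c, d in repeats]
```

The one-line link format (`chr start end chr start end`) is the layout used by recent Circos releases. Whether version 0.64 expects this or the older two-line format should be checked against its documentation before use.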

Pipeline specifications

Software tools: Celera Assembler, SMRT-Analysis, HGAP, SSAKE, SSPACE, GapFiller, BLASTN, RAST, SPAdes, MUMmer, RepeatMasker, Circos
Applications: De novo sequencing analysis, Nucleotide sequence alignment, Genome data visualization
Organisms: Staphylococcus aureus, Homo sapiens, Ilex paraguariensis