Computational protocol: Invasive Group A Streptococcus Infection among Children, Rural Kenya

Protocol publication

[…] Blood cultures were undertaken by using the BACTEC Peds Plus system (Becton Dickinson, Franklin Lakes, NJ, USA) according to the manufacturer’s instructions. Positive broth cultures and CSF, urine, and surface swab samples were subcultured on 5% horse blood agar and chocolate agar. GAS isolates were identified by β-hemolysis, followed by Gram staining and catalase testing, and then grouped by latex bead agglutination. Penicillin susceptibility was tested by disk diffusion. Laboratory procedures were subject to internal quality control and external quality control by the UK National External Quality Assessment Service.
GAS isolates were subcultured on 5% horse blood agar from archived bacterial isolates (stored at −80°C) and transported to the Wellcome Trust Sanger Institute, Cambridge, UK. DNA was extracted by a QIAxtractor (QIAGEN, Valencia, CA, USA), and DNA quality and quantity were documented by using NanoDrop (Thermo Scientific, Waltham, MA, USA) and Qubit (Life Technologies, Carlsbad, CA, USA) techniques. Whole-genome sequences were determined from Illumina 96-plex libraries by using the HiSeq2000 sequencing platform (Illumina, San Diego, CA, USA) to generate tagged 75-bp paired-end reads.
To obtain the overall population structure of the sequenced genomes, we mapped individual Illumina read pairs to the MGAS5005 (emm1) reference genome by using SMALT version 0.5.8. The average coverage of the resulting whole-genome alignment was 190×. The minimum base-call quality for identifying a single nucleotide polymorphism (SNP) was set at 50, and the minimum mapping quality for SNP calling was set at 30. SNPs called in known MGAS5005 prophage regions and repeat regions were excluded from analyses. The final genome alignment was 1,629,062 bp and comprised 125,233 SNPs. To examine the genomic relationships between the sequenced genomes, we generated a maximum-likelihood tree from the SNP alignment by using FastTree.
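The SNP filtering criteria described above (base-call quality of at least 50, mapping quality of at least 30, and exclusion of prophage and repeat regions) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `SnpCall` record, the `filter_snps` function, and the masked-interval representation are hypothetical names introduced here for illustration, and the region coordinates in the usage note are invented placeholders, not real MGAS5005 annotations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Thresholds taken from the protocol text.
MIN_BASE_QUALITY = 50     # minimum base-call quality for identifying a SNP
MIN_MAPPING_QUALITY = 30  # minimum mapping quality for SNP calling


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical record for one candidate SNP on the reference genome."""
    position: int         # 1-based reference coordinate
    base_quality: int
    mapping_quality: int


def in_masked_region(position: int, masked: Iterable[Tuple[int, int]]) -> bool:
    """Return True if position lies in any inclusive (start, end) interval."""
    return any(start <= position <= end for start, end in masked)


def filter_snps(calls: Iterable[SnpCall],
                masked: List[Tuple[int, int]]) -> List[SnpCall]:
    """Keep calls that meet both quality thresholds and lie outside
    the masked (prophage or repeat) intervals."""
    return [
        call for call in calls
        if call.base_quality >= MIN_BASE_QUALITY
        and call.mapping_quality >= MIN_MAPPING_QUALITY
        and not in_masked_region(call.position, masked)
    ]
```

As a usage example, calling `filter_snps` on a list of `SnpCall` records with `masked=[(1000, 2000)]` would drop any call at positions 1000 through 2000 regardless of quality, and drop any remaining call that falls below either threshold.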
Draft genome assemblies were compiled by using an iterative sequence assembly process as described previously. An initial quality control screen of the short-read sequences, intended to identify mixed isolates and low-quality sequences, was performed by examining genome assembly length and SNP heterogeneity. A total of 43 (11.6%) sequences had an assembly length of >2 Mbp and were excluded from phylogenetic analyses because of possible contamination. The emm type and multilocus sequence type (MLST) were obtained from in-house BLAST analysis of draft genome assemblies and compared with those in centralized databases. New emm and MLST alleles were assigned by database curators. Allocation of emm cluster was derived as previously described. Heterogeneity observed within the typing schemes was investigated by using maximum-likelihood associations in whole-genome sequence data. […]
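The assembly-length screen described above can be sketched as follows. This is an illustrative assumption about how such a check might be written, not the authors' code: `assembly_length` and `split_by_length` are hypothetical helper names, and each assembly is represented simply as a list of contig sequences keyed by isolate name. The SNP heterogeneity check mentioned in the text is not covered here.

```python
from typing import Dict, List, Tuple

# Threshold from the protocol text: assemblies longer than 2 Mbp were excluded.
MAX_ASSEMBLY_LENGTH_BP = 2_000_000


def assembly_length(contigs: List[str]) -> int:
    """Total length of a draft assembly, summed over its contigs."""
    return sum(len(contig) for contig in contigs)


def split_by_length(
    assemblies: Dict[str, List[str]],
    max_length: int = MAX_ASSEMBLY_LENGTH_BP,
) -> Tuple[List[str], List[str]]:
    """Partition isolate names into (kept, excluded) by total assembly length.

    An assembly is excluded when its length is strictly greater than
    max_length, mirroring the '>2 Mbp' criterion.
    """
    kept: List[str] = []
    excluded: List[str] = []
    for name, contigs in assemblies.items():
        if assembly_length(contigs) > max_length:
            excluded.append(name)
        else:
            kept.append(name)
    return kept, excluded
```

The `max_length` parameter defaults to the 2 Mbp cutoff but is exposed so the same check can be tried on small toy inputs.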

Pipeline specifications

Software tools: SMALT, FastTree
Applications: Phylogenetics, WGS analysis, Nucleotide sequence alignment
Organisms: Homo sapiens
Diseases: Neoplasm Metastasis, Pneumonia, Bacteremia