Computational protocol: Microbial Community Profiling of Human Saliva Using Shotgun Metagenomic Sequencing

[…] DNA samples for metagenomics were prepared for 150 bp and 100 bp single-end sequencing using the Illumina GAIIx and HiSeq 2000 instruments (Illumina, San Diego, CA), respectively. Numerically coded aliquots of approximately 0.5–1 µg DNA per sample were used to create sequencing libraries. First, genomic DNA was fragmented to approximately 300 base pairs (bp) using a Covaris™ S220 Sonicator (Covaris, Inc., Woburn, MA). Fragmented DNA was used to synthesize indexed sequencing libraries using the TruSeq DNA Sample Prep Kit V2 (Illumina, Inc., San Diego, CA), according to the manufacturer’s recommended protocol. Cluster generation was performed on the cBot using the TruSeq PE Cluster Kit v3 – cBot – HS (Illumina). Libraries were sequenced with an Illumina HiSeq 2000 at the Nationwide Children’s Hospital (NCH) Biomedical Genomics Core (Columbus, Ohio) using TruSeq SBS Kit v3 reagents (Illumina) for paired-end sequencing with read lengths of 100 bp (200 cycles), and at CosmosID with an Illumina GAIIx for 150 bp single reads using TruSeq SBS Kit v5 reagents (Illumina).

Primary analysis (image analysis and base calling) was performed using HiSeq Control Software (HCS) version and Real Time Analysis (RTA) version 1.13.48. Secondary analysis (demultiplexing) was performed using Illumina CASAVA Software v1.6. Post-processing of GAIIx reads was performed with RTA/SCS v1.9.35.0 and CASAVA 1.8.0 software. High-throughput sequencing reads were quality filtered using the fastq_quality_filter program provided with the FASTX-Toolkit (v. 0.0.13). Only reads with a quality score ≥17 for at least 80% of the read length (i.e., probability of a correct base call ∼98%) were retained. Ion Torrent (Life Technologies, NY) sequencing was also performed using amplicons specific to the V4 region of the 16S rRNA gene. Sequence reads are available under NCBI BioProject ID PRJNA231652.

[...] Sequence data were compared to the NCBI RefSeq database (v. May 19, 2012), restricted to microbial GIs, and to the NCBI 16S database (v. October 30, 2012), using BLASTn (top hit only) (v. 2.2.25, National Library of Medicine, Bethesda, MD). [...] provides details of the analyses carried out for each sample and the databases used. Resulting BLASTn hits were filtered to retain only those with percent identity ≥97%. An additional filter, implemented as a custom script, was applied to the BLASTn hit report to reduce false positives, i.e., reads whose corresponding taxonomic identifier (taxid) appeared at a frequency of ≤0.01% (1∶1000). The Krona program (v. 2.2) was also used within a custom script to obtain a list of the organisms identified, with the read counts associated with each taxon, and to provide interactive visualization of the identified bacterial species. […]
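The two filtering stages described above can be illustrated with a minimal Python sketch. It is not the authors' custom script, which is not given in the source. The function names, the `(read_id, taxid, percent_identity)` hit layout, and the choice of denominator for the taxid frequency (all hits that pass the identity filter) are assumptions made for illustration. The abundance cutoff is taken as the stated percentage (0.01%) and exposed as a parameter. The read-level check is roughly what `fastq_quality_filter -q 17 -p 80` computes, although the exact command line used is not stated.

```python
from collections import Counter


def phred_to_accuracy(q):
    """Probability that a base call with Phred score q is correct."""
    return 1 - 10 ** (-q / 10)


def passes_quality_filter(quals, min_q=17, min_fraction=0.80):
    """Keep a read if at least min_fraction of its bases have quality >= min_q.

    quals is a list of integer Phred scores, one per base (assumed input form).
    """
    if not quals:
        return False
    good = sum(1 for q in quals if q >= min_q)
    return good / len(quals) >= min_fraction


def filter_hits(hits, min_identity=97.0, min_taxid_percent=0.01):
    """Filter top BLASTn hits by identity, then drop rare taxids.

    hits is an iterable of (read_id, taxid, percent_identity) tuples
    (hypothetical layout). A taxid is dropped when it accounts for
    <= min_taxid_percent of the identity-filtered hits (assumed denominator).
    """
    kept = [h for h in hits if h[2] >= min_identity]
    counts = Counter(taxid for _, taxid, _ in kept)
    total = len(kept)
    return [h for h in kept if 100.0 * counts[h[1]] / total > min_taxid_percent]
```

For a Phred score of 17, `phred_to_accuracy(17)` evaluates to about 0.98, which matches the ∼98% probability of a correct base call quoted for the quality threshold.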

Pipeline specifications

Software tools: HSC, BaseSpace, FASTX-Toolkit, BLASTN, Krona
Databases: CosmosID
Applications: Phylogenetics, Metagenomic sequencing analysis
Organisms: Homo sapiens, Neisseria meningitidis, Streptococcus pneumoniae
Diseases: Bacterial Infections, Meningitis, Haemophilus