Computational protocol: 16S rRNA gene sequencing on a benchtop sequencer: accuracy for identification of clinically important bacteria

Protocol publication

[…] To test the effect of data analysis method on accuracy, we compared the identifications made by three available algorithms using the same V1–V2 hypervariable region sequence data, as follows.

Ion Reporter metagenomics workflow: Unaligned binary data files (Binary Alignment Map, BAM) generated by the Ion Torrent PGM were uploaded to Ion Reporter (https://ionreporter.thermofisher.com/ir/) and analysed using default settings. Calls were made by assessing the most specific unique taxonomic level identified using the mapped sequences. For example, if two genera in the same family were identified using the sequence for a particular strain, the result was a family‐level call.

BLAST‐based analysis: Unaligned BAM files were converted to FASTQ format and passed to cutadapt ver. 1.2.1 (https://cutadapt.readthedocs.org/en/stable/) for removal of primer sequences, quality trimming and length filtering. Because some primers used to generate the 16S rRNA gene amplicons contained redundant nucleotides (Table ), all possible primer sequences were provided to cutadapt when relevant. Primer sequence removal was performed using default cutadapt settings, with the quality‐trimming threshold set to a Phred quality score of 20 and length filtering set to remove any read shorter than 75% of the expected full‐length amplicon in question (e.g. V1–V2 reads shorter than 234 base pairs were removed; expected sizes for amplicons are shown in column 2 of Table ). The resulting FASTQ files were converted to FASTA format and used as input to blastn (ncbi‐blast‐2.2.29+; Camacho et al. ) with a 16S Microbial database downloaded from the National Center for Biotechnology Information (ftp://ftp.ncbi.nlm.nih.gov/blast/db). To be included in the results, a match had to have greater than 98% identity with the query and an e‐value less than 1e‐50. Results from the blastn algorithm were parsed to keep only the first best match based on bitscore. The number of matches to every organism detected within each sample was counted and used to determine the taxonomic level reached by comparing the most prevalent organism to the correct identity.

Ribosomal Database Project (RDP): The FASTA files generated for the blastn analysis above were analysed with the SequenceMatch algorithm in the RDP Tools suite (https://github.com/rdpstaff/RDPTools), with the number of matches set to one for each sequence. A reference file for the SequenceMatch algorithm was created by downloading sequences from the RDP database using the Hierarchy Browser. Sequences were selected with the strain option set to ‘type’ and ‘nontype’, the source set to ‘isolates’, the size set to ‘>1200’, and the quality set to ‘good’. The resulting reference FASTA file was further edited to remove entries with nontaxonomic headers (e.g. ‘marine bacterium’, ‘arsenic‐oxidizing’). Results for each read analysed were annotated with the organism name, and the number of matches to each organism detected within each sample was counted and used to determine the taxonomic level reached by comparing the most prevalent organism to the correct identity. […]
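
The hit filtering and per-sample counting described for the blastn results can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes the blastn hits have been exported as tab-separated rows with five columns in the order read ID, organism name, percent identity, e-value, bitscore; that layout, the function names and the example rows are all hypothetical. Only the two thresholds (identity greater than 98%, e-value less than 1e-50) and the rule of keeping the first best match by bitscore come from the text.

```python
from collections import Counter

MIN_IDENTITY = 98.0  # percent identity must be greater than this (from the text)
MAX_EVALUE = 1e-50   # e-value must be less than this (from the text)


def best_hits(rows):
    """Return {read_id: organism}, keeping the first highest-bitscore hit per read.

    Each row is an assumed five-column, tab-separated record:
    read_id, organism, percent identity, e-value, bitscore.
    """
    best = {}  # read_id -> (bitscore, organism)
    for row in rows:
        row = row.rstrip("\n")
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, organism, pident, evalue, bitscore = row.split("\t")
        # Apply the inclusion thresholds before choosing the best hit.
        if float(pident) <= MIN_IDENTITY or float(evalue) >= MAX_EVALUE:
            continue
        score = float(bitscore)
        # Strict '>' means that, on a tie, the hit seen first is kept.
        if read_id not in best or score > best[read_id][0]:
            best[read_id] = (score, organism)
    return {read_id: organism for read_id, (_, organism) in best.items()}


def most_prevalent(hits):
    """Count reads per organism and return (top organism or None, counts)."""
    counts = Counter(hits.values())
    if not counts:
        return None, counts
    organism, _ = counts.most_common(1)[0]
    return organism, counts


# Invented rows for illustration only; they are not data from the study.
EXAMPLE_ROWS = [
    "read1\tEscherichia coli\t99.6\t1e-120\t520",
    "read1\tShigella flexneri\t99.6\t1e-120\t520",     # tie: first hit is kept
    "read2\tEscherichia coli\t99.1\t1e-110\t500",
    "read3\tShigella flexneri\t98.0\t1e-100\t480",     # identity not > 98: dropped
    "read4\tKlebsiella pneumoniae\t99.0\t1e-40\t200",  # e-value too large: dropped
    "read5\tShigella flexneri\t99.2\t1e-115\t510",
]

if __name__ == "__main__":
    top, counts = most_prevalent(best_hits(EXAMPLE_ROWS))
    print(top, dict(counts))
```

The organism returned by `most_prevalent` would then be compared with the known identity of the strain to decide which taxonomic level was reached. That comparison needs a taxonomy lookup and is left out of the sketch.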

Pipeline specifications

Software tools: cutadapt, BLASTN
Application: Metagenomic sequencing analysis