Process 16S rRNA sequences with the sl1p tool

Advances in DNA sequencing technologies have encouraged a surge of microbiome studies. The microbiome, the set of microbes (bacteria, viruses, archaea) that live in a particular environmental niche, has been extensively studied, including in the context of human disease, changes in ecological environments, and progressive oxygen gradients in the deep sea. One of the most popular methods for these types of studies is the sequencing of segments of the 16S rRNA gene, a highly conserved gene among bacterial populations which allows researchers to identify the taxonomic diversity within a given bacterial niche.

Drs. Whelan and Surette have recently developed a new tool, sl1p, that automates the processing of 16S rRNA gene sequencing data and provides analyses that let users start answering their own microbiome-related research questions without extensive bioinformatics training. Here, they describe the main features and benefits of their tool.

The need for a better tool

Many tools and pipelines exist for the processing of microbial marker gene data. Some of the most popular, such as QIIME and mothur, process data using different approaches and algorithms, or provide the user with a choice of approaches for each step. Further, these tools often consist of a set of command line steps which are both time-consuming to run and difficult to reproduce. To address these issues, we developed the short-read library 16S rRNA gene sequencing pipeline (sl1p; pronounced “slip”), a stand-alone pipeline which automates these steps into an easy-to-use, reproducible approach.

sl1p processes 16S rRNA gene sequencing data with the most biologically accurate tools

Processing 16S rRNA gene sequencing data requires a series of steps. These include, but are not limited to, quality filtering, checking for chimeras, picking operational taxonomic units (OTUs), and assigning taxonomy to OTUs (Fig. 1). sl1p implements a wide variety of algorithms and options for each of these processing steps. Importantly, the defaults of sl1p were carefully chosen to represent the tools and approaches which worked best in a comprehensive comparison using mock human microbiome sequencing datasets and cultured isolates. Detailed information about these comparisons can be found in Whelan FJ & Surette MG (2017) Microbiome.

Figure 1. Processing steps implemented in sl1p
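
To make the first of these steps concrete, here is a minimal Python sketch of a quality filter. It is a toy illustration only and not sl1p's code: the function names, the Phred+33 encoding, and the thresholds (mean quality of 30, minimum length of 5) are all assumptions chosen for the example, and sl1p's actual tools and defaults are described in the paper cited above.

```python
# Toy illustration only: this is NOT sl1p's code or its default filter.
# Function names and thresholds are hypothetical.
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]


def quality_filter(records, min_mean_quality=30, min_length=5):
    """Keep (read_id, sequence, quality) records that pass both thresholds."""
    kept = []
    for read_id, sequence, quality in records:
        if len(sequence) < min_length:
            continue  # read is too short to be useful
        if mean(phred_scores(quality)) < min_mean_quality:
            continue  # average base quality is too low
        kept.append((read_id, sequence, quality))
    return kept


reads = [
    ("read1", "ACGTACGTAC", "IIIIIIIIII"),  # 'I' encodes Phred 40
    ("read2", "ACGTACGTAC", "##########"),  # '#' encodes Phred 2
    ("read3", "ACG", "III"),                # high quality but too short
]
passed = quality_filter(reads)
print([read_id for read_id, _, _ in passed])  # prints ['read1']
```

In a real pipeline the reads that survive this step would then be passed on to chimera checking, OTU picking, and taxonomy assignment.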

sl1p conducts preliminary analyses of microbial community data

sl1p’s output includes preliminary analyses that give the user a broad understanding of their data as soon as a run has finished. These analyses include a summary of the number of non-bacterial reads in each sample, taxonomic summaries of each sample at various taxonomic levels (phylum, class, order, family, and genus), and alpha- and beta-diversity outputs using 3 different distance metrics (Fig. 2). Importantly, these outputs are produced using both QIIME and R, and the raw commands for both are included so that the user can reuse them when interrogating their data further to answer questions specific to their research. This makes these analyses more approachable to the non-bioinformatician.

Figure 2. Preliminary analyses provided in sl1p
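
For readers new to these terms, the short Python sketch below computes one alpha-diversity measure (the Shannon index) and one beta-diversity measure (Bray-Curtis dissimilarity) from OTU count vectors. These two measures are assumptions chosen because they are widely used in microbiome work; this article does not name the 3 metrics sl1p uses, and sl1p itself produces its outputs with QIIME and R rather than with code like this.

```python
# Illustrative sketch only; the metrics shown are assumed examples,
# not necessarily the ones sl1p reports.
import math


def shannon_index(counts):
    """Alpha diversity: Shannon index (natural log) of one sample's OTU counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts if c > 0]
    return sum(-p * math.log(p) for p in proportions)


def bray_curtis(sample_a, sample_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples (0 to 1)."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same OTUs in the same order")
    denominator = sum(sample_a) + sum(sample_b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(sample_a, sample_b)) / denominator


even_sample = [10, 10, 10, 10]   # four OTUs, equally abundant
dominated_sample = [40, 0, 0, 0]  # a single OTU holds every read

print(shannon_index(even_sample))       # ln(4), about 1.386
print(shannon_index(dominated_sample))  # 0.0, no diversity
print(bray_curtis(even_sample, dominated_sample))  # 0.75
```

The evenly distributed sample scores higher on alpha diversity, and the two samples are 75% dissimilar by Bray-Curtis because they share only 10 of their 40 reads per sample.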

sl1p promotes reproducibility

The main goal of sl1p was to make reproducible and accurate microbiome research more accessible. sl1p produces a comprehensive logfile (Fig. 3) which records exactly how sl1p was called, the version of each software dependency, and how each processing step was conducted. This logfile makes it possible to reproduce a given sl1p run or to understand how small changes in the processing workflow alter the resulting output. Further, sl1p provides an R markdown file detailing each step taken in sl1p’s preliminary analyses of the data. Not only is this file an appropriate place for the user to start their own analyses, but it also provides transparency in how the sl1p outputs are generated.

Figure 3. sl1p logfile produced after analysis
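
The idea behind such a logfile can be sketched in a few lines of Python. This is a hypothetical illustration of the general technique (recording the invocation, dependency versions, and steps of a run), not sl1p's logfile format: the function name, the field labels, the command line, and the tool name and version are all placeholders invented for the example.

```python
# Hypothetical sketch of a run log; it does not reproduce sl1p's logfile.
import platform
from datetime import datetime, timezone


def build_run_log(argv, tool_versions, steps):
    """Return log text recording how a run was invoked and what it did."""
    lines = [
        f"started: {datetime.now(timezone.utc).isoformat()}",
        f"command: {' '.join(argv)}",
        f"python: {platform.python_version()}",
    ]
    # One line per dependency, sorted so that logs are easy to compare.
    lines += [
        f"version: {name} {version}"
        for name, version in sorted(tool_versions.items())
    ]
    # One numbered line per processing step, in the order it was run.
    lines += [f"step {i}: {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines)


log_text = build_run_log(
    argv=["my_pipeline", "--input", "reads/"],  # placeholder invocation
    tool_versions={"example_tool": "1.0.0"},    # placeholder dependency
    steps=[
        "quality filtering",
        "chimera checking",
        "OTU picking",
        "taxonomy assignment",
    ],
)
print(log_text)
```

Keeping the exact command and dependency versions together in one file is what allows a run to be repeated later, or two runs to be compared line by line.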


Whelan FJ & Surette MG. (2017). A comprehensive evaluation of the sl1p pipeline for 16S rRNA gene sequencing analysis. Microbiome.