Computational protocol: Comparison of Storage Conditions for Human Vaginal Microbiome Studies

Protocol publication

[…] Universal primers 27F and 338R were used for PCR amplification of the V1-V2 hypervariable regions of 16S rRNA genes. The 338R primer included a unique sequence tag to barcode each sample. The primers used were 27F-5′-GCCTTGCCAGCCCGCTCAGTCAGAGTTTGATCCTGGCTCAG-3′ and 338R-5′-GCCTCCCTCGCGCCATCAGNNNNNNNNCATGCTGCCTCCCGTAGGAGT-3′, where the underlined sequences are the 454 Life Sciences FLX sequencing primers B and A in 27F and 338R, respectively, and the bold letters denote the universal 16S rRNA primers 27F and 338R. Pyrosequencing of barcoded 16S rRNA gene amplicons was performed using the method described by Ravel et al. The QIIME software package was used for quality control of the sequence reads using the script and the following criteria: 1) no ambiguous bases; 2) minimum and maximum lengths of 220 bp and 400 bp; 3) an average quality of q25 over a sliding window of 50 bp (if the read quality dropped below q25, the read was trimmed at the first base pair of the window and then reassessed against the length criteria); 4) a perfect match to a barcode sequence; and 5) presence of the 338R 16S primer sequence used for amplification. Sequences were binned based on sample-specific barcode sequences and trimmed by removal of the barcode and primer sequences (forward, if present, and reverse). High-quality sequence reads were first de-replicated at 99% similarity using the UCLUST software package, and potential chimeric sequences were detected using the UCHIME component of UCLUST. Chimeric sequences were removed prior to taxonomic assignment.
Genus-level taxonomic assignments were performed using the RDP Classifier, and further species-level assignments for Lactobacillus sp. were made using 127 HMM Lactobacillus species models followed by clustering analysis using the software speciateIT ( [...] For each sample, vectors of phylotype proportions were clustered into community state types as previously reported by Ravel et al.
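The read-level quality criteria described above (no ambiguous bases, sliding-window trimming at q25, and the 220 to 400 bp length check) can be illustrated with a minimal Python sketch. This is not the QIIME script used in the study; the function names `trim_at_low_quality_window` and `filter_read`, the order in which the checks are applied, and the example read are all assumptions made for illustration.

```python
def trim_at_low_quality_window(seq, quals, window=50, min_avg_q=25):
    """Cut a read at the start of the first window whose mean quality is below min_avg_q."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have equal length")
    n = len(seq)
    if n < window:
        return seq, quals  # too short to hold a full window; left to the length check
    window_sum = sum(quals[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one base: add the entering score, drop the leaving one.
            window_sum += quals[start + window - 1] - quals[start - 1]
        if window_sum / window < min_avg_q:
            return seq[:start], quals[:start]
    return seq, quals


def filter_read(seq, quals, min_len=220, max_len=400):
    """Return the (possibly trimmed) read if it passes all checks, else None."""
    # Assumption: the ambiguity check is applied to the untrimmed read.
    if any(base not in "ACGT" for base in seq.upper()):
        return None  # ambiguous base present
    trimmed, _ = trim_at_low_quality_window(seq, quals)
    if not (min_len <= len(trimmed) <= max_len):
        return None  # length is re-assessed after trimming
    return trimmed


# Hypothetical read: 260 high-quality bases followed by 40 low-quality ones.
read = "ACGT" * 75
scores = [30] * 260 + [10] * 40
kept = filter_read(read, scores)
print(len(kept) if kept else "discarded")
```

Barcode matching and primer detection (criteria 4 and 5) are left out of the sketch because they depend on the sample sheet rather than on the read alone.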
We sought to test the hypothesis that there were no significant differences between a woman's samples stored under the three cold chain conditions. The vaginal bacterial compositions obtained from two different storage conditions within a woman were compared by computing the Jensen-Shannon divergence between the two vectors of phylotype proportions representing the community compositions, which is a measure of the distance or dissimilarity between the two communities. To estimate the extent to which community states of samples stored under two different storage conditions differ from each other, we compared the distances between these community states to the distances between community states of samples stored under the same storage condition. The Kolmogorov-Smirnov test was used to show that, at the 0.05 significance level, the null hypothesis that these two samples of distances came from the same distribution could not be rejected. The null distribution of distances between community states of samples stored under the same storage condition was estimated using data from prior work in which sixteen vaginal sample triplets were collected, each triplet from the same woman by a physician. To demonstrate that the results were not sensitive to the choice of dissimilarity measure between community states, the Kolmogorov-Smirnov tests were also applied to the following measures of dissimilarity: relative entropy, Euclidean distance, Euclidean distance applied to log-transformed relative abundances, and Bray-Curtis dissimilarity.
Previous studies have suggested that communities should be compared using equal numbers of sequence reads in order to minimize sequencing artifacts, because the number of spurious phylotypes increases with sequencing effort. We therefore randomly picked 4000 sequencing reads from each sample using a pseudo-random generator for a secondary comparison of community composition and structure among samples.
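The comparison described above can be sketched in plain Python as three steps: compute the Jensen-Shannon divergence between two vectors of phylotype proportions, compare two samples of such distances with a two-sample Kolmogorov-Smirnov statistic, and subsample reads to a fixed depth. The sketch makes several assumptions that the text does not specify: the divergence uses log base 2 without taking a square root, the decision at the 0.05 level uses the standard large-sample critical value rather than an exact p-value, and all function names and example numbers are hypothetical.

```python
import math
import random


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two proportion vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def ks_statistic(xs, ys):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation to the two-sample KS critical value."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def subsample_reads(reads, depth=4000, seed=0):
    """Draw `depth` reads without replacement using a seeded pseudo-random generator."""
    if len(reads) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    return random.Random(seed).sample(reads, depth)


# Hypothetical phylotype proportions for one woman's sample under two conditions.
condition_a = [0.70, 0.20, 0.10, 0.00]
condition_b = [0.65, 0.25, 0.05, 0.05]
print(jensen_shannon_divergence(condition_a, condition_b))

# Hypothetical distance samples: between-condition versus same-condition pairs.
between = [0.02, 0.05, 0.03, 0.08, 0.04, 0.06]
within = [0.03, 0.04, 0.02, 0.07, 0.05, 0.06]
d = ks_statistic(between, within)
print("reject null" if d > ks_critical_value(len(between), len(within)) else "cannot reject null")
```

With only a handful of distances per group, the asymptotic critical value is a rough guide; an exact or permutation-based p-value would be the more careful choice for small samples.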
16S rRNA amplicon sequences were analyzed using the QIIME (v 1.4.0) suite of software tools. Sequences with less than 1% dissimilarity were clustered together, and detected chimeras were removed using the usearch method. The processed sequences were then aligned using Python Nearest Alignment Space Termination (PyNAST), and gaps and parsimony-uninformative characters in the alignment were removed. Taxonomic ranks were assigned to each sequence using RDP classifier v.2.2, with a confidence value of 0.8 as the cutoff, against a pre-built Greengenes database of assigned sequences (February 4, 2011 version). A de novo phylogeny was built from the filtered alignment using RAxML, and phylogeny-based weighted UniFrac distances were calculated to assess differences in overall microbial community composition. To visualize the sample distribution patterns, principal coordinates analysis (PCoA) was then used to transform the UniFrac distance matrices into principal coordinates. […]
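The final step, turning a distance matrix into principal coordinates, can be illustrated with a small pure-Python sketch of classical PCoA: the squared distances are double-centered and the leading eigenvectors are scaled by the square roots of their eigenvalues. This is not the QIIME implementation. Power iteration with deflation stands in for a full eigendecomposition, the function names are hypothetical, and the example matrix is a toy 3-4-5 triangle rather than UniFrac output.

```python
import math


def double_center(dist):
    """Gower-center the matrix of -0.5 * d^2 values."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def leading_eigenpair(b, iterations=1000, tol=1e-12):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(b)
    v = [math.sin(i + 1.0) for i in range(n)]  # fixed, non-constant start vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < tol:
            return 0.0, [0.0] * n  # nothing left to extract
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
    return eigenvalue, v


def pcoa(dist, n_axes=2):
    """Return n_axes principal coordinates for each sample."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * n_axes for _ in range(n)]
    for axis in range(n_axes):
        eigenvalue, v = leading_eigenpair(b)
        # Limitation of this sketch: it stops at the first non-positive
        # dominant eigenvalue, which can occur for non-Euclidean distances.
        if eigenvalue <= 0:
            break
        scale = math.sqrt(eigenvalue)
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so the next pass finds the next axis.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy distance matrix: three samples forming a 3-4-5 triangle.
toy = [[0, 3, 4],
       [3, 0, 5],
       [4, 5, 0]]
for row in pcoa(toy):
    print(row)
```

For a Euclidean toy matrix like this one, the pairwise distances between the returned coordinates reproduce the input distances; real UniFrac matrices generally need more than two axes to do so, which is why PCoA plots report the fraction of variation explained per axis.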

Pipeline specifications

Software tools: QIIME, UCLUST, UCHIME, RDP Classifier, USEARCH, PyNAST, RAxML, UniFrac
Applications: Phylogenetics, 16S rRNA-seq analysis
Organisms: Homo sapiens, Lactobacillus gasseri
Diseases: Vaginosis, Bacterial
Chemicals: Lactic Acid