Computational protocol: Cervical microbiome is altered in cervical intraepithelial neoplasia after loop electrosurgical excision procedure in China

[…] Sequencing reads were demultiplexed and quality-filtered with Trimmomatic and merged with FLASH according to the following criteria: (1) reads were truncated at any site where the average quality score over a 50 bp sliding window fell below 20; (2) primer matching allowed up to 2 nucleotide mismatches, and reads containing ambiguous bases were removed; (3) sequences with an overlap longer than 10 bp were merged according to their overlapping sequence. Paired-end reads were overlapped using PANDAseq with a required overlap length of >300 bp. Reads shorter than 100 nucleotides or lacking a correct primer were removed.

The 16S rRNA sequencing data were processed using the Quantitative Insights Into Microbial Ecology platform (QIIME, v1.9.1). Operational taxonomic units (OTUs) were clustered at a 97% similarity cutoff using UPARSE (version 7.1), and chimeric sequences were identified and removed using UCHIME. The taxonomy of each 16S rRNA gene sequence was assigned with the RDP Classifier algorithm against the Silva (SSU123) 16S rRNA database using a confidence threshold of 70%. After quality filtering, a total of 8,600,000 sequences were obtained and clustered into 652 OTUs.

Alpha diversity was analyzed with mothur. The richness of each sample was calculated with the Sobs index (number of observed OTUs), and diversity accounting for both richness and evenness was evaluated with the inverse Simpson (invsimpson) and Shannon indices. Principal Component Analysis (PCA) was performed with the R package ade4; each point on the score plot represents an individual sample. Principal Coordinates Analysis (PCoA) based on Bray-Curtis dissimilarities was also performed. Permutational multivariate analysis of variance (PERMANOVA) and analysis of similarities (ANOSIM) were carried out using the ‘adonis’ and ‘anosim’ functions of the R package ‘vegan’, respectively, with Bray-Curtis dissimilarities and 999 permutations. [...]
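The read-level quality criteria can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual Trimmomatic or FLASH implementation: it assumes Phred+33 quality encoding, cuts a read at the start of the first 50 bp window whose mean quality is below 20, treats any base other than A, C, G or T as ambiguous, and applies the 100-nucleotide minimum length. Primer matching is omitted, and the helper names `truncation_point` and `filter_read` are hypothetical.

```python
def truncation_point(quals, window=50, min_avg=20.0):
    """Index at which to cut the read: the start of the first window whose mean
    Phred quality is below min_avg, or len(quals) if no window fails."""
    w = min(window, len(quals))  # reads shorter than the window get one window
    if w == 0:
        return 0
    total = sum(quals[:w])  # running sum over the current window
    for start in range(len(quals) - w + 1):
        if start > 0:
            # slide one base to the right: add the new base, drop the old one
            total += quals[start + w - 1] - quals[start - 1]
        if total / w < min_avg:
            return start
    return len(quals)


def filter_read(seq, qual, window=50, min_avg=20.0, min_len=100, offset=33):
    """Return the truncated read, or None if it is discarded.

    Assumptions: Phred+33 qualities (offset=33), and any base other than
    A/C/G/T counts as ambiguous.
    """
    if set(seq.upper()) - set("ACGT"):
        return None  # reads containing ambiguous bases are removed
    quals = [ord(ch) - offset for ch in qual]
    kept = seq[:truncation_point(quals, window, min_avg)]
    return kept if len(kept) >= min_len else None  # drop reads shorter than min_len
```

For example, a 150 bp read whose first 120 bases have quality 40 and whose last 30 bases have quality 2 is cut at position 97 (the first window containing 27 low-quality bases averages 19.48) and is then discarded for being shorter than 100 nt.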
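Clustering at a 97% cutoff can be illustrated with a greedy, abundance-ordered centroid scheme. UPARSE uses its own alignment, identity definition and chimera handling, so the following is only a conceptual sketch: the identity measure (1 minus the Levenshtein distance divided by the longer sequence length) and the function names `edit_distance`, `identity` and `greedy_otus` are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit-cost substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def identity(a, b):
    """Fractional identity, defined here (an assumption) as 1 - distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_otus(abundances, cutoff=0.97):
    """Cluster unique sequences into OTUs.

    abundances maps each unique sequence to its read count. Sequences are
    visited from most to least abundant; each joins the first existing centroid
    it matches at >= cutoff identity, otherwise it founds a new OTU.
    Returns (centroids, {sequence: centroid}).
    """
    centroids = []
    members = {}
    for seq, _ in sorted(abundances.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in centroids:
            if identity(seq, centroid) >= cutoff:
                members[seq] = centroid
                break
        else:
            centroids.append(seq)  # no centroid within the cutoff: new OTU
            members[seq] = seq
    return centroids, members
```

Visiting sequences in decreasing abundance means the most common variant of each cluster becomes its centroid, so a rare sequence one mismatch away from an abundant one (99% identity over 100 bp) is absorbed rather than starting its own OTU.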
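The 70% confidence threshold means that each assignment is kept only down to the deepest rank whose bootstrap confidence reaches 0.70; lower ranks are left unclassified. A minimal sketch of that truncation, assuming the classifier output has already been parsed into (rank, taxon, confidence) tuples; the tuple layout, the function name `truncate_lineage` and the confidence values in `example` are hypothetical.

```python
def truncate_lineage(lineage, threshold=0.70):
    """Keep ranks from the top of the lineage down to (but not including) the
    first rank whose bootstrap confidence is below threshold.

    lineage: list of (rank, taxon, confidence) ordered from domain downwards.
    """
    kept = []
    for rank, taxon, confidence in lineage:
        if confidence < threshold:
            break  # this rank and everything below it stay unclassified
        kept.append((rank, taxon))
    return kept


# Hypothetical classifier output for one sequence (confidence values are made up).
example = [
    ("domain", "Bacteria", 1.00),
    ("phylum", "Firmicutes", 1.00),
    ("class", "Bacilli", 0.99),
    ("order", "Lactobacillales", 0.97),
    ("family", "Lactobacillaceae", 0.95),
    ("genus", "Lactobacillus", 0.91),
    ("species", "Lactobacillus iners", 0.62),
]
```

With the 0.70 threshold, `truncate_lineage(example)` stops at the genus Lactobacillus because the species-level confidence (0.62) falls short.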
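The three alpha-diversity indices can be computed from a sample's OTU count vector as below. This is a sketch rather than mothur's code: it uses the natural logarithm for Shannon and assumes the bias-corrected Simpson estimator, D = Σ n_i(n_i − 1) / (N(N − 1)), for the inverse Simpson index; the mothur calculator documentation should be checked before comparing numbers.

```python
import math


def sobs(counts):
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def inv_simpson(counts):
    """Inverse Simpson 1/D, assuming the bias-corrected estimator
    D = sum n_i (n_i - 1) / (N (N - 1))."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two reads")
    d = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    return math.inf if d == 0 else 1.0 / d
```

For four equally abundant OTUs with 10 reads each, Sobs is 4, Shannon is ln 4 ≈ 1.386, and this inverse Simpson estimate is 1560/360 ≈ 4.33; the uncorrected form, 1/Σp_i², would give exactly 4.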
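To make the beta-diversity tests concrete, the sketch below computes Bray-Curtis dissimilarities, the one-factor PERMANOVA pseudo-F and the ANOSIM R statistic, and obtains permutation p-values from 999 label shuffles as p = (hits + 1) / (permutations + 1). It is a plain-Python illustration assuming a single grouping factor and no strata, not a reimplementation of vegan's `adonis` or `anosim`, whose outputs may differ in detail; all function names are hypothetical.

```python
import math
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d


def pseudo_f(d, groups):
    """One-factor PERMANOVA pseudo-F computed from squared dissimilarities."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    if ss_within == 0:
        return math.inf
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, 1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def anosim_r(d, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M = n(n - 1) / 2 is the number of sample pairs."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([d[i][j] for i, j in pairs])
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    m = len(pairs)
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)


def permutation_test(stat, d, groups, permutations=999, seed=0):
    """Observed statistic and permutation p-value from shuffled group labels."""
    rng = random.Random(seed)
    observed = stat(d, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if stat(d, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

With two well-separated groups, every between-group dissimilarity outranks every within-group one, so R equals 1; the p-value is then limited by how rarely a random relabeling reproduces the same split.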
Statistical differences in the cervical microbiota were examined at the bacterial genus and species levels using the Statistical Analysis of Metagenomic Profiles (STAMP) software package. Ward’s linkage hierarchical clustering analysis (HCA) of bacterial genera was performed using a clustering density threshold of 0.75. Bacterial species data were classified into cervicotypes (CTs) as described by Anahtar et al.: CT I (non-iners Lactobacillus; high percentage of Lactobacillus crispatus), CT II (L. iners), CT III (Gardnerella), and CT IV (mixed bacterial species containing Prevotella). The effects of the loop electrosurgical excision procedure (LEEP) on bacterial genera, number of observed species, and α diversity were assessed using one-way ANOVA, the Kruskal-Wallis test, and Dunn’s multiple comparison test, as appropriate. The linear discriminant analysis (LDA) effect size (LEfSe) method was used to identify differentially abundant taxonomic features before and 3 months after LEEP, with α = 0.05 for the factorial Kruskal-Wallis test among classes and a threshold of 2.0 on the logarithmic LDA score for discriminative features. Fisher’s exact test was used to compare categorical data between two or more groups. All P values are two-sided. Analyses were performed with R (v2.15.3) packages and GraphPad Prism. […]
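A rule-based approximation of the CT scheme assigns each sample by its most abundant species. This dominant-taxon rule is a simplification and is not claimed to match the exact procedure of Anahtar et al.; the taxon name strings and the function name `assign_cervicotype` are assumptions for illustration. Note that it labels a sample CT I whenever a non-iners Lactobacillus is merely the largest of several small fractions.

```python
def assign_cervicotype(rel_abund):
    """Assign a CT label from the single most abundant taxon in one sample.

    rel_abund: dict mapping species-level taxon names to relative abundance.
    The dominant-taxon rule is a simplifying assumption.
    """
    top = max(rel_abund, key=rel_abund.get)
    if top == "Lactobacillus iners":
        return "CT II"
    if top.startswith("Lactobacillus"):  # any non-iners Lactobacillus
        return "CT I"
    if top.startswith("Gardnerella"):
        return "CT III"
    return "CT IV"  # mixed or non-Lactobacillus-dominated communities
```

A stricter variant could require the top taxon to exceed some minimum fraction before assigning CT I to CT III, sending everything else to CT IV; any such cutoff would be a further assumption.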
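The Kruskal-Wallis test, used both for the LEEP comparisons and as the factorial first step of LEfSe, ranks the pooled observations and compares mean ranks across groups. The sketch computes the tie-corrected H statistic and its chi-square p-value using only the standard library; the chi-square upper tail uses the regularized incomplete-gamma recurrence for integer degrees of freedom. Dunn's post hoc comparisons are not included, and all function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    first, last = {}, {}
    for pos, v in enumerate(ordered, 1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for integer df, via
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square (df = k - 1) p-value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        r = ranks[pos:pos + len(g)]  # ranks belonging to this group
        pos += len(g)
        total += sum(r) ** 2 / len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

For instance, `kruskal_wallis([1, 2, 3], [4, 5, 6])` gives H = 27/7 ≈ 3.857 and p ≈ 0.0495 with 1 degree of freedom.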
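For a 2 × 2 table, the two-sided Fisher's exact test sums the hypergeometric probabilities of every table with the observed margins that is no more probable than the observed one. The sketch implements only that 2 × 2 case; comparisons among more than two groups need the r × c generalization, such as R's `fisher.test` provides. The small relative tolerance used when comparing probabilities and the function name are assumptions.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # sum every table at most as probable as the observed one (with tolerance)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

For the table [[8, 2], [1, 5]] it returns 400/11440 ≈ 0.035.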

Pipeline specifications

Software tools: Trimmomatic, PANDAseq, QIIME, UPARSE, UCHIME, RDP Classifier, STAMP, LEfSe
Applications: Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Human papillomavirus, Homo sapiens, Lactobacillus iners
Diseases: Cervical Intraepithelial Neoplasia, Papillomavirus Infections