Computational protocol: Oral Microbiota Community Dynamics Associated With Oral Squamous Cell Carcinoma Staging

Protocol publication

[…] The sequencing reads were first demultiplexed by sample barcode using MiSeq Reporter v2.6. Paired reads from each sample were then merged with FLASH (v1.2.11) (Magoc and Salzberg) into longer reads (460 ± 50 bp) to improve taxonomic classification. Only samples with at least 100,000 merged reads were retained for subsequent analysis. Low-quality reads (q-value < 20) were filtered out using a QIIME (version 1.9.1) script (Caporaso et al.). Cutadapt v1.14 was used to remove the forward and reverse sequencing primers from the merged reads of each dataset. The resulting sequence tags were compared against the Gold database using the USEARCH package (Edgar) to remove chimeric sequences. Only sequence tags longer than 400 bp were retained for subsequent analysis. Sequence tags were clustered into operational taxonomic units (OTUs) at 97% sequence similarity using USEARCH, and the OTUs were then compared with BLASTN 2.6.0+ against four sets of 16S rRNA reference sequences: HOMD RefSeq V14.5, HOMD RefSeq Extended V1.1, modified GreenGeneGold, and NCBI's Microbial 16S set (Al-Hebshi et al.). Taxonomy was also assigned according to the Greengenes database. Clustal Omega was used to construct phylogenetic trees from the representative sequences of the OTUs. The α-diversity (e.g., observed OTU number, Chao index, Simpson index, and Shannon index) and β-diversity (Bray-Curtis dissimilarity) measures were calculated from data normalized with the cumulative sum scaling (CSS) transformation in metagenomeSeq (Paulson et al.). Canonical correspondence analysis (CCA) and constrained principal coordinate analysis (CPCoA) were used to visualize the data. Differentially abundant OTUs between the healthy control and OSCC groups were identified using linear discriminant analysis effect size (LEfSe) (Segata et al.).
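The diversity measures named above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the example count vectors are hypothetical, the Simpson index is taken in its 1 − Σp² form (other conventions exist), and Chao is shown in its bias-corrected Chao1 form, which needs integer counts of singletons and doubletons and is therefore demonstrated on raw counts rather than CSS-normalized values.

```python
import math

def observed_otus(counts):
    """Number of OTUs with a non-zero count in one sample."""
    return sum(1 for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate (expects integer counts)."""
    s_obs = observed_otus(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def shannon(counts):
    """Shannon index with natural logarithm: -sum(p * ln p)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson index in the 1 - sum(p^2) convention (an assumption)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same OTUs."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den

# Hypothetical OTU count vectors for two samples (not data from the study).
sample_a = [10, 5, 2, 1, 1, 0]
sample_b = [0, 4, 6, 2, 1, 3]
print(observed_otus(sample_a), chao1(sample_a))
print(shannon(sample_a), simpson(sample_a))
print(bray_curtis(sample_a, sample_b))
```

Each α-diversity function takes the counts of a single sample, whereas `bray_curtis` compares two samples whose vectors list the same OTUs in the same order.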
A heatmap was generated using R scripts with the clustering distance set to “correlation” and the clustering method set to “ward.D2”. Metagenome content was predicted using PICRUSt (Langille et al.), and KEGG pathways were generated. […]
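To make the heatmap clustering settings concrete, the following Python sketch shows one way such a clustering could work. It assumes that “correlation” means 1 minus the Pearson correlation between row profiles and that “ward.D2” means Ward's criterion applied to squared dissimilarities, as in R's `hclust`; the source does not state which R package was used. The function names and example rows are hypothetical, and the sketch assumes no profile is constant (a constant row would give a zero standard deviation).

```python
import math

def correlation_distance(x, y):
    """1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def ward_d2_merges(rows):
    """Agglomerative clustering of rows; returns (members_a, members_b, height)."""
    clusters = {i: (i,) for i in range(len(rows))}  # cluster id -> member rows
    dist = {(i, j): correlation_distance(rows[i], rows[j])
            for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(rows)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        (i, j), d_ij = min(dist.items(), key=lambda kv: kv[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        new_dist = {}
        for k in clusters:
            if k in (i, j):
                continue
            nk = len(clusters[k])
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Lance-Williams update for Ward on squared dissimilarities.
            sq = ((ni + nk) * d_ik ** 2 + (nj + nk) * d_jk ** 2
                  - nk * d_ij ** 2) / (ni + nj + nk)
            # Guard against tiny negatives from non-Euclidean distances.
            new_dist[(k, next_id)] = math.sqrt(max(sq, 0.0))
        merges.append((clusters[i], clusters[j], d_ij))
        # Drop distances involving the merged clusters, then add the new ones.
        dist = {key: v for key, v in dist.items()
                if i not in key and j not in key}
        dist.update(new_dist)
        clusters[next_id] = clusters[i] + clusters[j]
        del clusters[i], clusters[j]
        next_id += 1
    return merges

# Hypothetical abundance profiles: rows 0 and 1 rise, rows 2 and 3 fall.
rows = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4.5, 2]]
for members_a, members_b, height in ward_d2_merges(rows):
    print(members_a, members_b, round(height, 4))
```

The returned merge order is what determines row ordering in a clustered heatmap: profiles that rise and fall together are merged first, regardless of their absolute abundance.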

Pipeline specifications

Software tools: QIIME, cutadapt, UCHIME, USEARCH, UPARSE, BLASTN, Clustal Omega, metagenomeSeq, LEfSe, PICRUSt
Databases: KEGG PATHWAY, Greengenes, HOMD
Applications: Phylogenetics, Metagenomic sequencing analysis, 16S rRNA-seq analysis
Organisms: Homo sapiens, Escherichia coli, Bacteroidetes, Streptococcus constellatus, Filifactor alocis, Streptococcus mitis, Bacteria
Chemicals: Amino Acids, Carbon, Folic Acid, Methane