Computational protocol: A dysbiotic mycobiome dominated by Candida albicans is identified within oral squamous-cell carcinomas

Protocol publication

[…] Raw sequencing data were deposited in, and are publicly available from, the Sequence Read Archive (SRA) under project no. PRJNA375780. Reads with primer mismatches were removed, and primer sequences were trimmed off. Paired sequences were then merged with PEAR using the following parameters: minimum amplicon length, 213 bp; maximum amplicon length, 552 bp; and p-value, 0.001. Preprocessing of the merged reads was performed using mothur v1.38.1. First, to stringently minimize sequencing errors, reads with ambiguous bases, reads with homopolymers >8 bp, and reads that did not achieve a sliding 50-nucleotide Q-score average of ≥30 were filtered out. Second, the high-quality reads were cleared of chimeras with UCHIME using the self-reference approach. Finally, sequences representing non-fungal lineages, identified by preliminary taxonomy using mothur’s classify.seqs command, were removed. [...]

The high-quality, non-chimeric merged reads were classified at the species level employing a previously described BLASTN-based algorithm, modified to analyze the fungal ITS2 region instead of the bacterial 16S rRNA gene. A set of 23,423 fungal ITS sequences representing all named species (16,595 species) in UNITE’s database v7.1 (https://unite.ut.ee/repository.php; 22 August 2016 dynamic release; untrimmed sequences) was used as reference (the fasta and taxonomy files of this set can be downloaded at ftp://www.homd.org/publication_data/20170221/). Briefly, the reads were individually BLASTN-searched against the reference set at an alignment coverage of ≥99% and a percent identity of ≥98.5%. Hits were ranked by percent identity and, when equal, by bit score. Each read was assigned the taxonomy of its best hit. Reads whose best hits represented more than one species were screened again for chimeras using a de novo check at 98% similarity with USEARCH v8.1.1861 and, if not chimeric, were assigned a multiple-species taxonomy.
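The hit-filtering and ranking rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Hit` record, the `assign_taxonomy` function, the definition of coverage as the percentage of the read covered by the alignment, and the " / " separator for multiple-species labels are all assumptions made for the example. The secondary chimera check on multi-species reads is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional


class Hit(NamedTuple):
    """One BLASTN hit (hypothetical record layout, for illustration only)."""
    read_id: str
    species: str      # species label of the reference sequence
    pident: float     # percent identity of the alignment
    coverage: float   # percent of the read covered by the alignment (assumed definition)
    bitscore: float


MIN_COVERAGE = 99.0   # alignment coverage threshold quoted in the protocol
MIN_IDENTITY = 98.5   # percent identity threshold quoted in the protocol


def assign_taxonomy(hits: Iterable[Hit]) -> Dict[str, Optional[str]]:
    """Return a species label per read, or None if no hit passes the thresholds."""
    by_read = defaultdict(list)
    for hit in hits:
        by_read[hit.read_id].append(hit)

    assignments: Dict[str, Optional[str]] = {}
    for read_id, read_hits in by_read.items():
        passing = [h for h in read_hits
                   if h.coverage >= MIN_COVERAGE and h.pident >= MIN_IDENTITY]
        if not passing:
            # Such reads would go on to de novo OTU calling.
            assignments[read_id] = None
            continue
        # Rank by percent identity first, then by bit score to break ties.
        best_key = max((h.pident, h.bitscore) for h in passing)
        best_species = sorted({h.species for h in passing
                               if (h.pident, h.bitscore) == best_key})
        # Several species tied for best hit: multiple-species label
        # (the " / " separator is an assumption of this sketch).
        assignments[read_id] = " / ".join(best_species)
    return assignments


if __name__ == "__main__":
    example = [
        Hit("r1", "Candida albicans", 99.5, 100.0, 500.0),
        Hit("r1", "Candida dubliniensis", 99.0, 100.0, 495.0),
        Hit("r2", "Malassezia globosa", 99.0, 100.0, 480.0),
        Hit("r2", "Malassezia restricta", 99.0, 100.0, 480.0),
        Hit("r3", "Aspergillus niger", 97.0, 100.0, 400.0),
    ]
    for read, label in assign_taxonomy(example).items():
        print(read, label)
```

In this sketch, reads with no BLASTN hits at all never appear in the input and would need to be added to the unassigned set separately.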
Reads with no matches at the specified criteria underwent secondary de novo chimera checking as above, and then de novo, species-level operational taxonomic unit (OTU) calling at 98% using USEARCH. Singleton OTUs were excluded; the rest were considered potentially novel species, and a representative read from each was BLASTN-searched against the same reference sequence set again to determine the closest species for taxonomy assignment.

Downstream analysis was performed as previously described. In short, Quantitative Insights Into Microbial Ecology (QIIME™) v1.9.1 was employed to perform further analysis, including generation of taxonomy plots, rarefaction, calculation of species richness and diversity indices, computation of distance matrices, and principal coordinate analysis (PCoA). Detection of differentially abundant taxa between the cases and controls was done using linear discriminant analysis effect size (LEfSe). […]
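As a small illustration of two of the steps above, the following Python sketch drops singleton OTUs and then computes species richness and one diversity index for a sample. It is a sketch under stated assumptions rather than the QIIME workflow itself: the function names are hypothetical, a singleton is taken to mean an OTU supported by exactly one read, and the Shannon index (with a base-2 logarithm) is used only as a common example because the protocol does not name the indices it calculated.

```python
import math
from typing import Dict


def drop_singletons(otu_sizes: Dict[str, int]) -> Dict[str, int]:
    """Keep only OTUs supported by more than one read (assumed meaning of 'singleton')."""
    return {otu: n for otu, n in otu_sizes.items() if n > 1}


def observed_richness(counts: Dict[str, int]) -> int:
    """Number of taxa with at least one read in the sample."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_index(counts: Dict[str, int], base: float = 2.0) -> float:
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total, base)
                for n in counts.values() if n > 0)


if __name__ == "__main__":
    # Hypothetical read counts per OTU, for illustration only.
    otus = {"OTU_1": 5, "OTU_2": 5, "OTU_3": 1}
    kept = drop_singletons(otus)
    print(kept)
    print(observed_richness(kept))
    print(shannon_index(kept))
```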

Pipeline specifications

Software tools: PEAR, mothur, UCHIME, BLASTN, USEARCH, QIIME, LEfSe
Databases: HOMD
Applications: Metagenomic sequencing analysis, 16S rRNA-seq analysis
Diseases: Carcinoma, Squamous Cell