Computational protocol: The Influence of Age and Gender on Skin-Associated Microbial Communities in Urban and Rural Human Populations

Protocol publication

[…] Sequences were processed using the QIIME software package []. Reads were assigned to their libraries according to 8-nucleotide (nt) barcodes and were retained only if they had a quality value higher than 25, a length of >250 nt, no ambiguous characters and no homopolymer runs exceeding 8 nt. The complete data set was chimera-checked using USEARCH61 with the Greengenes database []. The remaining reads were then clustered into operational taxonomic units (OTUs) by UCLUST [] at 97% identity. After removal of singletons, a representative sequence was chosen for each OTU by selecting the first sequence (the UCLUST cluster seed). Taxonomy was assigned to each representative sequence using the Ribosomal Database Project (RDP) classifier [], with a minimum confidence of 80%. Representative sequences were aligned against the Greengenes database using the Python Nearest Alignment Space Termination tool (PyNAST) [], with a minimum alignment length of 210 and a minimum identity of 75%. OTUs whose representative sequences failed to align were dropped. The PH Lane mask was used to remove hypervariable regions after alignment. A phylogenetic tree was built from the aligned representative sequences using FastTree []. To ensure adequate representation of the community structure, samples with <200 reads were removed. To evaluate the amount of diversity contained within communities (alpha diversity), rarefaction analysis was performed with the Chao1, Shannon and phylogenetic diversity (PD) indices []. To determine the amount of diversity shared between two communities (beta diversity), UniFrac distances [] were calculated between all pairs of samples. UniFrac distances were based on the fraction of branch length shared between two communities in a phylogenetic tree.
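The read-filtering criteria given at the start of the paragraph above can be illustrated with a minimal sketch. It is not the QIIME implementation: the function name `passes_filter` is hypothetical, and the sketch assumes that the "quality value" in the text refers to the mean quality score of a read, which the protocol does not state explicitly.

```python
import re

MIN_MEAN_QUALITY = 25   # text: quality value higher than 25 (assumed: mean score per read)
MIN_LENGTH = 250        # text: >250 nt in length
MAX_HOMOPOLYMER = 8     # text: no homopolymer run exceeding 8 nt

# Matches one base followed by 8 or more copies of itself,
# i.e. a run of at least 9 identical bases.
_LONG_RUN = re.compile(r"(.)\1{%d,}" % MAX_HOMOPOLYMER)


def passes_filter(seq, quals):
    """Return True if a read meets all four criteria listed in the text."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality scores differ in length")
    seq = seq.upper()
    if len(seq) <= MIN_LENGTH:
        return False                      # too short
    if set(seq) - set("ACGT"):
        return False                      # ambiguous character such as N
    if _LONG_RUN.search(seq):
        return False                      # homopolymer run longer than 8 nt
    mean_quality = sum(quals) / len(quals)
    return mean_quality > MIN_MEAN_QUALITY


# Toy usage: a 280 nt read with uniform quality 30 passes.
read = "ACGT" * 70
print(passes_filter(read, [30] * len(read)))         # True
print(passes_filter("N" + read, [30] * 281))         # False (ambiguous base)
```

Every threshold is strict, matching the wording of the text ("higher than", ">", "exceeding"), so a read of exactly 250 nt or with a mean quality of exactly 25 is rejected.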
Unweighted UniFrac accounts for membership only (which lineages are present, regardless of their abundance), whereas weighted UniFrac accounts for both membership and relative abundance (community structure). UniFrac-based jackknifed hierarchical clustering was performed using the unweighted pair group method with arithmetic mean (UPGMA) in QIIME. Principal coordinates analysis (PCoA) was also performed on the UniFrac distance matrices and visualized using the KiNG graphics program. We subsampled the 1,364 samples to 200 sequences per sample and then collapsed the rarefied samples into 84 groups according to age, gender, residence, and skin site. We then rarefied the 84 groups to 1,400 sequences per group. These rarefied groups were used for the PCoA and UPGMA analyses, and their relative abundances were examined using heat maps. All other analyses used the full set of 1,364 rarefied samples. The sequence data generated for this study were deposited in the NCBI GenBank Short Read Archive (SRA) under accession number SRP051059. […]
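The two-stage rarefaction described above (subsample each sample, pool samples by group, then subsample each group) can be sketched in plain Python. This is an illustration under stated assumptions, not the QIIME scripts used in the study: the function names, the dictionary-based OTU table and the group labels are hypothetical, and sampling is assumed to be without replacement.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from an OTU -> count mapping.

    Returns None when the sample holds fewer than `depth` reads, so the
    caller can drop it.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    return Counter(rng.sample(pool, depth))


def rarefy_and_collapse(samples, groups, sample_depth, group_depth, seed=0):
    """Rarefy samples, pool them by group label, then rarefy each group."""
    rng = random.Random(seed)
    pooled = {}
    for name, counts in samples.items():
        sub = rarefy(counts, sample_depth, rng)
        if sub is None:
            continue                      # sample too shallow, excluded
        pooled.setdefault(groups[name], Counter()).update(sub)
    result = {}
    for group, counts in pooled.items():
        sub = rarefy(counts, group_depth, rng)
        if sub is not None:
            result[group] = sub
    return result


# Toy data with small depths; the text uses 200 per sample and 1,400 per group.
samples = {
    "s1": {"otu_a": 6, "otu_b": 4},
    "s2": {"otu_a": 3, "otu_c": 7},
    "s3": {"otu_b": 2},                   # below the sample depth, dropped
}
groups = {"s1": "rural_female_palm", "s2": "rural_female_palm",
          "s3": "urban_male_palm"}
table = rarefy_and_collapse(samples, groups, sample_depth=5, group_depth=8)
print({g: sum(c.values()) for g, c in table.items()})  # {'rural_female_palm': 8}
```

Rarefying a second time after pooling gives every group the same sequencing depth, so groups built from different numbers of samples remain comparable in the PCoA, UPGMA and heat map analyses.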

Pipeline specifications

Software tools: QIIME, USEARCH, UCLUST, RDP Classifier, PyNAST, FastTree, UniFrac
Databases: Greengenes
Applications: Phylogenetics, 16S rRNA-seq analysis
Organisms: Homo sapiens