Computational protocol: Culex quinquefasciatus larval microbiomes vary with instar and exposure to common wastewater contaminants

Protocol publication

[…] Raw bacterial DNA sequences were analysed using a pipeline based on MacQIIME (version 1.8.0-20140103). Raw .fasta and .qual files were combined into .fastq files with the convert_fastaqual_fastq.py script and uploaded to the NIH/NLM/NCBI Sequence Read Archive (SRA) under study accession number SRP067136. Barcodes and primers were trimmed using the split_libraries.py script with default settings. After barcode trimming, data were denoised and re-inflated using the default settings of the denoise_wrapper.py and inflate_denoiser_output.py scripts. Following denoising, no samples contained any ambiguous nucleotides. The maximum, minimum, and mean numbers of sequences across all samples were 22128, 3102, and 10099.11, respectively, with an average read length of 403.84 base pairs. Operational taxonomic units (OTUs) were picked at the default 97% identity threshold, which roughly corresponds to species level, via the UCLUST method as implemented in the pick_otus.py script. Representative sequences for each OTU were chosen using the pick_rep_set.py script with default settings. The Greengenes reference database clustered at 97% identity was used to assign taxonomy with the assign_taxonomy.py script. OTUs were counted and summarized using the make_otu_table.py and summarize_taxa.py scripts, respectively. OTUs were aligned using the align_seqs.py and filter_alignment.py scripts, and the alignment was used to build a phylogenetic tree (make_phylogeny.py). There were 658 distinct OTUs at the species level, belonging to 58 distinct families; 15 OTUs failed to match any sequence in the database and could not be assigned taxonomically. Fifteen families were selected for the heatmap because their relative abundance was greater than or equal to 1% in at least one sample. The cut-off was set at 1% because this was assumed to be the minimum abundance needed to influence larval development at that stage.
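The 1% selection rule for the heatmap can be illustrated with a short sketch. This is not the authors' code: the function name, the input layout (a dictionary of per-sample family counts), and the example numbers are all hypothetical and serve only to show the rule "relative abundance of at least 1% in at least one sample".

```python
def families_above_threshold(counts, threshold=0.01):
    """Return families whose relative abundance is >= threshold in any sample.

    counts: dict mapping sample ID -> dict mapping family name -> sequence count.
    (Hypothetical layout, assumed for this sketch.)
    """
    selected = set()
    for family_counts in counts.values():
        total = sum(family_counts.values())
        if total == 0:
            continue  # skip empty samples to avoid dividing by zero
        for family, n in family_counts.items():
            if n / total >= threshold:
                selected.add(family)
    return sorted(selected)


# Made-up counts for illustration only (not data from the study).
example_counts = {
    "sample_A": {"Family_1": 990, "Family_2": 5, "Family_3": 5},
    "sample_B": {"Family_1": 500, "Family_2": 495, "Family_3": 5},
}

# Family_3 stays below 1% in both samples, so it is not selected.
print(families_above_threshold(example_counts))
```

A family needs to pass the threshold in only one sample to be kept, which is why Family_2 is retained here despite being rare in sample_A.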
For alpha diversity, multiple rarefactions were performed using the multiple_rarefaction.py script with a lowest rarefaction depth of 2000, a highest rarefaction depth of 21000, a step size of 1000, and ten replicates per depth, which normalizes the data at each depth. Alpha diversity was calculated from the raw data using the alpha_diversity.py script with the observed species (species richness) and Shannon index (evenness) metrics. Alpha diversity data were not averaged between replicate mosquitoes, as they had already been averaged across resampling replicates, and the complications and validity of doing so are still being considered. Metrics were summarized using the collate_alpha.py script. […] Statistical analyses were performed using R (the R Foundation for Statistical Computing, version 3.1.1). Following processing through the QIIME pipeline, permutational MANOVA (PERMANOVA) in the vegan package was used to compare the OTU data. Independent variables were instar (n = 3), PPCP treatment (n = 4) and the interaction of the two, with three replicates (n = 3) of each instar in each PPCP treatment and control (n = 36). PERMANOVA is analogous to MANOVA but is suited to the non-normality commonly associated with count data in ecological community and genetic data sets. Microbial community data were further examined via principal component analysis (PCA) performed in the FactoMineR package. Ellipses in the PCA encompass the three mosquito replicates in each instar for that treatment. PCA and PERMANOVA were conducted on each instar in the individual PPCP and control treatment groups. Following PCA, variables were examined for their contributions and correlation to each of the first two dimensions. Those variables (OTUs) that were ≥85% correlated were included in subsequent pairwise comparisons by instar within their respective treatment. Generalized linear hypotheses in the multcomp package were used to perform the pairwise comparisons.
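The rarefaction and alpha diversity steps described at the start of this paragraph can be sketched in plain Python. This is an illustration, not the QIIME implementation: all function names and the example counts are hypothetical, subsampling without replacement is assumed, and the Shannon index is computed with log base 2 as an assumption about the convention used.

```python
import math
import random


def rarefy(otu_counts, depth, rng):
    """Subsample `depth` sequences without replacement from one sample."""
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None  # sample has too few sequences for this depth
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


def observed_species(otu_counts):
    """Number of OTUs with at least one sequence (richness)."""
    return sum(1 for n in otu_counts.values() if n > 0)


def shannon(otu_counts):
    """Shannon index; log base 2 is an assumption of this sketch."""
    total = sum(otu_counts.values())
    h = 0.0
    for n in otu_counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log2(p)
    return h


def mean_alpha_at_depth(otu_counts, depth, replicates, rng):
    """Average both metrics over repeated rarefactions at one depth."""
    reps = [rarefy(otu_counts, depth, rng) for _ in range(replicates)]
    if reps[0] is None:
        return None  # sample dropped at this depth
    richness = sum(observed_species(r) for r in reps) / replicates
    diversity = sum(shannon(r) for r in reps) / replicates
    return richness, diversity


# Made-up sample with 3100 sequences (not data from the study).
sample = {"OTU_1": 1500, "OTU_2": 900, "OTU_3": 600, "OTU_4": 100}
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Depths 2000 to 21000 in steps of 1000, ten replicates each, as in the text.
for depth in range(2000, 21001, 1000):
    result = mean_alpha_at_depth(sample, depth, 10, rng)
    if result is None:
        break  # deeper rarefactions would also fail for this sample
    print(depth, result[0], round(result[1], 3))
```

In this sketch a sample with fewer sequences than the requested depth is dropped at that depth, so the made-up sample contributes values only at depths 2000 and 3000.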
P values were adjusted using the p.adjust function. Alpha diversity data were analysed using negative binomial generalized linear models at a sequence depth of 3000 sequences/sample, which normalized the data to the greatest depth at which all sampled mosquitoes were still present. The alpha level for all tests was 0.05. […]
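The text does not state which correction method was passed to p.adjust. As a hedged illustration, the sketch below implements the Holm step-down correction, which is the default method of p.adjust in R; whether the authors used that default is an assumption. The function name and the example P values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment of a list of P values.

    Assumed method: the text names only p.adjust, not the correction used.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted


# Made-up P values for illustration only (not results from the study).
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)

ALPHA = 0.05  # alpha level stated in the text
for p, q in zip(raw, adjusted):
    print(p, round(q, 4), q < ALPHA)
```

With these made-up inputs only the smallest P value remains below the 0.05 alpha level after adjustment.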

Pipeline specifications

Software tools Fastaq, QIIME, UCLUST, vegan, FactoMineR, multcomp
Databases Greengenes
Applications Miscellaneous, Phylogenetics, GWAS
Organisms Culex quinquefasciatus, Homo sapiens
Diseases Pulmonary Fibrosis
Chemicals Acetaminophen, Caffeine