Computational protocol: Global population structure and adaptive evolution of aflatoxin‐producing fungi

Similar protocols

Protocol publication

[…] Species in Aspergillus section Flavi were sampled from geographically isolated peanut fields spanning five continents (Tables  and ). We focused our attention on major peanut‐growing regions that were considered hotspots for aflatoxin contamination in 2005. As our goal was to maximize our sampling of genetic diversity in Aspergillus section Flavi, we sampled species diversity at southern latitudes, which is typically higher than in northern latitudes (Horn, ). For this reason, the European continent is not represented in our sampling. Aspergillus oryzae isolates from Japan were included in this global study of section Flavi, although they were not sampled in peanut fields. A total of 1,304 isolates were examined for this study. DNA was extracted from freeze‐dried mycelia, and target loci for phylogenetic inference were amplified and sequenced, as described previously (Moore et al., ). DNA sequence variation was assayed in six genomic regions: aflM/aflN and aflW/aflX intergenic regions within the aflatoxin gene cluster, and a major facilitator superfamily (mfs) gene adjacent to the cluster on chromosome 3; acetamidase (amdS) and mating‐type (MAT) genes on chromosome 6; and the tryptophan synthase (trpC) gene on chromosome 4. The chromosomal locations associated with these loci have been reported for A. flavus (Moore et al., ; Ramirez‐Prado et al., ; Smith, Woloshuk, Robertson, & Payne, ) but have not been confirmed for all species examined in this study. For each locus, the sequence of at least one representative isolate from each putative species was subjected to BLASTn searches against the NCBI nonredundant (nr) database to confirm species identity. Table  lists the numbers of individuals in each species examined per locus. Mating‐type distributions and chemotype profiles for the global populations of A. flavus and A. parasiticus included in this study have been previously examined as well as field data relating to soil type, temperature and precipitation (Moore et al., ). [...] The GenBank accessions for sequences from each locus are listed by species in Tables , , , , , , , . Mutliple DNA sequence alignment was performed using Sequencher® version 5.0 software (Gene Codes Corporation, Ann Arobor, MI, USA), and each alignment was refined by eye. The alignments were then exported in NEXUS format for use in SNAP Workbench (Monacell & Carbone, ; Price & Carbone, ). Multiple sequence alignments for each locus were collapsed into haplotypes using SNAP Map (Aylor, Price, & Carbone, ). Collapsing was performed with the options of recoding indels and excluding infinite sites violations. Indels were recoded as single unique events and weighted equally in phylogenetic analyses. Recoded indels that were not binary violated infinites sites and were excluded. Because our variation spans the population–species interface, collapsing in this fashion allowed us to take full advantage of indels arising within species and exclude only those indels that violate position homology between species. All but one of the alignments were rooted with A. nomius type strain NRRL 13137 as an outgroup species, an appropriate outgroup according to Peterson et al. (). Since NRRL 13137 was found to contain only the MAT1‐1 idiomorph based on genomic sequencing (Moore, Mack, & Beltz, ), a MAT1‐2 A. nomius isolate (IC1493) was used for the respective phylogeny. We performed heuristic parsimony searches in PAUP* 4.0 (Swofford, ) to infer phylogenies for all section Flavi species sampled, and if parsimony searches inferred more than one equally parsimonious tree, bootstrap values were calculated for branch length support. Bootstrap consensus trees were based on 1,000 replicates. Statistical support for specific clades in the tree was based on branches with bootstrap percentages greater than 70 (Hillis & Bull, ). For the amdS and trpC loci, a haplotype comparison was made through separate alignments (data not shown) between our sampled global species and Geiser's Groups I and II (Geiser et al., ). Briefly, Geiser and coworkers examined 11 genomic loci for an Australian A. flavus population and reported the existence of two reproductively isolated clades or groups that could not be delineated by geography or morphology.Additional single‐locus studies, for the sampled global species in section Flavi, involved inference of maximum‐likelihood (ML) phylogenies using the program Randomized Axelerated Maximum Likelihood or RAxML (Stamatakis, ), and principal component analysis (PCA). These analyses were performed with the exclusion of indels and infinite sites violations from a DNA sequence alignment. PCA uses a variance–covariance matrix to reduce a sample into groupings of uncorrelated variables called principal components (Abdi & Williams, ). The components are known as eigenvectors and refer to different levels of variation. The first eigenvector considers the most variation possible with subsequent eigenvectors having less variation. Principal components were normalized to sum to 1, and the number of significant axes of variation (eigenvectors) was determined using the Tracy–Widom statistic (Tracy & Widom, ). The number of k clusters was estimated using the Gap Statistic (Tibshirani, Walther, & Hastie, ), which is an unbiased estimate of the number of distinct clusters in the population sample. Significant principal components and PCA clusters were displayed graphically using the SCATTERPLOT3D package in R (Ligges & Mächler, ). In a previous study, we showed evidence of balancing selecting in the aflW/aflX cluster region, which delimited two evolutionary distinct lineages: IB and IC. Lineage IB included only nonaflatoxigenic isolates while IC included both toxigenic and atoxigenic strains (Moore et al., ). To explore this partitioning of variation into lineages IB and IC on a global scale, and across multiple species in section Flavi, single‐locus phylogenies were inferred for the aflW/aflX region, as well as noncluster loci (amdS, trpC and MAT).Multilocus analyses were performed for the A. flavus and A. parasiticus populations. The A. flavus isolates included both large sclerotium (L) and small sclerotium (S) morphotypes (Cotty, ). The six genomic loci were first concatenated using SNAP Combine (Aylor et al., ), and collapsed with SNAP Map having recoded indels and excluded infinite sites violations. Each geographic location was examined separately with focus on A. flavus and A. parasiticus isolates and their associated chemotype profiles. Phylogenies were displayed in circle tree format using FigTree v1.3.1 ( or T‐BAS v2.0 (Carbone, unpublished), an extension of the T‐BAS v1.0 toolkit (Carbone et al., ). Clade groupings were labeled along the tree perimeter based on species or morphotype (e.g., A. flavus L, A. flavus S, A. parasiticus). The PCA scatter plots considered up to three eigenvectors (axes of variation). Additional PCA scatter plots were inferred for the global combined multilocus dataset to investigate possible associations of temperature, precipitation, or soil type with inferred clusters.The optimal number of k clusters was also determined using STRUCTURE version 2.3.1 (Falush, Stephens, & Pritchard, ; Pritchard, Stephens, & Donnelly, ), which assigns individuals to clusters assuming a model of admixture and correlated allele frequencies. To achieve good mixing and convergence, we used an MCMC sampling strategy of 20,000 iterations after a burn‐in period of 20,000. Three simulations were performed for k ranging from 1 to 10 to assess convergence of log‐likelihood values using the “full search” method in CLUMPP v1.1.2 (Jakobsson & Rosenberg, ). Two methods, LnP(D) and delta K (Evanno, Regnault, & Goudet, ) implemented in Structure Harvester v0.6.93 (Dent & vonHoldt, ), were used to estimate the optimal number of k clusters. The results were visualized in T‐BAS v2.0 as outer rings of a multilocus global ML phylogeny. This holistic analysis combined results from PCA, STRUCTURE, and phylogenetic inference. The ML analysis was performed using RAxML version 8, which is accessible through the CIPRES RESTful application (CRA) programmer interface (Miller et al., ). Specifically, the multilocus sequence alignment file was subjected to a ML search strategy under the GTRGAMMA model of rate heterogeneity, and branch support values were based on rapid bootstrapping using 500 replicates; A. nomius (NRRL 13137) was specified as the outgroup. The best ML tree and associated locality and chemotype metadata were examined using T‐BAS v2.0 to look for evidence of population structuring based on geography and among closely related species (A. nomius, A. oryzae and A. sojae). We performed statistical analyses to determine how well individual cluster membership can predict chemotype using multiple linear regressions in R (R Development Core Team ) on the matrix of individual membership coefficients and different toxin chemotype concentrations, normalized between 0 and 1. Population structure was investigated by hierarchical analysis of molecular variance (AMOVA) and quantifying total variance within and between localities by F ST using Arlequin v. (Excoffier & Lischer, ). Significance of F ST estimates was determined using 1,000 permutations. […]

Pipeline specifications

Software tools BLASTN, Sequencher, RAxML, FigTree, T-BAS, CLUMPP, Structure Harvester, Arlequin
Applications Phylogenetics, Population genetic analysis
Organisms Homo sapiens
Chemicals Aflatoxins