Computational protocol: Population and Evolutionary Genomics of Amblyomma americanum, an Expanding Arthropod Disease Vector

Protocol publication

[…] To evaluate the genetic structuring of individuals and populations, three complementary approaches were used: 1) principal components analysis (PCA), a model-free multivariate ordination method implemented in the adegenet package in R; 2) a maximum-likelihood, model-based estimation of ancestry implemented in ADMIXTURE; and 3) analysis of molecular variance (AMOVA) implemented in GenAlEx. To evaluate genomic diversity within populations, heterozygosity (H) and the inbreeding coefficient (F) were computed using VCFtools.

For the PCA, centering and binomial scaling were used to compensate for differences in variance among allele frequencies.

For the ADMIXTURE analysis, which partitions N samples into K genetic clusters, ten runs were conducted at each value of K from K = 2 through K = 5, beyond which additional clusters were no longer informative. Ten runs at each of the four values of K produced a total of 40 Q matrices; this allowed the detection of potential multimodality, the situation in which there is more than one way to assign individuals to genetic clusters. Each run was started from a randomly generated seed to explore the full breadth of variation space and determine the major mode present in the data. Multimodality was visualized using pong. The optimal value of K was chosen using the 5-fold cross-validation procedure run as part of each individual ADMIXTURE analysis. The optimal K is the value beyond which adding more clusters no longer provides a better fit to the data, identified as the K that minimizes the cross-validation error across runs. Other biologically relevant values of K were also considered, as is recommended in the literature.

For the AMOVA, population genomic tests for genetic differentiation were conducted among three geographically defined regions: Northeast (NY only), Southwest (OK), and Southeast (NC + SC). ME ticks were excluded from pairwise analyses because of their small sample size (N = 5) and because they do not appear to define their own genetic cluster (see Results).

To identify candidate loci that may have been subject to selection during the range expansion of A. americanum, the LOSITAN program, which employs the FDIST2 algorithm, was used to detect SNPs that are FST outliers. This method evaluates the relationship between FST and heterozygosity, comparing each SNP against the distribution of FST expected under an island model of migration. Due to computational constraints of the program, a random subset of 5,000 SNPs was selected and two genetic populations were considered: NY, representing a newly established population, and NC + SC, representing a historic population. OK and ME ticks were excluded because genetic substructure was detected within these samples (see Results). LOSITAN was run for 50,000 simulations, assuming an infinite alleles mutation model, two expected populations, and a conservative false discovery rate of 0.1. […]
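
The inputs to an FST outlier scan of this kind can be illustrated with a minimal Python sketch. It draws a reproducible random subset of SNP indices and computes, for one biallelic SNP in two populations, the total expected heterozygosity and a simple Nei-style FST. The helper names (subsample_snps, locus_he_fst), the seed, and the equal weighting of the two populations are assumptions made for illustration. The sketch does not reproduce the estimator or the coalescent simulations that FDIST2 and LOSITAN use to build the expected distribution.

```python
import random


def subsample_snps(n_snps, k, seed):
    """Return sorted indices of a random subset of k SNPs (all SNPs if n_snps <= k)."""
    rng = random.Random(seed)  # fixed seed so the subset can be regenerated
    if n_snps <= k:
        return list(range(n_snps))
    return sorted(rng.sample(range(n_snps), k))


def locus_he_fst(p1, p2):
    """Return (HT, FST) for one biallelic SNP given allele frequencies in two populations.

    Simplified estimator: FST = (HT - HS) / HT with equal population weights.
    This is an illustrative stand-in, not the estimator used by FDIST2.
    """
    # Mean expected heterozygosity within populations
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled (averaged) allele frequency
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        return 0.0, 0.0  # monomorphic SNP: FST is undefined, reported here as 0
    return ht, (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical allele frequencies for three SNPs in two populations
    freqs_new = [0.20, 0.50, 0.90]
    freqs_old = [0.80, 0.50, 0.85]
    for idx in subsample_snps(len(freqs_new), 5000, seed=42):
        ht, fst = locus_he_fst(freqs_new[idx], freqs_old[idx])
        print(idx, round(ht, 3), round(fst, 3))
```

Each SNP yields one (heterozygosity, FST) point. An outlier test then asks whether that point falls outside the envelope simulated under the neutral island model, which is the step left to LOSITAN.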

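The ADMIXTURE replicate design and the choice of K described above can be sketched in Python as follows. The sketch builds one command per (K, replicate) pair, each with its own random seed, then reads the cross-validation error from each run's log and picks the K with the lowest mean error. The function names, the input file name, the master seed, and the averaging of CV error over replicates are assumptions for illustration. The `--cv=5` and `--seed=` options and the "CV error (K=...)" log line should be checked against the installed ADMIXTURE version.

```python
import random
import re
from collections import defaultdict
from statistics import mean

# Pattern for the cross-validation line expected in an ADMIXTURE log (assumed format)
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def plan_runs(bed_path, k_values=range(2, 6), n_reps=10, master_seed=2024):
    """Build one ADMIXTURE command per (K, replicate), each with a distinct random seed."""
    k_values = list(k_values)
    rng = random.Random(master_seed)  # master seed only makes the plan reproducible
    seeds = rng.sample(range(1, 2**31), len(k_values) * n_reps)
    commands = []
    for i, k in enumerate(k_values):
        for rep in range(n_reps):
            seed = seeds[i * n_reps + rep]
            commands.append(
                ["admixture", "--cv=5", f"--seed={seed}", bed_path, str(k)]
            )
    return commands


def mean_cv_by_k(log_texts):
    """Map each K to the mean cross-validation error over all logs that report it."""
    errors = defaultdict(list)
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors[int(k)].append(float(err))
    return {k: mean(vals) for k, vals in sorted(errors.items())}


def best_k(log_texts):
    """Return the K with the lowest mean cross-validation error."""
    cv = mean_cv_by_k(log_texts)
    return min(cv, key=cv.get)


if __name__ == "__main__":
    cmds = plan_runs("ticks.bed")  # hypothetical input file name
    print(len(cmds), "runs planned")
    print(" ".join(cmds[0]))
```

Because replicate runs at the same K write output files with the same names, each command would normally be executed in its own working directory, with standard output saved to a log that `best_k` can read later.
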
Pipeline specifications

Software tools: adegenet, ADMIXTURE, GenAlEx, VCFtools, pong
Application: Population genetic analysis
Organisms: Amblyomma americanum, Homo sapiens
Chemicals: Nucleotides