Computational protocol: Exploring the Phenotypic Space and the Evolutionary History of a Natural Mutation in Drosophila melanogaster

Protocol publication

[…] Single nucleotide polymorphism (SNP) data were downloaded from the DGRP2 webpage in VCF format. Strains with (N = 65) and without (N = 38) the FBti0019386 insertion were filtered using vcftools v0.1.10.

We used three different statistics to detect positive selection: nucleotide diversity (π), Tajima’s D, and the composite likelihood (CL) of SNPs. Positive selection eliminates standing genetic variation that is linked to the adaptive mutation. Thus, if FBti0019386 has increased in frequency due to positive selection, we expect a decrease in π in flies with the insertion compared with flies without the insertion. π is calculated as the mean number of pairwise differences between sequences in the sample. Tajima’s D is calculated as the normalized difference between the mean number of pairwise differences and the number of segregating sites, the latter scaled by sample size. This statistic is expected to be 0 in a neutrally evolving population, whereas negative values of Tajima’s D can be taken as evidence of positive selection. Finally, the CL is calculated by multiplying the marginal likelihoods of each site along the studied sequences.

π, Tajima’s D, and CL were calculated for the two sets of sequences, with and without the insertion, using the PopGenome package in R. Sliding-window analyses were performed with 200-bp windows spanning the 1-kb and 2-kb regions flanking the insertion. Differences between strains with and without the insertion were more pronounced for the 1-kb region flanking the insertion; therefore, we focused our analysis on this region.

Simulations were performed using the ms program. Theta values were estimated using the 205 DGRP2 strains for the 2-kb region around FBti0019386 (theta = 4.77/kb) and for the 3R chromosomal arm (theta = 4.5/kb). Thus, simulations were performed for theta values of 4/kb and 5/kb, which are frequently used as neutral values in D. melanogaster.

Ad hoc Perl scripts were used for the resampling analyses.
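As a minimal sketch of the two sequence-based statistics described above, π and Tajima’s D can be computed from a set of aligned sequences as follows. This is not the PopGenome implementation used in the study; the function names, the plain-string input format, and the non-overlapping window step are assumptions made for illustration (the text gives the 200-bp window size but not the step).

```python
from itertools import combinations
from math import sqrt


def pairwise_diversity(seqs):
    """Mean number of pairwise differences (pi) among aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns that carry more than one allele."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def tajimas_d(seqs):
    """Tajima's D (Tajima 1989) for a list of at least four aligned sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    s = segregating_sites(seqs)
    if s == 0:
        return None  # D is undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between the two estimators of theta, divided by its standard deviation
    return (pairwise_diversity(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def sliding_windows(seqs, size=200, step=200):
    """Yield (window_start, pi, D) for consecutive windows of `size` columns.

    `step` is an assumption: the default gives non-overlapping windows.
    """
    length = len(seqs[0])
    for start in range(0, length - size + 1, step):
        window = [seq[start:start + size] for seq in seqs]
        yield start, pairwise_diversity(window), tajimas_d(window)
```

Running `sliding_windows` separately on the sequences with and without the insertion gives the per-window values that the comparison between the two groups relies on.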
In total, 1,000 random samples of the 103 DGRP strains analyzed were obtained, keeping the same proportion as in the original present and absent data sets (60%/40%, respectively) and a sample size of nearly 50% of the total data set.

We also computed the composite likelihood ratio (CLR) as 2 × (log CL(present) − log CL(absent)) for a 1-kb region around the TE insertion. Because demography could produce patterns similar to those of positive selection, we randomly sampled 1,000 1-kb-long regions from the 3R chromosome arm for the absent and present data sets and calculated π, Tajima’s D, CL, and CLR in each of them. […]

To obtain the frequency of FBti0019386, we ran T-lex2 on Drosophila whole-genome sequences available from a total of 23 populations from North America, Australia, Europe, and Africa (supplementary table S5, Supplementary Material online). The accuracy of TE frequency estimates using T-lex2 is affected by coverage. However, coverage for all samples was higher than 20× except for Lyon (France) and California (USA), which had 8× and 4.7× coverage, respectively, suggesting that overall frequency estimates are accurate. […]

We analyzed whether the frequency of the FBti0019386 insertion correlated with different geographical and climate variables in North America, Australia, and Europe using Pearson product–moment correlations. We also performed a PCA to disentangle the relationships between the climatic variables using Statistica (v8.0, StatSoft, Inc. 2007). Climatic data were obtained from the weather stations adjacent to the collection sites of each population, available in […]. When necessary, data were transformed as described in […] (see pages 411–422). […]

We used small RNA sequencing data to check whether piRNA reads mapped to the FBti0019386 sequence, following a methodology similar to that described in […]. Briefly, we obtained the small RNA reads from Oregon R ovaries (accession number SRP000458) and from wild-type ovaries (accession number SRX470700).
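The resampling scheme and the CLR formula described above can be sketched as follows. This is a hedged illustration in Python, not the authors' ad hoc Perl scripts: the function names, the strain labels, and the choice to truncate each group to a whole number of strains are assumptions. Drawing the same fraction from each group without replacement preserves the present/absent proportion of the full data set.

```python
import random


def stratified_subsample(present, absent, fraction=0.5, rng=random):
    """Draw the same fraction from each group, without replacement."""
    n_present = int(len(present) * fraction)  # truncation is an assumption
    n_absent = int(len(absent) * fraction)
    return rng.sample(present, n_present), rng.sample(absent, n_absent)


def resample(present, absent, replicates=1000, fraction=0.5, seed=None):
    """Yield `replicates` stratified subsamples as (present, absent) pairs."""
    rng = random.Random(seed)
    for _ in range(replicates):
        yield stratified_subsample(present, absent, fraction, rng)


def composite_likelihood_ratio(log_cl_present, log_cl_absent):
    """CLR = 2 * (log CL(present) - log CL(absent))."""
    return 2 * (log_cl_present - log_cl_absent)
```

With 65 strains carrying the insertion and 38 lacking it, `fraction=0.5` yields 32 and 19 strains per replicate, that is, 51 of 103 strains.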
We aligned the reads using the BWA-MEM package version 0.7.5a-r405 to the 14.6-kb sequence obtained from the Drosophila reference genome, containing the Bin1 and sra genes and FBti0019386 (release 5 chromosomal coordinates 3R:12,010,721–12,025,306). Then, we used samtools and bamtools (Barnett et al. 2011) to index the alignments and to filter them into sense and antisense reads. Finally, we obtained the total read density using R (RStudio v0.98.507). […]
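To make the final step concrete, the following sketch computes per-base read density separately for sense and antisense reads over the 14.6-kb region. It is a simplified stand-in for the samtools, bamtools, and R steps named above, not a reproduction of them. The input format, tuples of (1-based start on the extracted region, read length, reverse-strand flag), is hypothetical, and treating forward-strand alignments as "sense" is an assumption.

```python
# Length of the extracted region, from the release 5 coordinates quoted in the text
REGION_LENGTH = 12_025_306 - 12_010_721 + 1  # 14,586 bp


def strand_density(reads, region_length=REGION_LENGTH):
    """Return (sense, antisense) per-base coverage lists for aligned reads.

    `reads` is an iterable of (start, length, is_reverse) tuples, where
    `start` is 1-based on the extracted region (hypothetical input format).
    """
    sense = [0] * region_length
    antisense = [0] * region_length
    for start, length, is_reverse in reads:
        track = antisense if is_reverse else sense
        # Clip reads that overhang either end of the region
        first = max(start, 1)
        last = min(start + length - 1, region_length)
        for pos in range(first, last + 1):
            track[pos - 1] += 1
    return sense, antisense
```

Restricting either list to the positions occupied by FBti0019386 within the region shows how many piRNA reads map to the insertion on each strand.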

Pipeline specifications

Software tools: Statistica, BWA, SAMtools, BamTools
Applications: Miscellaneous, Non-coding RNA analysis
Organisms: Drosophila melanogaster