Computational protocol: Major role of positive selection in the evolution of conservative segments of Drosophila proteins

[…] Complete genotypes of 162 inbred lines of D. melanogaster were obtained from the Drosophila Genetic Reference Panel website. Multiple alignments of the genome assemblies of 11 Drosophila species to D. melanogaster (dm3, BDGP release 5) were obtained from the UCSC Genome Bioinformatics Site. The set of FlyBase canonical splice variants was used to map 13 300 D. melanogaster protein-coding genes onto the alignment. The multiple alignment of each coding region was then obtained by joining the aligned segments corresponding to the exons of the FlyBase canonical genes in D. melanogaster. The resulting alignments are available at […].

Genotypes of six strains of D. simulans were obtained from the Drosophila Population Genomics Project website. Since the D. simulans genotypes were assembled against an earlier D. melanogaster reference assembly (dm2, BDGP release 4), we used the corresponding dm2-based multiple alignment of 11 Drosophila species from UCSC in all analyses involving variation within D. simulans. These alignments were processed in the same way as the dm3-based alignments. A total of 13 479 D. melanogaster protein-coding genes were mapped onto the alignment.

Only those codon sites in which the reference sequence of each of the 12 species carried a valid codon were considered. Valid codons were defined as those that were aligned and did not contain gaps or non-ACGT characters. We also excluded interspersed repeats and low-complexity sequences masked by RepeatMasker and Tandem Repeats Finder, with settings as detailed on the UCSC Genome Bioinformatics Site. […]
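The exon-joining and codon-filtering steps described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that each exon is supplied as a dictionary of equal-length aligned strings keyed by species, that alignment columns have already been projected onto the D. melanogaster reading frame (so codon i occupies columns 3i to 3i+2), and that repeats are soft-masked as lowercase letters, as in UCSC sequence files. The function names and the three-species toy data are hypothetical; the protocol itself uses 12 species.

```python
VALID_BASES = set("ACGT")  # uppercase only: lowercase is assumed to mean soft-masked


def is_valid_codon(codon):
    """True if the codon is exactly three unmasked A/C/G/T characters.

    Gaps ('-'), ambiguity codes such as 'N', and lowercase (assumed
    soft-masked) bases all make the codon invalid.
    """
    return len(codon) == 3 and all(base in VALID_BASES for base in codon)


def join_exons(exon_blocks):
    """Concatenate per-exon aligned segments into one coding alignment per species."""
    species = exon_blocks[0].keys()
    return {sp: "".join(block[sp] for block in exon_blocks) for sp in species}


def valid_codon_sites(alignment):
    """Return indices of codon columns where every species carries a valid codon."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    n_codons = lengths.pop() // 3
    return [
        i
        for i in range(n_codons)
        if all(is_valid_codon(seq[3 * i:3 * i + 3]) for seq in alignment.values())
    ]


# Toy example: two exons, three species (hypothetical sequences).
exons = [
    {"dmel": "ATGGCC", "dsim": "ATGGCT", "dyak": "ATG-CC"},
    {"dmel": "aaaTTG", "dsim": "AAATTG", "dyak": "AAATTG"},
]
coding = join_exons(exons)
sites = valid_codon_sites(coding)
print(sites)  # [0, 3]
```

In the toy data, codon 1 is rejected because one species carries a gap, and codon 2 is rejected because the D. melanogaster sequence is lowercase, which this sketch treats as repeat-masked. Genes on the minus strand would additionally need reverse-complementing before this step, which is omitted here.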

Pipeline specifications

Software tools: RepeatMasker, TRF
Application: Population genetic analysis
Organisms: Drosophila melanogaster, Drosophila simulans