Computational protocol: Evolution of Three Parent Genes and Their Retrogene Copies in Drosophila Species

Protocol publication

[…] Coding and amino acid sequences of all transcripts of CG8331, CG4960, CG17734, CG11825, Sep2, and Sep5 in D. melanogaster were obtained from FlyBase release FB2012_02 [, ]. To identify the parent gene transcript that most likely gave rise to each retrogene, all pairwise alignments of the coding sequences derived from alternative transcripts in D. melanogaster were generated using the Needleman-Wunsch algorithm []. The coding sequences of these homologous transcripts were used in further sequence analyses. Amino acid and coding sequences of the orthologs in the other sequenced Drosophila species were obtained from FlyBase [] using the D. melanogaster coding sequences as BLAST [] queries. If a BLAST search returned a predicted gene model, coding and protein sequences were taken from that gene model; if no gene model existed for a search result, coding and protein sequences were predicted with GeneWise []. If no ortholog was found in a particular species, we determined whether its absence reflected a deletion or missing genomic sequence data by running BLAST searches with the D. melanogaster genomic sequences flanking these genes as queries. In some cases, no BLAST hit corresponded to the >100 kilobases of sequence containing the gene in D. melanogaster, suggesting that the gene may be absent because of a gap in that species' genome assembly. We included sequences from all available sequenced Drosophila species to increase the power of our analyses to detect selection acting on gene pairs. 
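The transcript-selection step can be illustrated with a minimal Needleman-Wunsch sketch. It is a plain global alignment with a linear gap penalty and hypothetical scoring values (match = 1, mismatch = -1, gap = -2); the protocol does not state its parameters. It also assumes, for illustration only, that the transcript whose coding sequence shares the highest percent identity with the retrogene is taken as the likely parent. The function names and transcript IDs are hypothetical.

```python
from typing import Dict, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x: str, y: str) -> float:
    """Identical non-gap columns divided by alignment length (one of several common definitions)."""
    matches = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * matches / len(x) if x else 0.0


def closest_transcript(retrogene_cds: str, transcripts: Dict[str, str]) -> str:
    """Return the transcript ID whose CDS has the highest global-alignment identity to the retrogene."""
    def identity(tid: str) -> float:
        _, aln_retro, aln_tx = needleman_wunsch(retrogene_cds, transcripts[tid])
        return percent_identity(aln_retro, aln_tx)
    return max(transcripts, key=identity)


# Toy sequences with hypothetical transcript IDs, not data from the study.
transcripts = {"RA": "ATGGCTAAAGGT", "RB": "ATGGCAAAAGGC"}
print(closest_transcript("ATGGCTAAAGGT", transcripts))  # prints RA
```

This pure-Python version uses O(nm) time and memory, which is workable for single coding sequences but slow for many long ones; an established aligner (for example, Biopython's `PairwiseAligner` in global mode) would be the more practical choice.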
Accession numbers of the genes and genomes used are listed in Supplementary File 1 in the Supplementary Material available online at […]. Alignments of each gene pair were constructed by aligning protein sequences with Clustal Omega [] using default settings, reverse-translating the protein alignment into a codon alignment with PAL2NAL [], and then inspecting the alignments and removing codons that contained gaps or were ambiguously aligned in some species []. For each codon alignment, MEGA5 [] was used to select the best-fitting model of sequence evolution and then to construct a maximum likelihood phylogenetic tree. Trees were visualized using iTOL [, ]. […]
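The reverse-translation and gap-filtering steps can be sketched as follows. This is not PAL2NAL itself, only a minimal illustration of the idea: each aligned amino acid is replaced by its codon from the corresponding coding sequence, and each alignment gap by "---". It assumes each coding sequence encodes exactly the aligned protein (a trailing stop codon is tolerated and dropped; frameshifts or residue/codon mismatches are not checked), and it removes only codon columns containing a gap. The manual removal of ambiguously aligned codons described above is not modeled. Function names and sequences are hypothetical.

```python
from typing import Dict, List


def protein_to_codon_alignment(aligned_protein: str, cds: str) -> List[str]:
    """Replace each aligned residue with its codon from `cds` and each gap with '---'."""
    residues = len(aligned_protein.replace("-", ""))
    # Accept a CDS with or without a trailing stop codon; any other length is a mismatch.
    if len(cds) not in (3 * residues, 3 * residues + 3):
        raise ValueError("CDS length does not match the ungapped protein length")
    codons: List[str] = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return codons  # a trailing stop codon, if present, is never reached and so is dropped


def strip_gapped_codon_columns(codon_aln: Dict[str, List[str]]) -> Dict[str, str]:
    """Keep only the codon columns in which no sequence has a gap."""
    lengths = {len(codons) for codons in codon_aln.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same number of codons")
    ncols = lengths.pop()
    keep = [c for c in range(ncols)
            if all("-" not in codons[c] for codons in codon_aln.values())]
    return {name: "".join(codons[c] for c in keep) for name, codons in codon_aln.items()}


# Toy example: hypothetical parent/retrogene sequences, not data from the study.
proteins = {"parent": "MK-G", "retro": "MKAG"}
cds = {"parent": "ATGAAAGGT", "retro": "ATGAAGGCTGGC"}
codon_aln = {name: protein_to_codon_alignment(proteins[name], cds[name]) for name in proteins}
print(strip_gapped_codon_columns(codon_aln))
# prints {'parent': 'ATGAAAGGT', 'retro': 'ATGAAGGGC'}
```

Working at the codon level keeps the nucleotide alignment in frame, so the filtered output can be passed directly to codon-based analyses such as model selection and tree building.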

Pipeline specifications

Software tools: GeneWise, Clustal Omega, PAL2NAL, MEGA, iTOL
Databases: FlyBase
Applications: Phylogenetics, Nucleotide sequence alignment
Organisms: Drosophila melanogaster