Computational protocol: Oscillating Evolution of a Mammalian Locus with Overlapping Reading Frames: An XLαs/ALEX Relay

Protocol publication

[…] A reliable alignment was generated by first translating the nucleotide sequences from each taxon, aligning the translations using ClustalW, refining these alignments manually, and then reconstructing the nucleotide alignments using the protein alignment as a guide. The phylogenetic tree and most statistics were calculated using the PAML software package. All analyses were performed on the region of overlap between the two reading frames, excluding the repetitive region. Synonymous and nonsynonymous rates were apportioned among the branches of the tree using the codeml program of the PAML package under the free ratio model.

The neighbor-dependent modification of the NG method was written in the PERL programming language and is available from the authors upon request. The only difference from the classical NG algorithm is that pathways creating stop codons in the alternative reading frame are ignored by our method. For example, let us consider the alignment in […]. The alignment contains two reading frames: frame 0 starting at position 0 and frame 1 starting at position 1. The second codon of frame 0 contains two substitutions, and so there are two possible parsimonious pathways: […] Pathway 2 would convert the second codon of frame 1 into a stop (TAG), and so it is not considered by our method.

To test whether the GC content of the XL-exon is required for the coexistence of the two reading frames, we first estimated codon frequencies in (1) human RefSeq genes, (2) the XLαs reading frame, and (3) the ALEX reading frame. This procedure was performed using a custom-designed PERL script. Coding regions of human RefSeq genes were downloaded from the National Center for Biotechnology Information ftp site (ftp://ftp.ncbi.nlm.nih.gov). We then used the evolver program of the PAML package to simulate 1,000,000 sequence sets, using the three codon frequency tables. Each set contained eight sequences corresponding to the primate species used in this study.
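
As an illustration of the pathway filter in the neighbor-dependent NG modification described above, the following Python sketch enumerates the parsimonious pathways between two aligned codons and discards any pathway whose intermediate state creates a stop codon. It is not the authors' PERL script, which is available only on request. The function names (`overlapping_codons`, `allowed_pathways`), the choice to evaluate intermediates in the sequence context of the first sequence, and the toy sequences are all assumptions made for this example.

```python
from itertools import permutations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def overlapping_codons(seq, start, end, frame):
    """Return the codons of reading frame `frame` that overlap positions [start, end)."""
    codons = []
    # Last codon boundary of `frame` at or before `start`.
    first = start - ((start - frame) % 3)
    for pos in range(first, end, 3):
        if pos >= frame and pos + 3 <= len(seq):
            codons.append(seq[pos:pos + 3])
    return codons


def allowed_pathways(seq_a, seq_b, codon_index, alt_frame=1):
    """List the parsimonious pathways for one frame-0 codon.

    A pathway is rejected if an intermediate state puts a stop codon in
    frame 0 or, when `alt_frame` is not None, in the alternative frame.
    Assumption: intermediates are built in the context of `seq_a`.
    """
    start, end = 3 * codon_index, 3 * codon_index + 3
    diffs = [i for i in range(start, end) if seq_a[i] != seq_b[i]]
    frames = [0] if alt_frame is None else [0, alt_frame]
    kept = []
    for order in permutations(diffs):
        current = list(seq_a)
        steps = [seq_a[start:end]]
        ok = True
        for step, pos in enumerate(order):
            current[pos] = seq_b[pos]
            steps.append("".join(current[start:end]))
            if step < len(order) - 1:  # intermediate state, not the observed end point
                state = "".join(current)
                if any(c in STOP_CODONS
                       for f in frames
                       for c in overlapping_codons(state, start, end, f)):
                    ok = False
                    break
        if ok:
            kept.append(steps)
    return kept


# Hypothetical toy alignment (not the example from the original figure).
SEQ_A = "ATGCCAGCT"
SEQ_B = "ATGCTGGCT"
classical = allowed_pathways(SEQ_A, SEQ_B, 1, alt_frame=None)
neighbor_dependent = allowed_pathways(SEQ_A, SEQ_B, 1, alt_frame=1)
```

For this hypothetical pair, the classical enumeration keeps both pathways (CCA → CTA → CTG and CCA → CCG → CTG), whereas the frame-aware filter drops the first one because its intermediate places TAG at positions 4 to 6, the second codon of frame 1.
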
All other parameters accepted by evolver (phylogenetic tree, branch lengths, transition/transversion ratio, codon number, and the KA/KS ratio) were taken from the codeml output generated during nucleotide substitution analysis of our data and were fixed in all three simulations. Each set of simulated sequences was then inspected for the presence of +1 and −1 overlapping reading frames. A set of simulated sequences was considered to have an overlapping reading frame if such a frame was at least 1,000 bp long and was conserved in all eight sequences within the set.

Analysis of substitutions at CpG sites was carried out using a collection of PERL scripts, which can be obtained upon request. […]
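
The inspection of simulated sets for overlapping reading frames can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes the simulated sequences in a set are ungapped and of equal length, treats the −1 frame as offset 2 on the same strand, and reads "conserved in all eight sequences" as a stretch that is free of stop codons at the same positions in every sequence. The function names `longest_shared_open_run` and `has_overlapping_frame` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_shared_open_run(seqs, offset):
    """Length in bp of the longest stretch, read in the frame starting at
    `offset`, that contains no stop codon in any of the sequences."""
    length = min(len(s) for s in seqs)
    best = run = 0
    for pos in range(offset, length - 2, 3):
        if any(s[pos:pos + 3] in STOP_CODONS for s in seqs):
            run = 0  # a stop in any sequence breaks the shared frame
        else:
            run += 3
            best = max(best, run)
    return best


def has_overlapping_frame(seqs, min_len=1000, offsets=(1, 2)):
    """True if a shifted frame stays open for at least `min_len` bp in all
    sequences. Offsets 1 and 2 stand in for the +1 and -1 frames (assumption)."""
    return any(longest_shared_open_run(seqs, off) >= min_len for off in offsets)


# Hypothetical toy set; real simulated sets would hold eight long sequences.
toy_set = ["GCCGCCGCCGCC", "GCCGTAGCCGCC"]
```

In the toy set, the second sequence carries TAG at positions 4 to 6 of the offset-1 frame, so the shared open stretch in that frame is only 3 bp, while the offset-2 frame stays open for 9 bp in both sequences.
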

Pipeline specifications

Software tools: Clustal W, PAML
Application: Phylogenetics
Organisms: Mus musculus, Homo sapiens
Chemicals: Amino Acids