Computational protocol: Selection on the Major Color Gene Melanocortin 1 Receptor Shaped the Evolution of the Melanocortin System Genes

Similar protocols

Protocol publication

[…] Our study comprises a total of 138 species representing the main vertebrate lineages (16 birds, 2 snakes, 7 lizards, 1 turtle, 1 monotreme, 3 marsupials, 81 placental mammals, 3 amphibians, 1 coelacanth, 20 teleost fish, 2 sharks, and 1 lamprey species: ). Complete coding sequences of the five melanocortin receptors (MC1-5Rs), the agonist ligand (POMC), antagonists (ASIP and AGRP), and convertases (PCSK1 and PCSK2) were downloaded from Ensembl (http://www.ensembl.org) and NCBI (http://www.ncbi.nlm.nih.gov) databases on 02.04.2012 (see also ). The melanocortin receptors expanded early during vertebrate evolution, most likely as a consequence of two rounds of duplication, followed by a single gene duplication event that led to the MC5R []. The most ancient melanocortin receptor is suspected to belong to the sea and river lampreys (MCa and MCb, respectively) [,]. Some authors have proposed that MCa and MCb receptors represent two branches of the ancestral melanocortin receptors and that vertebrate MC1R and MC2R emerged from an ancestral MCa receptor, while MCb gave birth to the MC3R and a MC4/5R gene that later split into MC4R and MC5R [,]. Based on this interpretation, the lamprey’s MCa and MCb have been integrated as outgroups into the MC1-2R and MC3-5R data sets, respectively. Melanocortin receptors seem to have experienced duplication events followed by subfunctionalisation in certain fish species [,,]. In those species with paralogous sequences, we excluded the sequence that was non-orthologous to the mammalian gene copies.The POMC gene has been reported in invertebrates, suggesting that POMC-derived peptides probably appeared before the split between protostomes and deuterostomes [,,,]. Unfortunately, no sequences from invertebrates are available. Lampreys possess two copies of POMC, namely POM and POC, that we did not include into our data set because they likely experienced subfunctionalization []. We kept the paralog sequences of Danio rerio, Gasterosteus aculeatu, Oreochromis niloticus and Oryzias latipes species because they are fully functional copies of POMC that resulted from a latter duplication. Out of the 62 sequences available in Ensembl, 40 POMC sequences were finally kept for the analysis.We translated the nucleotide sequence for the melanocortin receptors, ASIP and AGRP genes into amino acids before aligning them using the software Muscle []. Once aligned, we translated the sequences of amino acids back into nucleotide sequences for the analysis. The alignments of the coding sequences of the POMC and the convertases PCSK1 and PCSK2 were directly exported from Ensembl (ENSGT00390000016811, Euteleostomi.001). The alignments were filtered to exclude partial sequences and incomplete sequences. To perform the analyses of coevolution between pairs of genes, we concatenated the alignments of two genes from the same species (). Full sequences for all the 10 genes were available for only a few vertebrate species and, consequently, the number of genes varies from one species to another. MC1R is the most studied gene with a large number of sequences available across the vertebrates (). However, the availability of MC1R sequences should not bias our results because the sampling level across vertebrates is consistent for each gene and all genes had sufficient numbers of sequences to have a correct power for the analyses performed []. For the coevolution analyses, we further reduced the concatenated alignments to include only the species that had both genes present in Ensembl (), which removed any effects of the larger sampling available for MC1R. [...] We performed all analyses on the species tree of the organisms sampled (). We constrained the topologies of the phylogenetic tree to follow the species tree from the Interactive Tree of Life (http://itol.embl.de/other_trees.shtml) and estimated, for each gene of the melanocortin system, the branch lengths of the species tree using the HKY85 + Gamma model as implemented in PhyML [] from the alignments obtained from each gene. For our analyses, we used the topology of the species phylogenetic tree rather than the trees estimated from each gene. Our goal was to combine the estimation of selective pressure with the coevolution occurring between genes; thus, we needed to have a comparable topology for all our genes. Whenever possible (for the melanocortin receptors), we used the lamprey (Lampetra fluviatilis) as outgroup and a fish species when the sequence of a given gene was not available for the lamprey. [...] We used the maximum likelihood implementation of the model Coev [] to estimate the nucleotide positions that were coevolving between pairs of genes. The analyses were based on the concatenated gene sequences and the species phylogenetic tree (). We measured the score of coevolution for 219,556 pairs of positions for 45 combinations of inter-molecular alignments by comparing the fit of the Coev model against a null model of independent evolution. We characterized pairs of nucleotide positions as coevolving based on the difference in Akaike information criterion (ΔAIC) between the two models. However, assessing coevolution is difficult, and using the standard ΔAIC threshold for model selection can lead to spurious detection of coevolution [,]. We therefore used simulations to correct for bias in the evaluation of the ΔAIC distribution []. To confidently delimit the list of coevolving pairs for a given data set, we used the phylogenetic tree and branch lengths estimated from the concatenated alignments to simulate one alignment of 200 base pairs for each data set based on the independent JC69 model of substitution, as implemented in the package “evolver” from the PAML software []. For each pair of positions from the simulated alignments, we calculated the ΔAIC between the Coev model and the independent model that was used to simulate the data (i.e., JC69). We considered the 97.5th, 95th and 90th percentile values of these simulated ΔAIC distributions (i.e., one per pair of genes) as the thresholds to accept that two sites were coevolving. We used these three thresholds to assess the robustness of the coevolution measures and investigate the strength of the effect in our analyses. In total, we evaluated the score of coevolution of 40,000 simulated pairs of nucleotide positions for each inter-molecular analysis, which amounts to 1,800,000 pairs of positions analyzed. […]

Pipeline specifications

Software tools MUSCLE, iTOL, PhyML, CoEv, PAML
Applications Phylogenetics, Protein structure analysis, Nucleotide sequence alignment
Organisms Homo sapiens