Computational protocol: Comparative phylogenetic analyses uncover the ancient roots of Indo-European folktales


Protocol publication

[…] To establish how far back shared folktales could be traced in Indo-European oral traditions, we mapped the evolutionary histories of the most phylogenetically conserved tales identified by the D and autologistic analyses, using two models of discrete trait evolution implemented in Mesquite v. 3.02 []: (i) a one-parameter Markov k-state model (Mk1), which estimates a single instantaneous rate of change for both gains and losses, given the distribution of the focal trait, a tree and a set of branch lengths; and (ii) an asymmetrical two-parameter Markov k-state model (Mk2), which estimates separate rates for gains and losses on the tree. The better-fitting model for each tale was selected using an asymmetry likelihood ratio test. To incorporate uncertainty in Indo-European phylogenetic relationships and branch lengths, each tale was traced on every tree in our sample of 1000 Bayesian language phylogenies. Ancestral states were inferred for the nodes of a majority-rule consensus tree, rooted with Hittite as the outgroup. Because no data on Hittite magic tales were available, Hittite was coded as missing for every tale so that it would not bias the outcome of the analyses. The likelihood that a given tale existed in a hypothetical ancestral population was calculated as the average likelihood of the tale’s presence at the corresponding node across the tree sample, multiplied by the posterior probability of the node itself (i.e. its frequency in the tree sample; []).

Figure 2.

An additional set of Bayesian analyses was carried out on tales inferred as potentially present in the populations’ hypothetical last common ancestor, ‘Proto-Indo-European’.
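The maximum-likelihood tracing described above (choosing between Mk1 and Mk2 for each tale, then weighting the inferred presence at a node by that node's frequency in the tree sample) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not Mesquite's implementation: the nested-tuple tree format, the flat root prior, the coarse grid search standing in for a real optimizer, the 3.841 chi-square cutoff and the toy tree are all assumptions made for the example. A missing tip, such as the Hittite outgroup, receives a partial likelihood of 1 for both states, so it contributes nothing to the comparison.

```python
import math

# Hypothetical tree format: (branch_length, children) for internal nodes and
# (branch_length, state) for tips, with state 1 = tale present, 0 = absent,
# None = missing (e.g. an outgroup with no tale data).

ROOT_PRIOR = (0.5, 0.5)  # assumption: flat root state frequencies
GRID = [10 ** (k / 10) for k in range(-30, 11)]  # candidate rates, 0.001 to 10
CHI2_1DF_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def transition_probs(gain, loss, t):
    """Transition matrix P(t) of a binary Markov model with gain/loss rates."""
    total = gain + loss
    pi0, pi1 = loss / total, gain / total
    decay = math.exp(-total * t)
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]


def partials(node, gain, loss):
    """Felsenstein pruning: likelihood of the tips below `node` for each node state."""
    _, content = node
    if not isinstance(content, list):  # tip
        return [1.0, 1.0] if content is None else [float(content == 0), float(content == 1)]
    result = [1.0, 1.0]
    for child in content:
        p = transition_probs(gain, loss, child[0])
        c = partials(child, gain, loss)
        for s in (0, 1):
            result[s] *= p[s][0] * c[0] + p[s][1] * c[1]
    return result


def log_likelihood(tree, gain, loss):
    r = partials(tree, gain, loss)
    return math.log(ROOT_PRIOR[0] * r[0] + ROOT_PRIOR[1] * r[1])


def fit_mk1(tree):
    """Mk1: one shared rate for gains and losses (grid search, not a real optimizer)."""
    return max((log_likelihood(tree, q, q), q) for q in GRID)


def fit_mk2(tree):
    """Mk2: separate gain and loss rates; the grid contains every Mk1 point."""
    return max((log_likelihood(tree, gn, ls), gn, ls) for gn in GRID for ls in GRID)


def prefers_mk2(lnl_mk1, lnl_mk2):
    """Likelihood ratio test: Mk2 has one extra free parameter."""
    return 2 * (lnl_mk2 - lnl_mk1) > CHI2_1DF_05


def root_presence(tree, gain, loss):
    """Proportional likelihood that the tale is present at the root."""
    r = partials(tree, gain, loss)
    w0, w1 = ROOT_PRIOR[0] * r[0], ROOT_PRIOR[1] * r[1]
    return w1 / (w0 + w1)


def node_support(presence_per_tree):
    """Mean P(present) over trees containing the node (None = node absent),
    multiplied by the node's frequency in the tree sample."""
    found = [p for p in presence_per_tree if p is not None]
    if not found:
        return 0.0
    return (sum(found) / len(found)) * (len(found) / len(presence_per_tree))


toy_tree = (0.0, [
    (1.0, None),  # outgroup: tale data missing
    (0.5, [(0.5, [(0.3, 1), (0.3, 1)]),
           (0.5, [(0.3, 1), (0.3, 0)])]),
])
lnl1, q_mk1 = fit_mk1(toy_tree)
lnl2, gain_mk2, loss_mk2 = fit_mk2(toy_tree)
use_mk2 = prefers_mk2(lnl1, lnl2)
p_root = root_presence(toy_tree, gain_mk2, loss_mk2) if use_mk2 else root_presence(toy_tree, q_mk1, q_mk1)
print("Mk2" if use_mk2 else "Mk1", round(p_root, 3))
print(node_support([0.8, None, 0.6, 0.7]))  # 0.7 * 0.75, about 0.525
```

Multiplying the mean over trees that contain the node by the node's frequency is the same as summing the per-tree presence over those trees and dividing by the total number of trees, so trees lacking the clade count as zero support.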
We targeted the Proto-Indo-European node for further investigation for two reasons: first, to test the support for the deepest reconstructions suggested by the analyses described above; and second, to account for the greater phylogenetic uncertainty toward the root of the Indo-European language tree, which can be addressed more effectively within a Bayesian framework. Rather than calculating the transition rates that maximize the likelihood of a trait distribution on each individual tree and then averaging the likelihood of the trait being present or absent at a particular node across the tree sample, the Bayesian approach estimates a posterior probability of ancestral states that integrates uncertainty about both transition rates and phylogenetic relationships simultaneously [,]. This posterior probability is obtained by recording ancestral states at regular intervals during an MCMC simulation, in which the trees and transition rates used to map the trait are sampled in proportion to their posterior probabilities. We carried out these analyses using the Multistate model implemented in BayesTraits v. 2.0 [], with the same sample of 1000 Indo-European language trees and the same data on tale distributions from the ATU Index [] as in our previous analyses. Two sets of analyses were performed. The first estimated the posterior probability of each tale being present in Proto-Indo-European using the ‘most recent common ancestor’ command. The second tested the relative support for each tale being present or absent by ‘fossilizing’ (i.e. fixing) the node in each state and comparing the marginal likelihoods of the two models using Bayes factors []. All analyses employed uniform priors, whose range was determined empirically following a maximum-likelihood analysis. Each MCMC chain ran for 1 000 000 iterations and, following a burn-in period, was sampled every 1000 iterations to form the posterior distribution. […]
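The Bayesian alternative can likewise be sketched as a toy Metropolis-Hastings sampler that moves jointly over trees and gain/loss rates, records the probability of presence at the root at regular intervals after burn-in, and averages those draws. This is an illustration under stated assumptions, not the BayesTraits algorithm: the Gaussian random-walk proposals, the uniform priors on (0, `upper`], the flat root prior, the toy trees, the hypothetical 100 000-iteration burn-in in the arithmetic example and the log marginal likelihoods passed to the hypothetical `log_bayes_factor` helper are all invented for the example. Marginal likelihoods for the two fossilized models would have to come from a separate estimator; the helper only applies the common 2 × (difference in log marginal likelihoods) convention.

```python
import math
import random


def transition_probs(gain, loss, t):
    """Transition matrix P(t) of a binary Mk model (state 1 = tale present)."""
    total = gain + loss
    pi0, pi1 = loss / total, gain / total
    decay = math.exp(-total * t)
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]


def partials(node, gain, loss):
    """Felsenstein pruning; tips hold 0, 1 or None (missing data)."""
    _, content = node
    if not isinstance(content, list):
        return [1.0, 1.0] if content is None else [float(content == 0), float(content == 1)]
    result = [1.0, 1.0]
    for child in content:
        p = transition_probs(gain, loss, child[0])
        c = partials(child, gain, loss)
        for s in (0, 1):
            result[s] *= p[s][0] * c[0] + p[s][1] * c[1]
    return result


def mcmc_root_presence(trees, iterations, burn_in, sample_every, upper, seed=1):
    """Joint MH sampler over (tree, gain, loss); returns P(present at root) draws."""
    rng = random.Random(seed)

    def log_post(tree_idx, gain, loss):
        # Uniform priors on (0, upper] are constant inside the range.
        if not (0 < gain <= upper and 0 < loss <= upper):
            return -math.inf
        r = partials(trees[tree_idx], gain, loss)
        return math.log(0.5 * r[0] + 0.5 * r[1])  # flat root prior (assumption)

    state = (rng.randrange(len(trees)), upper / 2, upper / 2)
    current = log_post(*state)
    draws = []
    for it in range(1, iterations + 1):
        _, gain, loss = state
        proposal = (rng.randrange(len(trees)),  # trees proposed uniformly from the sample
                    gain + rng.gauss(0, 0.1 * upper),
                    loss + rng.gauss(0, 0.1 * upper))
        proposed = log_post(*proposal)
        if rng.random() < math.exp(min(0.0, proposed - current)):
            state, current = proposal, proposed
        if it > burn_in and (it - burn_in) % sample_every == 0:
            r = partials(trees[state[0]], state[1], state[2])
            draws.append(r[1] / (r[0] + r[1]))
    return draws


def n_posterior_samples(iterations, burn_in, sample_every):
    """Number of draws kept when sampling every `sample_every` iterations after burn-in."""
    return (iterations - burn_in) // sample_every


def log_bayes_factor(log_ml_present, log_ml_absent):
    """2 x (difference in log marginal likelihoods) of the two fossilized models."""
    return 2.0 * (log_ml_present - log_ml_absent)


toy_trees = [
    (0.0, [(1.0, None), (0.5, [(0.4, 1), (0.4, [(0.2, 1), (0.2, 0)])])]),
    (0.0, [(1.0, None), (0.6, [(0.3, [(0.3, 1), (0.3, 1)]), (0.6, 0)])]),
]
draws = mcmc_root_presence(toy_trees, iterations=20_000, burn_in=2_000,
                           sample_every=100, upper=5.0)
print(len(draws))                                      # 180
print(round(sum(draws) / len(draws), 3))               # posterior P(present at root)
print(n_posterior_samples(1_000_000, 100_000, 1000))   # 900 (hypothetical burn-in)
print(log_bayes_factor(-10.0, -12.5))                  # 5.0 (made-up values)
```

Because both proposals are symmetric (a uniform draw from the tree sample and a Gaussian step on each rate), the acceptance ratio reduces to the ratio of posterior densities, which is what makes trees and rates be visited in proportion to their posterior probabilities.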

Pipeline specifications

Software tools: Mesquite, BayesTraits
Application: Phylogenetics
Organisms: Homo sapiens