Computational protocol: Reconstructing the Indian Origin and Dispersal of the European Roma: A Maternal Genetic Perspective

Protocol publication

[…] To place the Iberian Roma in the context of other European Roma and their corresponding host populations, a database of 1,890 hypervariable region I (HVR-I) sequences (positions 16090 to 16365) was built from previously published studies (hereafter the Roma-host database). In addition to the 138 Portuguese and 76 Spanish Roma from this study, the Roma-host database contained Roma sequences gathered from the literature: 39 Spanish Roma, 232 Bulgarian and 18 Lithuanian Roma, 69 Polish Roma, and 205 Hungarian Roma. To cover the corresponding European host populations, we collected 118 Portuguese (unpublished data), 68 Spanish, 141 Bulgarian, 162 Lithuanian, 413 Polish, and 211 Hungarian individuals. The Bulgarian Roma populations from Gresham et al. were grouped according to the classification in the original paper: "Bulgaria 1" stands for Roma groups who settled early in Bulgaria, whereas "Bulgaria 2" and "Bulgaria 3" stand for Roma groups who settled in Bulgaria after arriving from Wallachia/Moldavia in the 17th–18th centuries and the late 19th century, respectively.

Intrapopulation genetic diversity parameters, namely the number of different sequences (K), sequence diversity (Ĥ), number of polymorphic sites (S) and nucleotide diversity (π), were calculated for the HVR-I using Arlequin v3.1. Additionally, the weighted intralineage mean pairwise differences (WIMP) statistic was computed; it measures the mean pairwise differences within each lineage, weighted by the lineage's frequency. Finally, female effective population sizes were assessed by computing the estimators θπ, θK and θS (θ = 2N_feμ, where N_fe is the female effective population size and μ is the mutation rate). Whereas θS is based on the number of segregating sites, θK relies on the observed number of different lineages.
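The diversity statistics named above can be illustrated with a small, self-contained sketch. It is not Arlequin's implementation: it assumes aligned, equal-length sequences with no gaps or missing data, uses Nei's unbiased estimator for Ĥ, Watterson's estimator for θS, the mean number of pairwise differences for θπ, and Ewens' sampling formula (solved numerically) for θK. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return n, K, H, S, mean pairwise differences and pi for aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])
    counts = Counter(seqs)
    k = len(counts)  # K: number of different sequences
    # H: Nei's unbiased sequence diversity, n/(n-1) * (1 - sum of squared frequencies)
    h = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # S: number of alignment columns showing more than one state
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # Mean number of pairwise differences (this is also theta_pi)
    pairs = list(combinations(seqs, 2))
    mpd = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # pi: mean pairwise differences per site
    return {"n": n, "K": k, "H": h, "S": s, "mpd": mpd, "pi": mpd / length}


def theta_s(s, n):
    """Watterson's estimator: S divided by a_n = sum_{i=1}^{n-1} 1/i."""
    return s / sum(1 / i for i in range(1, n))


def theta_k(k, n, tol=1e-10):
    """Solve Ewens' E[K] = sum_{i=0}^{n-1} theta / (theta + i) for theta by bisection."""
    if not 1 < k < n:
        raise ValueError("theta_K is only finite and positive for 1 < K < n")

    def expected_k(theta):
        return sum(theta / (theta + i) for i in range(n))

    lo, hi = 0.0, 1.0
    while expected_k(hi) < k:  # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol * hi:  # relative tolerance keeps the loop finite
        mid = (lo + hi) / 2
        if expected_k(mid) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Toy alignment (made-up data, for illustration only)
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
stats = diversity_stats(toy)
```

Under the relation θ = 2N_feμ, any of these θ values can be converted to a female effective population size by dividing by 2μ once a mutation rate is chosen; the sketch leaves that step out because the excerpt does not state μ.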
Since the mutation rate for the HVR-I should be the same in all populations, differences in θ values reflect differences in female effective population size among populations.

Pairwise differences between populations were represented in a non-metric multidimensional scaling (NMDS) plot using the STATISTICA 7 package with the default starting configuration.

Population genetic structure was tested through analysis of molecular variance (AMOVA) in Arlequin v3.1. To shed light on the migration routes that Roma populations may have followed in Europe, groupings based on country of residence were compared with groupings based on migration routes.

Taking advantage of the phylogeographic information in the mitochondrial sequences, and following the same approach as in Mendizabal et al., admixture between Roma and European host populations was estimated. In addition, several Indian geographic areas were evaluated as the possible ancestral homeland of the Roma. Two datasets were compiled for these purposes: an extended database of European host sequences with 5,096 individuals from Iberia, the Balkans, Hungary, Poland and the Baltic countries (from Additional File 1 in Mendizabal et al.), and a set of Indian sequences collected from Dubut et al. (n = 3,751, excluding Sri Lanka). Each dataset was subdivided into subcontinental regions, and the probability of origin in each region was calculated as […] where n is the number of Roma sequences with matches (≥1) in the whole subcontinental dataset of India; k_i, the number of times sequence i is found in the Roma sample; p_is, the frequency of sequence i in the specific region of India; and p_ic, the frequency of sequence i in the whole subcontinental Indian dataset.
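The formula for the probability of origin is not reproduced in this excerpt, so the following is only a sketch of one plausible reading of the definitions above, not the authors' confirmed method. It assumes the estimate for region s is the k_i-weighted average of p_is / p_ic over matched sequences, with both frequencies taken as absolute counts in the reference dataset, so that each ratio is the share of reference copies of sequence i found in region s. The function name and the toy data are hypothetical.

```python
from collections import Counter


def origin_probabilities(roma, reference_by_region):
    """Estimate a probability of origin per region (ASSUMED formula, see lead-in).

    roma: list of haplotype strings, one per sampled individual.
    reference_by_region: dict mapping region name -> list of haplotype strings.
    """
    k = Counter(roma)  # k_i: copies of each sequence in the Roma sample
    region_counts = {r: Counter(h) for r, h in reference_by_region.items()}
    total = Counter()  # counts of each sequence in the whole reference dataset
    for counts in region_counts.values():
        total.update(counts)

    # Keep only sequences with at least one match in the reference dataset
    matched = [h for h in k if total[h] >= 1]
    n = sum(k[h] for h in matched)  # n: Roma sequences with matches (assumed to count individuals)
    if n == 0:
        raise ValueError("no Roma sequence matches the reference dataset")

    # Assumed form: P_s = (1/n) * sum_i k_i * (count_is / count_ic)
    return {
        region: sum(k[h] * counts[h] / total[h] for h in matched) / n
        for region, counts in region_counts.items()
    }


# Made-up haplotype labels, for illustration only
roma_sample = ["h1", "h1", "h2", "h3"]
reference = {
    "North": ["h1", "h1", "h2"],
    "South": ["h1", "h2", "h4"],
}
probs = origin_probabilities(roma_sample, reference)
```

With count-based frequencies the estimates sum to one across regions, but they also depend on how many reference individuals each region contributes. If the original formula uses within-region relative frequencies instead, the ratio would need to be normalised differently.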
Standard deviations for each of the estimations were computed as […]

A median-joining network was generated to infer phylogenetic relationships between European Roma and Indian mtDNA lineages (HVR-I, positions 16090–16365) using Network software. Mutation weights followed Santos et al., excluding insertions and deletions. The time to the most recent common ancestor (TMRCA) of the M5a1 subhaplogroup was estimated from the average number of mutations accumulated from an ancestral sequence, treated as a linear function of time and mutation rate. The age estimates were obtained with Network by considering one transition per 18,845 years in the sequence range 16090–16365. […]
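The age calculation described above reduces to multiplying the average number of mutations from the ancestral sequence by the calibration of one transition per 18,845 years. The sketch below shows that arithmetic under a simplifying assumption: mutations are counted as direct site differences from the ancestral sequence, whereas Network counts them along the branches of the median-joining tree, so recurrent mutations would be handled differently. Function names and sequences are hypothetical.

```python
YEARS_PER_TRANSITION = 18_845  # calibration quoted in the text for positions 16090-16365


def rho(ancestral, descendants):
    """Average number of site differences between the ancestral sequence and each descendant."""
    total = sum(
        sum(a != b for a, b in zip(ancestral, seq))
        for seq in descendants
    )
    return total / len(descendants)


def tmrca_years(ancestral, descendants, years_per_mutation=YEARS_PER_TRANSITION):
    """Age estimate as a linear function of the average accumulated mutations."""
    return rho(ancestral, descendants) * years_per_mutation


# Made-up toy sequences, for illustration only
root = "ACGT"
sample = ["ACGT", "ACGA", "TCGA", "ACGT"]
age = tmrca_years(root, sample)
```

In this toy case the four sequences carry 0, 1, 2 and 0 differences from the root, giving an average of 0.75 mutations and an age of 0.75 × 18,845 years.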

Pipeline specifications