Computational protocol: Y-chromosome phylogeographic analysis of the Greek-Cypriot population reveals elements consistent with Neolithic and Bronze Age settlements

Protocol publication

[…] We examined the population structure within Cyprus using four approaches: genetic distance, spatial autocorrelation analysis, spatial analysis of molecular variance (SAMOVA), and analysis of molecular variance (AMOVA). First, Nei and Takezaki’s genetic distances [] between Cypriot Y-STR haplotypes were estimated with Arlequin [] and plotted by multidimensional scaling (MDS) using R []. Second, Cypriot Y-STR haplotypes and their geographical coordinates were used to detect possible patterns of isolation by distance within Cyprus through a spatial autocorrelation analysis with even distance classes of 15, 25, and 50 km in the GENALEX software []. Third, Y-STR haplotypes and Y haplogroup frequencies were subjected to a SAMOVA in order to examine the genetic variance among the six Cypriot districts []. SAMOVA searches objectively for groups of populations that are geographically homogeneous and maximally differentiated from each other, that is, the grouping with the highest proportion of genetic variance among groups. We then attempted to identify the geographic locations of the sharpest gradients of genetic variation, independently of the six administrative districts, by means of a SAMOVA of the STR haplotype data. For this, we partitioned Cyprus into 38 areas of equal sample size (15 haplotypes each). The numbers of areas per district are: Paphos: 7, Ammochostos: 7, Kyreneia: 3, Larnaka: 4, Limassol: 11, Nicosia: 6. The mean distance between the centers of two adjacent areas is 10.09 ± 3.93 km. We then ran SAMOVA 2.0 for K = 2 to K = 10, K = 15, and K = 20 groups. Last, Y-chromosome haplogroup frequencies were used to test four historical models through an analysis of molecular variance (AMOVA) []. To test whether the variance among the district populations under study was asymmetrically distributed in Cyprus, we started with an AMOVA considering the six populations as one group.
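The initial AMOVA step, in which all district populations are treated as a single group, can be illustrated with a minimal sketch. It is not the Arlequin implementation used in the study: it assumes a simple 0/1 distance between haplogroups (same or different), omits the permutation test, and the district names and counts in the usage lines are hypothetical.

```python
def ssd_from_counts(counts):
    """Sum of squared deviations for one sample, using a 0/1 haplogroup distance.

    Equals (sum over ordered pairs of the 0/1 distance) / (2 * n).
    """
    n = sum(counts.values())
    # Ordered pairs of chromosomes that carry different haplogroups.
    differing_pairs = n * n - sum(c * c for c in counts.values())
    return differing_pairs / (2 * n)


def one_group_amova(populations):
    """One-level AMOVA (populations within a single group).

    `populations` maps a population name to {haplogroup: count}.
    Returns the among- and within-population variance components and Phi_ST.
    """
    pooled = {}
    for counts in populations.values():
        for haplogroup, c in counts.items():
            pooled[haplogroup] = pooled.get(haplogroup, 0) + c

    sizes = [sum(counts.values()) for counts in populations.values()]
    n_total = sum(sizes)
    n_pops = len(sizes)

    ssd_total = ssd_from_counts(pooled)
    ssd_within = sum(ssd_from_counts(c) for c in populations.values())
    ssd_among = ssd_total - ssd_within

    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)

    # Average sample size corrected for unequal population sizes.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (n_pops - 1)

    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    # Undefined (division by zero) if every chromosome carries the same haplogroup.
    phi_st = var_among / (var_among + var_within)
    return {"var_among": var_among, "var_within": var_within, "phi_st": phi_st}


# Hypothetical haplogroup counts for two districts (not data from the study).
example = {
    "district_A": {"J2a": 12, "E-V13": 5, "R1b": 3},
    "district_B": {"J2a": 6, "E-V13": 9, "R1b": 5},
}
result = one_group_amova(example)
print(round(result["phi_st"], 4))
```

Testing a historical model then amounts to adding a grouping level above the populations (for example coastal versus inland) and asking how much of the variance falls among groups; that hierarchical extension is not shown here.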
We then tested four groupings according to three models of settlement of Cyprus: coastal versus inland (i.e., more than 5 km from the seashore); earliest (Nicosia, Pafos, Limassol) versus Bronze Age occupation sites (Ammochostos, Larnaka, Kyreneia); and two submodels of the arrival of the Philia phase (4400–3700 y BP) with its distinctive pottery style, Red Polished (Ammochostos, Larnaka versus the rest; and Ammochostos, Larnaka versus Kyreneia and the rest) (Table  and Additional file : Table S1). To assess the archeological geographic subdivision between the Pottery Neolithic and Early Bronze Age settlements (densely populating Northwest/West/South Cyprus) and the Late Bronze Age settlements (East/Northeast Cyprus), we divided the island a priori into two regions, (1) Kyreneia/Nicosia/Pafos/Limassol versus (2) Ammochostos/Larnaka, and performed 2 × 2 table chi-square comparisons of the frequency distributions of specific Y-chromosome haplogroups. [...] We estimated the time of expansion (TIMEX) of the E-V13, G2a-P15, I2-M423, J2a-M67, J2b-M12, and R1b-M269 lineages in Cyprus and their time of divergence (TD) from Anatolian, Balkan, Greek, and Levantine sources. TIMEX and TD date lineages assuming that the observed variance has arisen indigenously within a given population as a result of a unique founder-based migration from an outside source. In practice, this means measuring the variation between the source and sink populations, assuming that it has accumulated since the time of settlement. For TIMEX, variation was measured as the mean variance of the abovementioned 11 microsatellites. TD was based on the squared difference between the mean allele sizes, incorporated in the genetic distance denoted (δμ)² [] and implemented in POWERMARKER [].
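The two quantities behind TIMEX and TD can be sketched as follows. This is an illustration rather than the POWERMARKER implementation: it assumes that haplotypes are given as tuples of repeat counts in a fixed locus order, that the within-population variance is the unbiased sample variance, and that (δμ)² is averaged across loci. The function names and the toy haplotypes are hypothetical.

```python
from statistics import mean, variance


def locus_columns(haplotypes):
    """Transpose a list of haplotypes into one tuple of repeat counts per locus."""
    return list(zip(*haplotypes))


def mean_str_variance(haplotypes):
    """Mean across loci of the within-population variance in repeat count."""
    return mean(variance(column) for column in locus_columns(haplotypes))


def delta_mu_squared(source, sink):
    """Squared difference in mean allele size, averaged across loci."""
    return mean(
        (mean(a) - mean(b)) ** 2
        for a, b in zip(locus_columns(source), locus_columns(sink))
    )


# Toy haplotypes typed at two loci (hypothetical values, not study data).
source_population = [(13, 24), (14, 24), (15, 24)]
sink_population = [(14, 26), (16, 26)]

print(mean_str_variance(source_population))
print(delta_mu_squared(source_population, sink_population))
```

With real data, each tuple would hold the 11 microsatellites mentioned above, and the calculation would be run separately for each haplogroup lineage.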
Under the assumption of a single population splitting into two fully isolated groups, this genetic distance is expected to increase linearly with time since divergence as (δμ)² = 2ωτ, where ω is the mutation rate and τ is the number of generations since isolation. Note that these approaches are sensitive both to multiple founders during a particular migration and to subsequent gene flow between populations, both of which inflate STR variance and the age estimate of the event. Since the choice of mutation rate is debatable (e.g., [, ]), we used two mutation rates for STRs to set lower and upper bounds: 0.00069 per STR per generation [] and a pedigree mutation rate of 0.0021 with 95% confidence interval limits (CIL) of 0.0006–0.0049 []. We assumed a generation time of 25 years. […]
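Rearranging (δμ)² = 2ωτ gives τ = (δμ)² / (2ω), and multiplying by the generation time converts generations to years. The sketch below applies this to the two mutation rates and the 25-year generation time stated above; the input distance of 0.42 is a hypothetical value chosen for illustration, and the dictionary labels are only descriptive.

```python
GENERATION_YEARS = 25  # generation time assumed in the text

# Mutation rates per STR per generation quoted in the text.
MUTATION_RATES = {
    "slower rate": 0.00069,
    "pedigree rate": 0.0021,
}


def divergence_time(delta_mu_sq, mutation_rate, generation_years=GENERATION_YEARS):
    """Return (generations, years) since divergence from (delta mu)^2 = 2 * omega * tau."""
    generations = delta_mu_sq / (2 * mutation_rate)
    return generations, generations * generation_years


# Hypothetical genetic distance between a source and a sink population.
example_distance = 0.42

for label, rate in MUTATION_RATES.items():
    generations, years = divergence_time(example_distance, rate)
    print(f"{label}: {generations:.1f} generations, {years:.0f} years")
```

Because τ is inversely proportional to ω, the faster pedigree rate yields the younger date and the slower rate the older one, which is how the two rates bracket the estimate.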

Pipeline specifications