Computational protocol: Genetic origins of the Minoans and Mycenaeans

[…] Modern human contamination of the mitochondrial DNA was assessed using the software schmutzi, which takes into account that the consensus sequence should be reconstructed from reads that show characteristics of ancient DNA and originate from a single individual. We assessed contamination in five males (who carry only one copy of the X chromosome) by examining heterozygosity on the X chromosome using ANGSD (section 3); contamination estimates ranged from 0.3% to 4%. Indirect evidence that the females in our dataset (for whom X-chromosome-based contamination estimation is impossible) are authentic comes from two observations: they cluster with male samples, and they are distinct from present-day Greek or central European populations that could have contaminated them. For populations represented by both male and female individuals, we also computed f4-statistics of the form f4(Males, Females; Test, Chimp), with Test ranging over all ancient and present-day populations in our dataset. If female samples were substantially contaminated from a source related to Test, these statistics would be significantly negative; however, we find that the Z-score of these statistics is −1.6 […]
Principal components analysis (PCA) was carried out on 1,029 present-day West Eurasians with the smartpca program of EIGENSOFT, using default parameters and the lsqproject: YES and numoutlieriter: 0 options, and 334 ancient samples were projected onto the resulting components. Upper Paleolithic individuals predating the appearance of the Villabruna cluster plot in the middle of present-day West Eurasian variation and are not shown. […]
ADMIXTURE analysis of the HO dataset was performed after pruning for linkage disequilibrium in PLINK with the parameters --indep-pairwise 200 25 0.4, which retained 299,971 SNPs. Twenty replicates of the analysis were run with different random seeds, and the highest-likelihood replicate for each value of K was retained. We show the K = 2 to K = 17 results for the 351 ancient and 30 Modern Greek samples in […]. […]
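The following toy simulation illustrates the reasoning behind the f4(Males, Females; Test, Chimp) check. It is a minimal sketch and not the ADMIXTOOLS computation. Allele frequencies are simulated rather than estimated from reads. The drift model, the SNP count and the 30% contamination level are arbitrary assumptions. Chimp is represented by a single allele (0 or 1) per SNP. Males and females are drawn from the same population, and a fraction of the female allele frequencies is contributed by Test.

```python
import numpy as np

def f4(pa, pb, pc, pd):
    """f4(A, B; C, D) as the mean over SNPs of (pA - pB) * (pC - pD)."""
    return float(np.mean((pa - pb) * (pc - pd)))

def simulate_sex_test(contamination, n_snps=10_000, drift_sd=0.05, seed=1):
    """Toy model with hypothetical parameters: males and females come from one
    population, and a fraction `contamination` of the female allele
    frequencies is contributed by the Test population."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, n_snps)
    # Independent drift of the study population and of Test from a common ancestor.
    p_pop = np.clip(ancestral + rng.normal(0.0, drift_sd, n_snps), 0.0, 1.0)
    p_test = np.clip(ancestral + rng.normal(0.0, drift_sd, n_snps), 0.0, 1.0)
    # Chimp as an outgroup carrying a single allele (0 or 1) per SNP.
    p_chimp = rng.integers(0, 2, n_snps).astype(float)
    p_males = p_pop
    p_females = (1 - contamination) * p_pop + contamination * p_test
    return f4(p_males, p_females, p_test, p_chimp)

clean = simulate_sex_test(contamination=0.0)
contaminated = simulate_sex_test(contamination=0.3)
print(f"uncontaminated: {clean:.5f}, 30% contaminated: {contaminated:.5f}")
```

In this idealized setting, the uncontaminated statistic is exactly zero because males and females share identical frequencies. With contamination the statistic turns negative. The Males minus Females difference then contains minus the Test drift, and that drift also enters the (Test, Chimp) term with a positive sign.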
f3- and f4-statistics were computed in ADMIXTOOLS using the programs qp3Pop and qpF4ratio with default parameters, and qpDstat with the f4mode: YES option. Standard errors were computed with a block jackknife. When an ancient population was the target of an f3-statistic, we set the inbreed: YES parameter. This is needed because our data consist of pseudo-haploid genotypes, which introduce artificial genetic drift that masks the negative signal of admixture. […]
The ancient samples have low coverage (median 0.87×), so diploid genotypes cannot be called reliably for them. However, we can use the low-coverage data to estimate allele frequencies in all individuals and in the Bronze Age Aegean population using a likelihood approach. We then sample from the posterior distribution of the genotype g given the number r of reads carrying the reference allele and the total number t of reads covering a site. We took 100 random genotype samples per individual and submitted them to HIrisPlex, which gave an estimate of the uncertainty of phenotype inference (section 4). […]
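The sketch below shows how a delete-one-block jackknife gives a standard error and Z-score for an f3-statistic computed from allele frequencies. Several simplifying assumptions apply:

- Blocks are equal-sized runs of consecutive SNPs.
- Every block is weighted equally, whereas ADMIXTOOLS uses a weighted jackknife over genomic blocks.
- The sketch omits the small-sample correction that qp3Pop applies, so it does not reproduce qp3Pop or its inbreed: YES setting.

The helper names block_jackknife and f3_per_snp, and the simulated data, are hypothetical.

```python
import numpy as np

def block_jackknife(per_snp, n_blocks):
    """Delete-one-block jackknife for a statistic defined as the mean of
    per-SNP values. SNPs are assumed to be in genomic order; blocks are
    equal-sized runs of consecutive SNPs (a simplification)."""
    per_snp = np.asarray(per_snp, dtype=float)
    blocks = np.array_split(np.arange(per_snp.size), n_blocks)
    estimate = per_snp.mean()
    # Recompute the statistic with each block left out in turn.
    leave_one_out = np.array([np.delete(per_snp, idx).mean() for idx in blocks])
    g = len(blocks)
    variance = (g - 1) / g * np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    return float(estimate), float(np.sqrt(variance))

def f3_per_snp(p_target, p_a, p_b):
    """Per-SNP terms of f3(Target; A, B) = mean((pT - pA)(pT - pB)),
    without any small-sample bias correction."""
    return (p_target - p_a) * (p_target - p_b)

# Toy example (hypothetical data): Target is an exact 50/50 mixture of A and B,
# so every per-SNP term equals -0.25 * (pA - pB)**2 and f3 is negative.
rng = np.random.default_rng(0)
p_a = rng.uniform(0.05, 0.95, 5_000)
p_b = rng.uniform(0.05, 0.95, 5_000)
p_target = 0.5 * p_a + 0.5 * p_b
estimate, se = block_jackknife(f3_per_snp(p_target, p_a, p_b), n_blocks=100)
print(f"f3 = {estimate:.4f}, SE = {se:.4f}, Z = {estimate / se:.1f}")
```

Resampling whole blocks rather than single SNPs keeps nearby SNPs in linkage disequilibrium together, so the standard error is not underestimated by treating correlated sites as independent.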

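The genotype sampling step can be sketched for a single SNP as follows. This is a minimal sketch, and its assumptions are not taken from the source:

- The prior is Hardy-Weinberg, based on an estimated reference-allele frequency.
- The per-read error rate is fixed at 1%.
- The allele frequency is a grid-search maximum-likelihood estimate that sums over the unobserved genotypes.

The function names and read counts are hypothetical, and passing the sampled genotypes to HIrisPlex is not shown.

```python
import numpy as np

GENOTYPES = np.array([0, 1, 2])  # number of reference alleles carried

def read_likelihood(r, t, error=0.01):
    """P(r reference reads out of t | g) for g = 0, 1, 2, omitting the
    binomial coefficient, which does not depend on g or p.
    `error` is an assumed per-read error rate."""
    q = GENOTYPES / 2 * (1 - error) + (1 - GENOTYPES / 2) * error
    return q ** r * (1 - q) ** (t - r)

def hwe_prior(p):
    """Hardy-Weinberg genotype probabilities for reference-allele frequency p
    (p may be a scalar or an array; genotypes are along the first axis)."""
    p = np.asarray(p, dtype=float)
    return np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])

def mle_allele_freq(read_counts, error=0.01):
    """Grid-search maximum-likelihood reference-allele frequency from (r, t)
    pairs, summing over the unobserved genotype of each individual."""
    grid = np.linspace(0.001, 0.999, 999)
    priors = hwe_prior(grid).T  # shape (len(grid), 3)
    log_lik = np.zeros_like(grid)
    for r, t in read_counts:
        log_lik += np.log(priors @ read_likelihood(r, t, error))
    return float(grid[np.argmax(log_lik)])

def genotype_posterior(r, t, p, error=0.01):
    """Posterior P(g | r, t) under the Hardy-Weinberg prior."""
    unnormalized = hwe_prior(p) * read_likelihood(r, t, error)
    return unnormalized / unnormalized.sum()

def sample_genotypes(r, t, p, n_samples=100, error=0.01, seed=None):
    """Draw n_samples genotypes from the posterior for one individual at one SNP."""
    rng = np.random.default_rng(seed)
    return rng.choice(GENOTYPES, size=n_samples, p=genotype_posterior(r, t, p, error))

# Hypothetical (r, t) read counts for five individuals at one SNP.
reads = [(1, 1), (0, 1), (0, 0), (2, 2), (1, 2)]
p_hat = mle_allele_freq(reads)
draws = sample_genotypes(r=1, t=1, p=p_hat, n_samples=100, seed=0)
print(p_hat, np.bincount(draws, minlength=3))
```

At a site with no reads (t = 0) the posterior falls back to the prior, so at low coverage the draws vary widely. Repeating the draw across all SNPs gives 100 genotype sets per individual, and their spread carries the read-count uncertainty into the phenotype prediction.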
