Computational protocol: Y-chromosomal diversity of the Valachs from the Czech Republic: model for isolated population in Central Europe


Protocol publication

[…] One hundred and seventy-three DNA samples from male Valachs from the Czech Republic were analyzed. Because the sampling procedures differed substantially, these samples were divided into two groups, which are referred to separately throughout this article.

The first group consisted of 94 samples from unrelated donors (code: VALACH, Moravian Valachs). All donors identified themselves as members of the Valach ethnic group in a short interview held immediately before DNA sampling by mouth swab. Only donors whose paternal lineage had been present in the Valach region for at least 3 generations were included in the study. Donors provided informed consent, no other data (including name, address, etc.) were gathered, and the data were fully anonymized.

The second Valach sample set consisted of 79 samples (code: VLIN, Moravian Valachs lineages), collected in a way that differed significantly from the VALACH set. The VLIN samples came from 7 Valach paternal lineages, defined primarily by surname and additionally by geographic location within the Valach region and by the donors' self-identification. Although the samples came from extended families, no first-, second-, third-, or fourth-degree relatives were included, so this set effectively consists of unrelated, non-randomly selected Valachs carrying 7 different surnames.

The control sample set consisted of 273 unrelated male donors from across the Czech Republic. These donors did not identify themselves as being of Valach origin; however, no other information about their ethnicity or origin was gathered. Data are available on request and will be submitted to the Y-chromosome haplotype reference database (YHRD, http://www.yhrd.org/).

We also gathered published Y-short tandem repeat (Y-STR) data from other populations, concentrating on Eastern European and Balkan populations.
Our total set, including the Moravian Valachs, consisted of 30 populations comprising 3244 individuals. Because of limitations in the published data, only the minimal haplotype loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, and DYS385a/b) were used to compute intra-population statistics and to compare populations. For detailed analysis, 12-locus haplotypes were used, comprising the minimal haplotype loci plus DYS437, DYS438, and DYS439, and thus including all extended haplotype loci.

The Y-STRs of the VALACH samples were typed using the PowerPlex® Y System (Promega, Madison, WI, USA). It covers 12 polymorphic microsatellite sites (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437, DYS438, and DYS439), including all recommended minimal haplotype (minHT) loci and all extended haplotype (extHT; SWGDAM-recommended) loci.

The CZE and VLIN samples were typed using the AmpFlSTR® Yfiler® PCR Amplification Kit (Applied Biosystems, Carlsbad, CA, USA), which covers all of the above loci plus DYS456, DYS458, DYS635, Y GATA H4, and DYS448. All 17 markers were used only to evaluate the discrimination capacity of Y-STR haplotypes in the CZE and VLIN sample sets.

From the Y-STR data, we also estimated the Y-chromosomal haplogroups of our samples using Whit Athey's free online tool 'Haplogroup Predictor' (http://www.hprg.com/hapest5/). Because we were aware of the issues involved in estimating Y-chromosomal haplogroups from Y-STR allele frequencies, we used only samples with a haplogroup estimate probability higher than 90% in the subsequent analyses (median networks).

Genetic intra-population indices were computed in Arlequin v3.1. Median networks were constructed with Network 4.60 (available at http://www.fluxus-engineering.com) from 12 Y-STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385a/b, DYS437, DYS438, and DYS439).
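Arlequin computes the intra-population indices internally, and the excerpt does not list which ones were reported. As a minimal sketch, assuming the indices include haplotype diversity (Nei's unbiased estimator) and that discrimination capacity is the number of distinct haplotypes divided by the sample size, the code below shows how 12-locus haplotypes could be reduced to the minHT loci and summarized. The step distance is simply the sum of repeat-unit differences between two haplotypes; Network's own distance weighting may differ. All names (`LOCI_12`, `to_min_ht`, `step_distance`, etc.) are hypothetical helpers, not Arlequin or Network functions.

```python
from collections import Counter

# Hypothetical helpers; Arlequin and Network compute their statistics internally.
# Locus order of the 12-locus (PowerPlex Y) haplotype named in the text.
# DYS385a/b is a duplicated locus, so it contributes two allele values;
# assumption: they are stored sorted, DYS385a <= DYS385b.
LOCI_12 = (
    "DYS19", "DYS389I", "DYS389II", "DYS390", "DYS391", "DYS392",
    "DYS393", "DYS385a", "DYS385b", "DYS437", "DYS438", "DYS439",
)
# The first nine entries are the minimal haplotype (minHT) loci.
MIN_HT = LOCI_12[:9]


def to_min_ht(haplotype):
    """Reduce a 12-locus haplotype (dict: locus -> repeat count) to a minHT tuple."""
    return tuple(haplotype[locus] for locus in MIN_HT)


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def discrimination_capacity(haplotypes):
    """Assumed definition: distinct haplotypes divided by number of individuals."""
    return len(set(haplotypes)) / len(haplotypes)


def step_distance(h1, h2):
    """Total repeat-unit difference between two haplotypes typed at the same loci."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Toy usage with made-up haplotype labels (not data from this study):
sample = ["H1", "H1", "H2", "H3"]
print(round(haplotype_diversity(sample), 4))      # 0.8333
print(discrimination_capacity(sample))            # 0.75
print(step_distance((14, 13, 29), (15, 13, 30)))  # 2
```

In practice the haplotypes passed to `haplotype_diversity` would be the tuples returned by `to_min_ht`, so that populations typed with different kits are compared on the same nine loci.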
A reduced median algorithm (r = 2) was followed by a median-joining procedure (epsilon = 0) to reduce the reticulation of the networks. The results of the median-joining procedure were post-processed with a maximum parsimony calculation to further simplify the final network.

The multidimensional scaling analysis was performed in Statistica 9.0 (StatSoft Inc., Tulsa, OK, USA). […]
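The excerpt does not say which MDS variant Statistica was configured to run or which between-population distance matrix was scaled. For illustration only, the sketch below swaps in classical (Torgerson) metric MDS, implemented in pure Python with a cyclic Jacobi eigensolver, and applies it to an arbitrary symmetric distance matrix (using, for example, pairwise R_ST values as input is an assumption). `jacobi_eigen` and `classical_mds` are hypothetical helpers, not Statistica functions.

```python
import math


def jacobi_eigen(a, tol=1e-24, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with the cyclic Jacobi method.

    Returns (eigenvalues, v); column k of v is the eigenvector for eigenvalues[k].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix (assumed input)."""
    n = len(dist)
    sq = [[float(d) ** 2 for d in row] for row in dist]
    means = [sum(row) / n for row in sq]  # row means equal column means (symmetric)
    grand = sum(means) / n
    # Double-centred matrix B = -1/2 * J D^2 J.
    b = [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda k: vals[k], reverse=True)[:dims]
    # Scale eigenvectors by sqrt(eigenvalue); negative eigenvalues are clipped to 0.
    return [tuple(vecs[i][k] * math.sqrt(max(vals[k], 0.0)) for k in top)
            for i in range(n)]


# Toy usage: three hypothetical "populations" whose distances fit on a line.
coords = classical_mds([[0, 3, 7], [3, 0, 4], [7, 4, 0]], dims=1)
print(round(abs(coords[0][0] - coords[2][0]), 6))  # 7.0
```

Classical MDS recovers Euclidean-embeddable distances exactly. Genetic distances are often not Euclidean, which produces negative eigenvalues (clipped here). Non-metric MDS, which preserves only the rank order of distances, is a common alternative in that case.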

Pipeline specifications

Software tools: Arlequin, Statistica
Applications: Miscellaneous, Population genetic analysis
Organisms: Homo sapiens