Computational protocol: Insight into the genetic composition of South African Sanga cattle using SNP data from cattle breeds worldwide

Protocol publication

[…] Genotypes for four South African Sanga cattle breeds [Afrikaner (AFR, n = 36), Nguni (NGU, n = 50), Drakensberger (DRA, n = 47) and Bonsmara (BON, n = 44)] originated from previous studies [, , ]. They were generated using the Illumina BovineSNP50 BeadChip v2, which features 54,609 SNPs distributed across the bovine genome with an average spacing of 49.9 kb []. These data were combined with genotypes from an additional 394 reference animals representing European taurine cattle, i.e. Shorthorn (SH), Hereford (HFD), Simmental (SM), Limousin (LM), Angus (AN) and Holstein (HOL); African taurines, i.e. N’Dama (NDAM), Somba (SOM), Kuri (KUR), Lagune (LAG) and Baoule (BAO); African zebu, i.e. Ankole-Watusi (ANKW), Boran (BOR) and Sheko (SHK); East African zebu, i.e. short-horned zebu (ZEB) and zebu Bororo (ZBO); and Bos indicus, i.e. Brahman (BR), Nelore (NEL), Bhagnari (BAG) and Gir (GIR). Samples from these reference individuals were obtained with permission (see []) and were selected based on their land of origin and previous characterization. These samples originated from the following studies: Gautier et al. []; The Bovine HapMap Consortium []; Matukumalli et al. []; Decker et al. []; Gautier et al. []; and Decker et al. []. Additional file : Table S1 provides breed names and acronyms, number of individuals per breed, sampling area, land of origin and references to the original studies from which the samples came.
The datasets were merged in PLINK [], and only autosomal SNPs common to all datasets were retained. After removing SNPs with a minor allele frequency (MAF) lower than 0.005 or a call rate lower than 0.98, and individuals with more than 5% missing genotypes, 35,155 SNPs and 548 individuals remained. [...]
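The quality-control thresholds described above can be illustrated with a minimal sketch. This is not the authors' PLINK workflow: the function name `qc_filter`, the in-memory genotype layout (one list per individual, `None` for a missing call) and the order in which the filters are applied (individuals first, then SNPs) are assumptions made for illustration only. The default thresholds are the ones stated in the text.

```python
from typing import List, Optional, Tuple

# One genotype = number of copies (0, 1 or 2) of the counted allele; None = missing call.
Genotype = Optional[int]


def qc_filter(
    genos: List[List[Genotype]],
    max_ind_missing: float = 0.05,  # individuals with > 5% missing genotypes are removed
    min_call_rate: float = 0.98,    # SNPs with call rate < 0.98 are removed
    min_maf: float = 0.005,         # SNPs with MAF < 0.005 are removed
) -> Tuple[List[int], List[int]]:
    """Return (indices of retained individuals, indices of retained SNPs).

    Assumption: individuals are filtered before SNPs; the source does not
    state the order used.
    """
    n_snp = len(genos[0]) if genos else 0
    if n_snp == 0:
        return [], []

    # Step 1: drop individuals with too many missing genotypes.
    keep_ind = [
        i for i, row in enumerate(genos)
        if sum(g is None for g in row) / n_snp <= max_ind_missing
    ]

    # Step 2: among retained individuals, drop SNPs with low call rate or low MAF.
    keep_snp = []
    for j in range(n_snp):
        called = [genos[i][j] for i in keep_ind if genos[i][j] is not None]
        if not called or len(called) / len(keep_ind) < min_call_rate:
            continue
        freq = sum(called) / (2 * len(called))  # frequency of the counted allele
        if min(freq, 1.0 - freq) < min_maf:
            continue
        keep_snp.append(j)

    return keep_ind, keep_snp
```

On real data the same filters would normally be applied with PLINK itself rather than reimplemented; the sketch only makes the three criteria explicit.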
Patterns of admixture and relationships among South African Sanga cattle in relation to the 20 reference breeds were determined using principal component analysis [] implemented in the SNP & Variation Suite (SVS 8.1; Golden Helix Inc., Bozeman, Montana) and variational Bayesian inference as implemented in fastSTRUCTURE []. Ancestry proportions were estimated for K ancestral populations, with K ranging from 2 to 20, assuming a simple non-informative prior. The K*ε and K*∅c metrics from fastSTRUCTURE were used to determine the appropriate values of K for the population structure explained by the dataset. The K*ε metric is the value of K that maximizes the log marginal likelihood lower bound, and the K*∅c metric is the minimum value of K that explains almost all of the ancestry in the dataset. Outputs from fastSTRUCTURE [] were plotted using the GENESIS software []. To further test for evidence of admixture in South African Sanga cattle, an ancestry graph [] and three-population (f3) [, ] and four-population (f4) tests [, ], all implemented in TreeMix [], were also used. The maximum likelihood tree (ancestry graph) [] was first built for all 24 populations (see Additional file : Figure S1), after which migration events were sequentially added to the tree until no further meaningful increase in the proportion of variance explained was observed (see Additional file : Table S2). […]
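To make the logic of the three- and four-population tests concrete, the following sketch computes naive f3 and f4 statistics from per-population allele frequencies. It is an illustration under stated assumptions, not TreeMix's implementation: the function names and the plain-list input format are hypothetical, and the sketch omits the finite-sample bias correction and the block-jackknife standard errors that are needed to obtain Z-scores in practice. A clearly negative f3(C; A, B) is the usual signal that population C is admixed between lineages related to A and B.

```python
from typing import Sequence


def f3(c: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    Inputs are allele frequencies of the same allele at the same SNPs.
    No bias correction is applied (simplifying assumption).
    """
    if not (len(a) == len(b) == len(c)) or len(c) == 0:
        raise ValueError("frequency vectors must be non-empty and of equal length")
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


def f4(a: Sequence[float], b: Sequence[float],
       c: Sequence[float], d: Sequence[float]) -> float:
    """Naive f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    if not (len(a) == len(b) == len(c) == len(d)) or len(a) == 0:
        raise ValueError("frequency vectors must be non-empty and of equal length")
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


# Toy example: population C has frequencies exactly midway between A and B,
# as expected for an equal mixture, so f3(C; A, B) is negative.
pop_a = [0.1, 0.8, 0.3]
pop_b = [0.9, 0.2, 0.7]
pop_c = [(x + y) / 2 for x, y in zip(pop_a, pop_b)]
f3_value = f3(pop_c, pop_a, pop_b)
```

In a real analysis the statistic would be divided by a jackknife standard error computed over blocks of contiguous SNPs, and only the resulting Z-score would be interpreted.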

Pipeline specifications

Software tools PLINK, ADMIXTURE, TreeMix
Applications Population genetic analysis, GWAS
Organisms Bos taurus, Bos indicus