## Protocol publication

[…] The algorithms we discuss take as input the number of populations, $K$, and an $M \times N$ genotype matrix, $G$:
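$$
G = \begin{bmatrix}
g_{11} & g_{12} & \cdots & g_{1N} \\
g_{21} & g_{22} & \cdots & g_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
g_{M1} & g_{M2} & \cdots & g_{MN}
\end{bmatrix}
\tag{3}
$$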
where $g_{li} \in \{0, 1, 2\}$ is the number of copies of the reference allele at the $l$th locus for the $i$th individual, $M$ is the number of markers (loci), and $N$ is the number of individuals. Given the genotype matrix $G$, the algorithms attempt to infer the population allele frequencies and the individual admixture proportions. The matrix $P$ contains the population allele frequencies:
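$$
P = \begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1K} \\
p_{21} & p_{22} & \cdots & p_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
p_{M1} & p_{M2} & \cdots & p_{MK}
\end{bmatrix}
\tag{4}
$$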
where $0 \le p_{lk} \le 1$ is the fraction of reference alleles out of all alleles at the $l$th locus in the $k$th population. The matrix $Q$ contains the individual admixture proportions:
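$$
Q = \begin{bmatrix}
q_{11} & q_{12} & \cdots & q_{1N} \\
q_{21} & q_{22} & \cdots & q_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
q_{K1} & q_{K2} & \cdots & q_{KN}
\end{bmatrix}
\tag{5}
$$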
where $0 \le q_{ki} \le 1$ is the fraction of the $i$th individual's genome originating from the $k$th population and, for all $i$, $\sum_k q_{ki} = 1$. Table summarizes the matrix notation we use. [...]

We generate simulated genotype data for a variety of problems using $M = 10000$ markers and varying $N$ over 100, 1000, and 10000; $K$ over 2, 3, and 4; and $\alpha$ over 0.1, 0.5, 1, and 2, for a total of 36 parameter sets. For each combination of $N$, $K$, and $\alpha$, we generate the ground-truth $P$ from a uniform distribution and $Q$ from a Dirichlet distribution parameterized by $\alpha$. Then, we draw a random genotype for each individual using the binomial distribution in Equation 11. We estimate $P$ and $Q$ using only the genotype information and the true number of populations, $K$. We repeat the experiment 50 times, drawing new $P$, $Q$, and $G$ matrices each time.

Finally, we record the performance of four configurations: Admixture using the published tight convergence threshold of $\varepsilon = 10^{-4}$ and a loose convergence threshold of $\varepsilon = MN \times 10^{-4}$; the least-squares algorithm using an uninformative prior ($\alpha = 1$) and $\varepsilon = MN \times 10^{-4}$; and the FRAPPE EM algorithm using the published threshold of $\varepsilon = 1$. For reference, we also include the least-squares algorithm with an informative prior (known $\alpha$) and a convergence threshold of $\varepsilon = MN \times 10^{-4}$. In all experiments, Admixture's performance with the two convergence thresholds was nearly identical, so we report only the results for $\varepsilon = MN \times 10^{-4}$, which requires less computation time. We used a four-way analysis of variance (ANOVA) with a fixed-effects model to reveal which factors (including algorithm) contribute more or less to the estimation error and computation time. […]
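The simulation procedure described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the authors' code. Equation 11 is not reproduced in this excerpt, so the sketch assumes the common admixture model in which each genotype is drawn as Binomial(2, Σ_k p_lk q_ki). It also assumes that "parameterized by α" means a symmetric Dirichlet with every concentration parameter equal to α. The names `simulate` and `PARAM_GRID` are hypothetical.

```python
import itertools
import random

# The 3 x 3 x 4 = 36 parameter sets (N, K, alpha) listed in the text.
PARAM_GRID = list(itertools.product([100, 1000, 10000],
                                    [2, 3, 4],
                                    [0.1, 0.5, 1, 2]))


def simulate(M, N, K, alpha, rng):
    """Draw ground-truth P (M x K), Q (K x N) and genotypes G (M x N)."""
    # P: population allele frequencies, uniform on [0, 1).
    P = [[rng.random() for _ in range(K)] for _ in range(M)]

    # Q: one column of admixture proportions per individual.
    # ASSUMPTION: symmetric Dirichlet(alpha), sampled by normalising
    # K independent Gamma(alpha, 1) draws.
    Q = [[0.0] * N for _ in range(K)]
    for i in range(N):
        total = 0.0
        while total == 0.0:  # guard against underflow for very small alpha
            gammas = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
            total = sum(gammas)
        for k in range(K):
            Q[k][i] = gammas[k] / total

    # G: ASSUMPTION (Equation 11 is not shown in this excerpt):
    # g_li ~ Binomial(2, f_li) with f_li = sum_k p_lk * q_ki,
    # drawn here as the sum of two Bernoulli(f_li) trials.
    G = [[0] * N for _ in range(M)]
    for l in range(M):
        for i in range(N):
            f = sum(P[l][k] * Q[k][i] for k in range(K))
            G[l][i] = int(rng.random() < f) + int(rng.random() < f)
    return P, Q, G


# Small example; the sizes in the text (M = 10000, N up to 10000)
# would call for a vectorised implementation instead.
P, Q, G = simulate(M=50, N=20, K=3, alpha=0.5, rng=random.Random(0))
```

Passing an explicit `random.Random` instance makes each replicate reproducible from its seed, which is convenient when the experiment is repeated many times per parameter set.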

## Pipeline specifications

| Software tools | ADMIXTURE, frappe |
|---|---|
| Application | Population genetic analysis |