Computational protocol: Validation of an updated Associative Transcriptomics platform for the polyploid crop species Brassica napus by dissection of the genetic architecture of erucic acid and tocopherol isoform variation in seeds

[…] Clustering and dendrogram visualization of the SNP data were performed using an in-house R script. The R package phangorn was used to generate a distance matrix under the JC69 model (Schliep, ). [...]

Pairwise LD was calculated and heat maps were produced for each individual chromosome, and these values were then used to calculate the mean LD across the genome. SNPs were removed from the analysis if their assignment to the A or C genome was not confirmed by the TNDH population (Qiu et al., ), or if their minor allele frequency was below 0.01. To reduce the effect of many linked SNPs in the same gene, a single SNP was selected at random from each CDS model. Pairwise R² LD matrices and heat maps were calculated for each chromosome using the R package LDheatmap 0.99-2 (Shin et al., ). [...]

Association analysis for SNPs and GEMs was performed in R as previously described (Harper et al., ; Sollars et al., ), with modifications. To handle the greatly increased size of the data sets, PSIKO (Popescu et al., ) was used to generate the Q matrix, and the GAPIT R package was used with a mixed linear model (Lipka et al., ) for the GWAS. For Manhattan plots of SNP associations, SNP markers were filtered to retain only those with a minor allele frequency greater than 0.01; markers that could be assigned with confidence to the genomic position of their CDS model are rendered as dark points, and markers that could not are rendered as pale points. For GEM association, CDS models were filtered prior to regression to retain only those with a mean expression across the panel greater than 0.4 RPKM. The association between gene expression and each trait was calculated using a fixed-effect linear model in R, with the RPKM values and the Q matrix inferred by PSIKO as the explanatory variables and the trait score as the response variable. R², regression coefficients, constants and significance values were output for each regression.
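The JC69 distance behind the clustering step has a simple closed form, d = -(3/4) ln(1 - (4/3) p), where p is the proportion of differing sites. The sketch below is a minimal Python illustration, assuming the SNP genotypes have been written out as aligned nucleotide strings (one per accession) and that sites with gaps or ambiguity codes are skipped pairwise. phangorn (for example via its `dist.ml` function) treats ambiguous states in its own way, so values may differ slightly. The function names `jc69_distance` and `jc69_matrix` are hypothetical.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Closed-form Jukes-Cantor (JC69) distance between two aligned sequences.

    Sites where either sequence is not A, C, G or T are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    valid = set("ACGT")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = mismatches / compared  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # at or beyond JC69 saturation the log is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_matrix(seqs):
    """Symmetric distance matrix as a nested dict keyed by accession name."""
    names = list(seqs)
    return {x: {y: jc69_distance(seqs[x], seqs[y]) for y in names} for x in names}


if __name__ == "__main__":
    accessions = {"acc1": "ACGTACGT", "acc2": "ACGTACGA", "acc3": "TCGAACGA"}
    for name, row in jc69_matrix(accessions).items():
        print(name, {k: round(v, 3) for k, v in row.items()})
```

A dendrogram would then be built from this matrix with a hierarchical clustering method; which method the in-house script used is not stated here.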
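The SNP filtering and LD summary steps described above can be sketched as follows. This is a minimal Python illustration under several assumptions: each SNP is a hypothetical record with `cds`, `chrom`, `genome_confirmed` and `dosages` fields (alternate-allele dosages 0, 1 or 2 per accession), and r² is approximated as the squared Pearson correlation between dosage vectors. LDheatmap may compute r² differently depending on the genotype format it is given, and how the per-chromosome values were combined into a genome-wide mean is not stated, so the pooled mean here is also an assumption.

```python
import random
from itertools import combinations
from statistics import mean


def minor_allele_frequency(dosages):
    """MAF from alternate-allele dosages (0, 1 or 2 per accession)."""
    alt = sum(dosages) / (2 * len(dosages))
    return min(alt, 1 - alt)


def filter_snps(snps, min_maf=0.01, seed=0):
    """Drop unconfirmed or rare SNPs, then keep one random SNP per CDS model."""
    rng = random.Random(seed)
    by_cds = {}
    for snp in snps:
        if snp["genome_confirmed"] and minor_allele_frequency(snp["dosages"]) >= min_maf:
            by_cds.setdefault(snp["cds"], []).append(snp)
    return [rng.choice(group) for group in by_cds.values()]


def r_squared(x, y):
    """Squared Pearson correlation of two dosage vectors (None if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy * sxy / (sxx * syy)


def ld_summary(snps):
    """Mean pairwise r^2 per chromosome, plus the mean pooled over all chromosomes."""
    by_chrom = {}
    for snp in snps:
        by_chrom.setdefault(snp["chrom"], []).append(snp["dosages"])
    per_chrom, pooled = {}, []
    for chrom, rows in by_chrom.items():
        values = []
        for a, b in combinations(rows, 2):  # only pairs on the same chromosome
            r2 = r_squared(a, b)
            if r2 is not None:
                values.append(r2)
        if values:
            per_chrom[chrom] = mean(values)
            pooled.extend(values)
    return per_chrom, (mean(pooled) if pooled else None)
```

Fixing the random seed makes the one-SNP-per-CDS choice reproducible between runs, which matters when LD summaries are compared across filtering settings.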
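The GEM regression step can be illustrated with ordinary least squares. The sketch below is a minimal, dependency-free Python illustration, not the original R code: it keeps CDS models whose mean RPKM exceeds 0.4, regresses the trait on an intercept, the RPKM values and the Q-matrix columns, and reports R², the constant and the RPKM coefficient. Dropping the last Q column is an assumption made here because rows of a Q matrix sum to 1 and would otherwise be collinear with the intercept. Significance values are omitted because they need the t or F distribution, which the Python standard library does not provide; in R they would typically come from `summary()` of an `lm()` fit. The names `solve`, `ols` and `gem_scan` are hypothetical.

```python
from statistics import mean


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares via the normal equations (fine for a sketch; QR is numerically safer)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


def gem_scan(expression, q_matrix, trait, min_mean_rpkm=0.4):
    """Regress trait on RPKM plus Q-matrix covariates for each sufficiently expressed CDS model."""
    results = {}
    for cds, rpkm in expression.items():
        if mean(rpkm) <= min_mean_rpkm:
            continue  # filtered out before regression
        # Intercept, RPKM, and all but the last Q column (Q rows sum to 1).
        rows = [[1.0, e] + list(q[:-1]) for e, q in zip(rpkm, q_matrix)]
        beta, r2 = ols(rows, trait)
        results[cds] = {"constant": beta[0], "coefficient": beta[1], "r2": r2}
    return results
```

Here `expression` maps each CDS model to its per-accession RPKM values, `q_matrix` holds one row of population-membership proportions per accession, and `trait` holds the trait scores in the same accession order.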
Genomic control (Devlin and Roeder, ) was applied to the GEM analysis to correct for spurious associations, with P values adjusted whenever the genomic inflation factor (λ) exceeded 1. […]
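The genomic-control step can be sketched with the widely used median-based estimator of λ: the observed median of the 1-degree-of-freedom χ² statistics divided by the median expected under the null (about 0.455). This is a minimal Python illustration assuming two-sided P values from 1-df tests as input; the function names are hypothetical, and the exact estimator used in the original analysis is not stated. As in the text, P values are adjusted only when λ exceeds 1.

```python
from statistics import NormalDist, median

_NORMAL = NormalDist()  # standard normal N(0, 1)
# Median of a chi-square distribution with 1 df: (75th percentile of N(0, 1))^2, about 0.4549.
CHI2_1DF_MEDIAN = _NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """1-df chi-square statistic for a two-sided P value (requires 0 < p <= 1)."""
    z = _NORMAL.inv_cdf(p / 2)  # lower tail keeps precision for very small p
    return z * z


def chi2_to_p(chi2):
    """Two-sided P value for a 1-df chi-square statistic."""
    return 2 * _NORMAL.cdf(-(chi2 ** 0.5))


def genomic_control(p_values):
    """Return (lambda, P values), deflating statistics by lambda only when lambda > 1."""
    chi2 = [p_to_chi2(p) for p in p_values]
    lam = median(chi2) / CHI2_1DF_MEDIAN
    if lam <= 1:
        return lam, list(p_values)
    return lam, [chi2_to_p(c / lam) for c in chi2]
```

Dividing each χ² statistic by λ rescales the whole distribution so that its median matches the null expectation, which is why uniformly inflated P values are pulled back towards 0.5 at the median.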

Pipeline specifications

Software tools: phangorn, LDheatmap, PSIKO, GAPIT
Applications: Phylogenetics, Population genetic analysis, GWAS
Organisms: Brassica napus
Chemicals: Vitamin E, Tocopherols