Computational protocol: Genome wide association and genomic prediction for growth traits in juvenile farmed Atlantic salmon using a high density SNP array


Protocol publication

[…] DNA from the 712 fish was extracted using DNeasy-96 tissue DNA extraction kits (Qiagen, Crawley, UK). The samples were then genotyped using the Affymetrix Axiom SNP array, which contains ~132 K validated SNPs [] (http://www.affymetrix.com/support/technical/datasheets/axiom_salmon_genotyping_array_datasheet.pdf). Starting from these validated SNPs, the SNP data were filtered with the PLINK software []. This removed individuals and SNPs with excessive (> 1 %) Mendelian errors, as well as SNPs with a minor allele frequency (MAF) < 0.05 in this dataset. After filtering, 111,908 SNPs were retained for 622 fish (534 offspring, 28 sires and 60 dams). The phenotypic sex of the offspring was unknown. Therefore, the Y-specific probes on the array were used to predict the genetic sex of the fish based on the putative sex-determining gene [], as described in Houston et al. []. [...]

Genetic parameters for the weight and length traits were estimated by fitting animal as a random effect. The estimation was performed by REML, assuming the following model:

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \mathbf{e} \tag{1}$$

where $\mathbf{y}$ is the vector of observed trait values, $\mathbf{b}$ is the fixed effect of sex, $\mathbf{u}$ is the vector of additive genetic effects and $\mathbf{e}$ is the vector of residual errors. $\mathbf{X}$ and $\mathbf{Z}$ are the corresponding incidence matrices for the fixed and additive genetic effects, respectively. The covariance structure of the genetic effect was calculated using either pedigree ($\mathbf{A}$) or genomic ($\mathbf{G}$) information, i.e. $\mathbf{u} \sim N(\mathbf{0}, \mathbf{A}\sigma^2_a)$ or $\mathbf{u} \sim N(\mathbf{0}, \mathbf{G}\sigma^2_a)$. The narrow-sense heritability was then estimated as the ratio of the additive genetic variance to the total phenotypic variance:

$$h^2 = \frac{\sigma^2_a}{\sigma^2_p} \tag{2}$$

where $\sigma^2_a$ is the additive genetic variance and $\sigma^2_p = \sigma^2_a + \sigma^2_e$ is the total phenotypic variance, with $\sigma^2_e$ the residual variance. The analysis was implemented in the ASReml 3.0 software []. The genomic relationship matrix required for the analysis was calculated with the GenABEL R package [] using the method of VanRaden []. It was then inverted using the standard R function. [...]
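The MAF filter, the VanRaden genomic relationship matrix and equation (2) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline. Genotypes are assumed to be coded 0/1/2 with no missing calls, and allele frequencies are estimated from the sample itself. The Mendelian-error filter is omitted because it needs pedigree information. The function names `filter_maf`, `vanraden_grm` and `narrow_sense_h2` are hypothetical, and the sketch is not claimed to reproduce PLINK or GenABEL output exactly.

```python
import numpy as np


def filter_maf(genotypes: np.ndarray, min_maf: float = 0.05) -> np.ndarray:
    """Drop SNP columns whose minor allele frequency is below min_maf.

    Assumes genotypes is an (n_individuals, n_snps) array of 0/1/2 allele counts
    with no missing values (a simplifying assumption for this sketch).
    """
    p = genotypes.mean(axis=0) / 2.0      # frequency of the counted allele per SNP
    maf = np.minimum(p, 1.0 - p)          # minor allele frequency per SNP
    return genotypes[:, maf >= min_maf]


def vanraden_grm(genotypes: np.ndarray) -> np.ndarray:
    """VanRaden-style genomic relationship matrix G = W W' / (2 * sum_j p_j (1 - p_j)).

    Allele frequencies p_j are estimated from the same sample (an assumption);
    each SNP column is centred by 2 * p_j before forming the cross-product.
    """
    p = genotypes.mean(axis=0) / 2.0
    w = genotypes - 2.0 * p               # centred genotype matrix W
    denom = 2.0 * np.sum(p * (1.0 - p))   # scaling so G is comparable to A
    return (w @ w.T) / denom


def narrow_sense_h2(var_a: float, var_e: float) -> float:
    """Equation (2): h^2 = sigma_a^2 / sigma_p^2, with sigma_p^2 = sigma_a^2 + sigma_e^2."""
    return var_a / (var_a + var_e)


if __name__ == "__main__":
    # Toy data: 4 fish x 4 SNPs; the last SNP is monomorphic and is removed by the filter.
    x = np.array([[0, 1, 2, 0], [1, 1, 0, 0], [2, 1, 1, 0], [1, 1, 1, 0]])
    kept = filter_maf(x)
    print(kept.shape)
    print(np.round(vanraden_grm(kept), 3))
    print(narrow_sense_h2(1.0, 3.0))
```

Because the allele frequencies in this sketch come from the same sample, every row of G sums to zero, so G is singular. Some adjustment (for example adding a small constant to the diagonal) is typically needed before inversion. The excerpt does not state how this was handled.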
Based on the results of the GWA analysis, the SNPs surpassing the relaxed significance threshold (P < 0.005 in model (1), approximately the top 0.5 % of markers) were chosen to identify those located within or proximal to genes. First, the flanking sequences of all significant markers were aligned (using blastn) with an Atlantic salmon fry transcriptome database. This database was built from RNA-seq of salmon fry in a separate study in which a large proportion of the SNPs on the array were discovered (described in Houston et al. []). Only markers whose flanking sequences matched the reference transcriptome exactly, except at the SNP position, were selected. The matching transcripts were then aligned (using blastx) against the human (Homo sapiens), mouse (Mus musculus) and zebrafish (Danio rerio) peptide reference databases (downloaded from http://www.ensembl.org/index.html; May 2014). A stringent criterion of e-value ≃ 0 was used as evidence for homology. Second, for each unique peptide in each species, the corresponding gene ID, associated gene name, chromosome position and gene ontology (GO) terms were retrieved from the Ensembl BioMart database (retrieved from http://www.ensembl.org/biomart; June 2014). The chromosome of each SNP marker was identified by aligning the marker and its flanking sequence with the salmon reference genome sequence (AKGD00000000.4) and the existing linkage group (LG) map []. […]
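The marker-selection and matching criteria can likewise be sketched in plain Python. The helper names are hypothetical. `matches_except_snp` assumes that an ungapped alignment of the flanking sequence against the transcript has already been extracted as two equal-length strings with a known SNP offset. `homologous_hits` assumes BLAST tabular output in the default `-outfmt 6` column order, with the e-value in the 11th column. It treats "e-value ≃ 0" as an e-value reported as 0, which is an assumption about how the criterion was applied.

```python
from __future__ import annotations


def select_candidate_snps(pvalues: dict[str, float], threshold: float = 0.005) -> list[str]:
    """Return IDs of SNPs below the relaxed GWAS threshold (P < 0.005), sorted by ID."""
    return sorted(snp for snp, p in pvalues.items() if p < threshold)


def matches_except_snp(flank: str, transcript_segment: str, snp_index: int) -> bool:
    """True if two aligned, equal-length sequences agree at every position
    other than snp_index (case-insensitive); a mismatch at snp_index is allowed."""
    if len(flank) != len(transcript_segment):
        return False  # gapped or partial alignments are rejected
    return all(
        a.upper() == b.upper()
        for i, (a, b) in enumerate(zip(flank, transcript_segment))
        if i != snp_index
    )


def homologous_hits(blast_tabular_lines: list[str], max_evalue: float = 0.0) -> list[tuple[str, str]]:
    """Parse BLAST -outfmt 6 lines; keep (query, subject) pairs with evalue <= max_evalue."""
    hits = []
    for line in blast_tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            hits.append((query, subject))
    return hits


if __name__ == "__main__":
    print(select_candidate_snps({"snpA": 0.001, "snpB": 0.2, "snpC": 0.0049}))
    print(matches_except_snp("ACGTA", "ACTTA", 2))
```

Keeping the exact-match check separate from BLAST parsing makes the "identical except at the SNP" rule explicit, rather than relying on percent-identity summaries that do not record where a mismatch falls.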

Pipeline specifications

Software tools: PLINK, GenABEL, BLASTN, BLASTX, BioMart
Applications: Genome annotation, RNA-seq analysis, GWAS
Organisms: Danio rerio, Mus musculus, Salmo salar, Homo sapiens