Computational protocol: Biochemical and genetic analyses of N metabolism in maize testcross seedlings: 2. Roots

Protocol publication

[…] Statistical analysis was implemented in the R statistical environment (R Core Team ) as described in Trucillo Silva et al. (). The ggplot2 (Wickham ) and GGally (Schloerke et al. ) R packages were used for initial analysis of the raw data. First, a univariate analysis, in which a single variable is fitted in a model, was performed, followed by a multivariate approach, in which multiple variables are analyzed simultaneously, to understand the relationships among the variables. Then, outliers in the dataset were identified with a jackknife resampling strategy as described in Trucillo Silva et al. (). The procedure consists of fitting a statistical model n times, systematically omitting one observation from the dataset each time, and predicting random effects for a subset of the most consistent genotypes in each of the n fits. The mixed model was fitted with the ASReml R package (Butler et al. ), and the corresponding mixed model equations were solved to predict random effects and estimate fixed effects.
The statistical model can be represented as follows:
y = Xb + Zu + e,
where y denotes an n × 1 vector of observed response values, b is a p × 1 vector of fixed effects, X is an n × p design matrix, u is a q × 1 vector of random effects, Z is an n × q design matrix, and e is the error term.
The following assumptions were used: E(u) = 0, E(e) = 0, Cov(u, e) = 0, Var(u) = G, and Var(e) = R. The G matrix had a compound symmetry structure on the genotype levels, and the R matrix was diagonal with a different value for each set, allowing non-constant variance across sets. The response variable was either the enzyme activity or the metabolite concentration, depending on the trait. Set, light replicate and plate were included as fixed effects in the model (with replicate and plate nested within set), and the check genotype effect was included as a continuous covariate. Finally, the genotype was included as a random effect in the linear model.
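As a rough illustration of the model y = Xb + Zu + e and the leave-one-out refitting described above, the following Python sketch solves Henderson's mixed model equations with NumPy for known G and R. It is not the ASReml workflow used in the study: the toy data, the function names `solve_mme` and `jackknife_blups`, the variance values, and the use of a diagonal G (instead of the compound symmetry structure mentioned in the text) are all assumptions made for the example. The source defers the actual outlier criterion to Trucillo Silva et al., so the sketch only reports how much each omitted observation shifts the predicted random effects.

```python
import numpy as np

def solve_mme(y, X, Z, G, R):
    """Solve Henderson's mixed model equations for y = Xb + Zu + e.

    G and R are treated as known. Returns (b_hat, u_hat): fixed-effect
    estimates and predictions (BLUPs) of the random effects.
    """
    Rinv = np.linalg.inv(R)
    Ginv = np.linalg.inv(G)
    lhs = np.block([
        [X.T @ Rinv @ X, X.T @ Rinv @ Z],
        [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv],
    ])
    rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])
    sol = np.linalg.solve(lhs, rhs)
    p = X.shape[1]
    return sol[:p], sol[p:]

def jackknife_blups(y, X, Z, G, R):
    """Refit the model n times, omitting one observation each time.

    Returns an n x q array; row i holds the BLUPs obtained without
    observation i.
    """
    n = len(y)
    out = np.empty((n, Z.shape[1]))
    for i in range(n):
        keep = np.arange(n) != i
        R_i = R[np.ix_(keep, keep)]
        _, out[i] = solve_mme(y[keep], X[keep], Z[keep], G, R_i)
    return out

# Toy data (hypothetical): 3 genotypes x 2 sets x 2 replicates = 12 observations.
rng = np.random.default_rng(0)
n_geno, n_sets, reps = 3, 2, 2
geno = np.tile(np.repeat(np.arange(n_geno), reps), n_sets)
sets = np.repeat(np.arange(n_sets), n_geno * reps)
n = geno.size

X = np.column_stack([np.ones(n), sets])   # intercept + set as fixed effects
Z = np.eye(n_geno)[geno]                  # genotype incidence matrix
G = 1.0 * np.eye(n_geno)                  # assumed: independent genotypes
set_var = np.array([0.5, 2.0])            # assumed: one error variance per set
R = np.diag(set_var[sets])                # diagonal R, heterogeneous across sets

y = (10.0 + 1.0 * sets + Z @ rng.normal(size=n_geno)
     + rng.normal(size=n) * np.sqrt(set_var[sets]))

b_hat, u_hat = solve_mme(y, X, Z, G, R)
loo_blups = jackknife_blups(y, X, Z, G, R)

# Largest change in any BLUP caused by dropping each observation.
shift = np.abs(loo_blups - u_hat).max(axis=1)
```

In practice the variance components in G and R would be estimated (for example by REML, as ASReml does) rather than fixed in advance as they are here.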
Several genotypes were discarded depending on the trait (e.g., five genotypes were removed for both AlaAT and NR). Furthermore, one and four complete sets of data were removed for glutamate and nitrate, respectively, due to sample contamination and very low accuracy of the estimates.
Significance of the genetic variance was assessed with a log-likelihood ratio test comparing models with and without the testcross (TC) random effect. Correlations were calculated among BLUP values for each pair of traits, and significance was adjusted with the Bonferroni correction for multiple comparisons. Repeatability was derived from the variance estimates from ASReml. Because the variance components were estimated separately for each set, one repeatability value was obtained per set and these partial estimates were averaged. Path coefficient analysis was performed as described by Wright () and Trucillo Silva et al. ().
The studied traits followed a Gaussian distribution, as judged from the similarity of mean and median values along with skewness estimates (Supplementary Material 1). [...] TC materials were genotyped with 5306 single nucleotide polymorphism (SNP) markers by the Beijing Genomics Institute. Physical and genetic positions of the markers were determined and genetic maps were created with R/qtl (Broman et al. ). Recombination fractions were estimated and the Kosambi mapping function was used to calculate genetic map distances (Kosambi ). In addition, map distances were adjusted to allow comparison with previous investigations. The expansion factor was determined from the following equation: α = j/2 + (2^i − 1)/2^i, where j is the number of generations of intermating, including the two generations for generating the F2, and i is the number of inbred generations after intermating (Teuscher et al. ). The observed map was 11,265.25 cM long, and map distances were divided by a factor of 6.5 to estimate the adjusted F2 map.
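The map arithmetic above can be checked with a short Python sketch. It implements the Kosambi mapping function and the expansion factor α = j/2 + (2^i − 1)/2^i, then divides the observed map length by 6.5. The function names are illustrative, and the inputs j = 12 and i = 1 are hypothetical values chosen only because they yield 6.5; the source does not state which j and i were used.

```python
import math

def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def expansion_factor(j, i):
    """Map expansion factor alpha = j/2 + (2^i - 1)/2^i."""
    return j / 2.0 + (2.0 ** i - 1.0) / 2.0 ** i

# Hypothetical inputs: the source reports only the resulting factor of 6.5.
alpha = expansion_factor(j=12, i=1)

observed_map_cm = 11265.25                    # observed map length from the text
adjusted_map_cm = observed_map_cm / 6.5       # adjusted F2-equivalent length

print(f"alpha = {alpha}")
print(f"adjusted map = {adjusted_map_cm:.2f} cM")
print(f"r = 0.10 -> {kosambi_cm(0.10):.2f} cM")
```

For small recombination fractions the Kosambi distance in cM is close to 100 × r, and it grows faster than r as r approaches 0.5.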
The final adjusted map was 1733.12 cM long, with an average spacing between markers of 0.33 cM and a maximum spacing of nearly 7 cM, on chromosome 6. In terms of physical distance, the total genome length was 2051.75 Mb, and the largest gap between markers was 69.80 Mb (located on chromosome 2). On average, there was one marker every 400 kb. [...] QTL Cartographer (Basten et al. ) was used to detect associations between phenotypes and genotypes. Single-marker analysis, linear regression analysis and composite interval mapping (CIM) were implemented. Zmap (model 6) was used for CIM, with the ten most significant marker cofactors identified by forward and backward regression. QTL were scanned at intervals of 1 cM and at every marker, while cofactors located within a 10 cM window of the scanned position were excluded from the analysis. To determine LOD score thresholds at the 5% significance level and to declare significant QTL, 1000 permutations were performed for every trait. Two nearby QTL were considered distinct when their LOD peaks were 20 cM or more apart. Effects of QTL are expressed relative to the B73 allele: an effect with a positive sign represents an increasing allele from B73, and an effect with a negative sign denotes an increasing allele from Mo17.
Multiple interval mapping (MIM) analysis was performed by fitting the QTL previously identified in the CIM analysis; parameters were re-estimated and positions refined. All pairwise interactions between QTL in every model were examined for each trait. Significance was determined based on the information criterion IC(k) = −2 (log(L) − k c(n)/2), where the penalty function is c(n) = log(n), and a threshold of 0.0 was used (Basten et al. ). The proportion of the total phenotypic variance associated with each model was estimated.
Candidate genes annotated within the corresponding 1-LOD QTL confidence intervals were retrieved from MaizeGDB (Lawrence et al. ) and Phytozome (Goodstein et al. ). Candidate genes directly related to N-metabolism, based on descriptions in model species such as rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), were proposed for further studies. Several other genes, including transcription factors, may be promising candidates for further investigation; however, they were not considered because of the difficulty of establishing a direct relationship with N-metabolism in maize from the available descriptions. […]
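The permutation-based LOD threshold mentioned above (1000 permutations, 5% level) can be sketched in Python as follows. This is a deliberately simplified single-marker regression scan on simulated data, not the CIM procedure of QTL Cartographer: the function names, the 0/1 genotype coding, the sample size, and the simulated QTL at marker index 5 are all assumptions made for illustration.

```python
import numpy as np

def marker_lod(y, g):
    """LOD score from regressing phenotype y on a single marker g."""
    n = len(y)
    rss0 = np.sum((y - y.mean()) ** 2)            # null model: mean only
    A = np.column_stack([np.ones(n), g])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss1 = np.sum((y - A @ coef) ** 2)            # model with marker effect
    return (n / 2.0) * np.log10(rss0 / rss1)

def scan(y, geno):
    """LOD score at every marker (columns of geno)."""
    return np.array([marker_lod(y, geno[:, m]) for m in range(geno.shape[1])])

def permutation_threshold(y, geno, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide LOD threshold from the null distribution of the maximum LOD.

    Phenotypes are shuffled relative to genotypes, which breaks any
    genotype-phenotype association while keeping marker structure intact.
    """
    rng = np.random.default_rng(seed)
    null_max = np.array([scan(rng.permutation(y), geno).max()
                         for _ in range(n_perm)])
    return float(np.quantile(null_max, 1.0 - alpha))

# Simulated data (hypothetical): 100 lines, 20 unlinked markers, one QTL.
rng = np.random.default_rng(1)
geno = rng.integers(0, 2, size=(100, 20)).astype(float)
y = 1.5 * geno[:, 5] + rng.normal(size=100)

observed = scan(y, geno)
threshold = permutation_threshold(y, geno, n_perm=1000, alpha=0.05)
significant = np.flatnonzero(observed > threshold)
```

Taking the maximum LOD across all markers in each permutation is what makes the threshold genome-wide rather than per-marker.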

Pipeline specifications

Software tools: ggplot2, R/qtl, QTL Cartographer
Applications: Miscellaneous, WGS analysis
Organisms: Zea mays, Arabidopsis thaliana
Chemicals: Alanine, Aspartic Acid, Glutamic Acid