Computational protocol: Domesticated, Genetically Engineered, and Wild Plant Relatives Exhibit Unintended Phenotypic Differences: A Comparative Meta-Analysis Profiling Rice, Canola, Maize, Sunflower, and Pumpkin


Protocol publication

[…] To standardize data from different traits, we followed a procedure based on . The method consists of ranking all the values of a single trait from low to high and rescaling them linearly to the range between zero and one. Outlier data points were identified using the approach. In this approach, a multivariate detection method (Cook's distance) measures how strongly each data point influences the fitted model, and data points that do not fit the general model are flagged as "influential data points", or outliers. Because outliers may carry biological meaning (extreme phenotypes), we investigated the experimental origin of each data point before removing it from the database. We considered the only biologically meaningful outliers to be those from common garden experiments comparing domGE lines with their domNGE isogenic lines; these data points were retained for the rest of the analysis even though they were classified as outliers with respect to the general model. This process was performed for all traits and all crops. As mentioned before, in most cases the genetic modification is performed in domesticated lines. We therefore separated the data into three categories in all crops, labeled "wild" for wild relatives, "domNGE" for domesticated organisms that have not gone through a GE process, and "domGE" for those that have been genetically modified to show new traits.
To determine statistical differences among the wild, domNGE, and domGE categories within species, we used a Generalized Linear Model (GLM). When the p-value was less than 0.05, we carried out a post hoc Tukey test with the glht function of the R multcomp package (). The data were represented graphically as spider (radar) charts using the R fmsb package ().
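The normalization and outlier-screening steps above can be sketched in Python with NumPy. This illustrates the logic under stated assumptions and does not reproduce the authors' R scripts. The model is assumed to be a fit of the normalized trait on the category with a Gaussian family, in which case the GLM reduces to ordinary least squares. The 4/n Cook's distance cutoff is a common rule of thumb and is not a value given in the source. `common_garden_isogenic` is a hypothetical per-point flag that stands in for the manual check of each data point's experimental origin.

```python
import numpy as np


def min_max_normalize(values):
    """Rescale one trait's values linearly to [0, 1] via (x - min) / (max - min)."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # No spread in this trait: map every value to 0 to avoid dividing by zero.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


def cooks_distance(X, y):
    """Cook's distance of each observation in an ordinary least-squares fit of y on X.

    X is the design matrix and must already include an intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Hat matrix H = X (X'X)^+ X'; its diagonal holds each point's leverage h_ii.
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    h = np.diag(H)
    resid = y - H @ y
    s2 = resid @ resid / (n - p)  # residual variance estimate
    return (resid**2 / (p * s2)) * h / (1.0 - h) ** 2


def screen_trait(values, category, common_garden_isogenic, cutoff=None):
    """Normalize one trait, flag influential points, and drop them unless exempt.

    category: per-point label such as "wild", "domNGE" or "domGE".
    common_garden_isogenic: hypothetical per-point flag marking data from
    common garden experiments comparing domGE lines with domNGE isogenic lines.
    """
    y = min_max_normalize(values)
    category = list(category)
    levels = sorted(set(category))
    # Intercept plus one 0/1 dummy column per non-reference category.
    dummies = [np.array([c == lev for c in category], dtype=float) for lev in levels[1:]]
    X = np.column_stack([np.ones(len(y))] + dummies)
    d = cooks_distance(X, y)
    if cutoff is None:
        cutoff = 4.0 / len(y)  # assumed rule-of-thumb threshold
    flagged = d > cutoff
    # Keep unflagged points, and flagged points that are exempt by origin.
    keep = ~flagged | np.asarray(common_garden_isogenic, dtype=bool)
    return y[keep], flagged


# Synthetic illustration: one extreme domGE value (9.0).
values = [1.0, 1.2, 0.9, 1.1, 2.0, 2.1, 1.9, 2.2, 3.0, 3.1, 2.9, 9.0]
category = ["wild"] * 4 + ["domNGE"] * 4 + ["domGE"] * 4
kept, flagged = screen_trait(values, category, [False] * 12)
print(np.flatnonzero(flagged))  # [11]
print(len(kept))  # 11
```

Cook's distance does not change when the response is rescaled, so normalizing before or after screening flags the same points. A category with a single observation has leverage 1, which makes the formula divide by zero, so such groups would need separate handling. In R, the post hoc step could be run as `summary(glht(fit, linfct = mcp(category = "Tukey")))` with multcomp. Here `fit` and `category` are hypothetical names for the fitted model and the category factor.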
In addition, to determine differences between categories (wild, domNGE, and domGE) within species, we conducted a Discriminant Analysis (DA) with the R MASS package (), using the genotypes as categories and the values of each trait as predictor variables. To test the significance of differences between the DA categories for each crop, we conducted a follow-up Multivariate Analysis of Variance (MANOVA). Finally, we delimited groupings by drawing 95% confidence ellipses around the group centroids using the R ggplot2 package (). All analyses were conducted in R (version 1.17.15) (), and all scripts used for the analyses are available online at https://github.com/LANCIS-escalante-lab/plant_phenotype_metaanalysis. […]
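The discriminant analysis and the follow-up MANOVA both rely on the within-group (W) and between-group (B) scatter matrices. The sketch below uses NumPy in place of R's MASS::lda and manova and makes several assumptions. It computes Fisher's discriminant axes as eigenvectors of W⁻¹B. These match the directions of MASS::lda up to scaling, so the scores will differ by a scale factor. For the MANOVA it reports Wilks' lambda with Bartlett's chi-square approximation. The source does not say which MANOVA statistic was used, and R's `summary.manova` defaults to Pillai's trace. The data are synthetic.

```python
import numpy as np


def scatter_matrices(X, labels):
    """Within-group (W) and between-group (B) sums-of-squares-and-cross-products matrices."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(labels):
        Xg = X[labels == g]
        centered = Xg - Xg.mean(axis=0)
        W += centered.T @ centered
        diff = (Xg.mean(axis=0) - grand_mean)[:, None]
        B += len(Xg) * (diff @ diff.T)
    return W, B


def discriminant_and_manova(X, labels):
    """Fisher discriminant scores plus a one-way MANOVA (Wilks' lambda) on the same groups."""
    X = np.asarray(X, dtype=float)
    W, B = scatter_matrices(X, labels)
    n, p = X.shape
    g = len(np.unique(labels))
    # Eigenvectors of W^-1 B are the discriminant axes; eigenvalues rank their power.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(eigvals.real)[::-1]
    k = min(g - 1, p)  # number of non-trivial discriminant functions
    axes = eigvecs.real[:, order[:k]]
    scores = (X - X.mean(axis=0)) @ axes
    # Wilks' lambda = det(W) / det(W + B); small values mean well-separated groups.
    wilks = np.linalg.det(W) / np.linalg.det(W + B)
    # Bartlett's chi-square approximation for H0: all group mean vectors are equal.
    chi2 = -(n - 1 - (p + g) / 2) * np.log(wilks)
    df = p * (g - 1)
    return {"scores": scores, "eigenvalues": eigvals.real[order[:k]],
            "wilks": wilks, "chi2": chi2, "df": df}


# Synthetic illustration: three categories, 10 points each, 3 traits.
rng = np.random.default_rng(0)
means = {"wild": 0.0, "domNGE": 3.0, "domGE": 6.0}
X = np.vstack([rng.normal(m, 1.0, size=(10, 3)) for m in means.values()])
labels = np.repeat(list(means), 10)
res = discriminant_and_manova(X, labels)
print(res["scores"].shape, res["df"])  # (30, 2) 6
```

Wilks' lambda equals the product of 1/(1 + λᵢ) over the eigenvalues λᵢ of W⁻¹B. The discriminant analysis and the MANOVA therefore summarize the same group separation. A p-value could be obtained with `scipy.stats.chi2.sf(res["chi2"], res["df"])`.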

Pipeline specifications

Software tools: multcomp, ggplot2
Applications: Miscellaneous, GWAS
Organisms: Oryza sativa, Zea mays, Homo sapiens