Computational protocol: Placental Genome and Maternal-Placental Genetic Interactions: A Genome-Wide and Candidate Gene Association Study of Placental Abruption

Protocol publication

[…] Univariate logistic regression models were used to estimate the odds ratio (OR) and 95% confidence interval (95% CI) relating each SNP to the risk of PA in both the genome-wide and candidate gene analyses. For multiple testing correction, a false discovery rate (FDR) procedure was used. Functions and functional relationships of the genes represented by the top 200 genome-wide SNPs were obtained by pathway analysis using the Ingenuity Pathway Analysis (IPA, Ingenuity Systems, www.ingenuity.com) software. Gene-enrichment network scores based on a modified Fisher's exact test were calculated to rank the biological significance of networks in relation to PA.

In multivariable analyses, we applied penalized logistic regression models to identify sets of SNPs that are jointly associated with the risk of PA. These penalized approaches have previously been applied in the context of GWAS and have shown promising results. They allow the selection of relevant variables or groups of variables and the estimation of their regression coefficients. The number of selected variables is governed by a penalty parameter: the larger the parameter, the smaller the selected subset. The penalty parameter was chosen by 20-fold cross-validation, and the value yielding the smallest prediction error was used. For the genome-wide SNP analysis, we applied lasso regression. One characteristic of lasso regression is that it tends to select a single variable from a set of correlated variables. To circumvent this, SNPs in high linkage disequilibrium with a selected SNP were also considered, using an r² threshold of 0.8 within 500 kb. For SNPs in the candidate gene analyses, a group penalty approach was used to account for membership in a gene. Furthermore, we considered a bi-level selection approach that uses a composite minimax concave penalty to select candidate genes associated with PA as well as relevant SNPs within those genes.
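
The multiple-testing correction used in the univariate analyses can be illustrated with a short sketch. The text does not name the FDR procedure, so the Benjamini-Hochberg step-up adjustment is assumed here; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the protocol's unnamed FDR procedure is taken to be BH.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-SNP p-values from univariate logistic regressions.
snp_p_values = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(snp_p_values)
significant = [q <= 0.05 for q in q_values]
```

A SNP would then be declared significant when its adjusted value falls at or below the chosen FDR level (0.05 in this sketch, which is also an assumption).
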
The penalized regression methods described above do not accommodate missing values, so the software BEAGLE version 3.3.2 was used to impute missing genotypes.

For weighted genetic risk score (WGRS) analyses, a 10-fold cross-validation procedure was implemented to protect against model over-fitting, which arises when the same data are used both to estimate the regression parameters that enter the WGRS and to evaluate the association between PA risk and the WGRS. The procedure consisted of randomly partitioning the data into 10 equal-size subsamples, using nine of the subsamples as the training set and the left-out one as the validation set, with each subsample used in turn as the validation set. For each fold, a multivariable logistic regression model was fit on the training set using the SNPs selected from the multivariable analyses. A weighted approach was then used to compute genetic risk scores (GRS) in the validation set by multiplying the number of risk alleles at each locus by the associated effect size estimated from the training set. Once the WGRS had been obtained for all individuals, the subjects were categorized into four groups defined by the quartiles among controls. A logistic regression model was then fit to examine the association of the WGRS with PA risk, using the lowest quartile (Group 1) as the reference and adjusting for infant sex and population admixture. This 10-fold cross-validation procedure was repeated 1000 times to account for the variability introduced by randomly partitioning the data into subsamples. The receiver operating characteristic (ROC) curve was evaluated for each replicate. The estimated effect sizes and AUCs over the 1000 replicates were used to obtain the respective point estimates and confidence intervals.

Maternal-placental interaction analyses (for candidate genes and imprinted regions) were performed using a previously proposed multinomial model implemented in the EMIM and PREMIM software tools.
This method relies on several biological assumptions, including Hardy-Weinberg equilibrium (HWE), random mating, and a rare disease. For each SNP, four models were considered and a model selection procedure based on the Bayesian information criterion (BIC) was applied. The four models correspond to allele effects operating only at the fetal level (Model F), allele effects operating only at the maternal level (Model M), additive maternal and fetal effects (Model M+F), and a model that includes a maternal-placental interaction effect (Model I). For the latter, we used a parametrization that introduces two interaction terms capturing incompatibility between maternal and placental genotypes; the interaction effects operate when the infant has one copy and the mother has either zero or two copies of the risk allele. The maternal imprinting effect, which corresponds to the factor multiplying the disease risk if the infant inherits a risk allele from the mother, was tested using a likelihood ratio test.

All univariate and multivariable logistic regression models were adjusted for the first four principal components to account for population stratification. The statistical analyses were conducted using a combination of software tools: PLINK, PREMIM, EMIM, Haploview, and R. For the multivariable approaches, the R packages ncvreg and grpreg were used. The pathway analyses were conducted using the Ingenuity Pathway Analysis (IPA) software. […]
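
The cross-validated WGRS procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: all function names are hypothetical, the genotypes are random toy data, and the `fit_effect_sizes` callback merely stands in for the multivariable logistic regression fit on the training folds.

```python
import random
import statistics


def k_fold_indices(n, k, seed):
    """Randomly partition subject indices 0..n-1 into k near-equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def weighted_grs(risk_allele_counts, effect_sizes):
    """Sum over loci of risk-allele count (0, 1 or 2) times effect size."""
    return sum(c * w for c, w in zip(risk_allele_counts, effect_sizes))


def cross_validated_wgrs(genotypes, fit_effect_sizes, k=10, seed=0):
    """Score every subject with weights estimated on the other k-1 folds."""
    n = len(genotypes)
    scores = [0.0] * n
    for fold in k_fold_indices(n, k, seed):
        held_out = set(fold)
        training = [i for i in range(n) if i not in held_out]
        weights = fit_effect_sizes(training)  # hypothetical fitting callback
        for i in fold:
            scores[i] = weighted_grs(genotypes[i], weights)
    return scores


def quartile_groups(scores, is_control):
    """Assign groups 1-4 using quartile cut points computed in controls only."""
    control_scores = [s for s, c in zip(scores, is_control) if c]
    cuts = statistics.quantiles(control_scores, n=4)
    return [1 + sum(s > cut for cut in cuts) for s in scores]


# Toy data: 20 subjects, 2 SNPs; constant weights replace a real model fit.
rng = random.Random(42)
genotypes = [[rng.randint(0, 2), rng.randint(0, 2)] for _ in range(20)]
is_control = [i % 2 == 0 for i in range(20)]
scores = cross_validated_wgrs(genotypes, lambda training: [0.2, 0.5], k=10)
groups = quartile_groups(scores, is_control)
```

Repeating the call with different `seed` values would mimic the repeated random partitioning described in the text; the downstream logistic regression on the quartile groups and the ROC analysis are left out of this sketch.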

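The BIC-based choice among Models F, M, M+F and I described above amounts to computing BIC = -2 ln L + k ln n for each fitted model and keeping the smallest value. The sketch below assumes that the maximized log-likelihoods are already available (for example, from the fitting software); the log-likelihood values, parameter counts, and sample size shown are hypothetical placeholders, not results from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_model(fits, n_obs):
    """Pick the model with the smallest BIC.

    fits maps a model name to (maximized log-likelihood, number of parameters).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fits for one SNP; parameter counts are illustrative assumptions.
fits_for_snp = {
    "F": (-512.3, 1),    # fetal allele effect only
    "M": (-515.0, 1),    # maternal allele effect only
    "M+F": (-511.9, 2),  # additive maternal and fetal effects
    "I": (-510.8, 4),    # adds maternal-placental interaction terms
}
best_model, bic_scores = select_model(fits_for_snp, n_obs=900)
```

Because the penalty term grows with the number of parameters, a richer model such as Model I is retained only when its gain in log-likelihood outweighs the extra k ln n.
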
Pipeline specifications

Software tools: IPA, BEAGLE, PLINK, Haploview
Application: GWAS
Organisms: Homo sapiens
Diseases: Abruptio Placentae; Genetic Diseases, Inborn