Computational protocol: Molecular Networks of Postia placenta Involved in Degradation of Lignocellulosic Biomass Revealed from Metadata Analysis of Open Access Gene Expression Data

Protocol publication

[…] The microarray datasets GSE12540, GSE29656 and GSE69004 were analyzed using GEO2R and custom scripts. The literature for the corresponding datasets was studied to understand the experimental design before performing the statistical analysis. The following options were applied on the GEO2R website: A) the auto-detect option (for log transformation of the data), B) box-and-whisker plots (to inspect sample value distributions), C) submitter-supplied annotations (for gene-level annotation) and D) Benjamini & Hochberg false discovery rate correction (for multiple-testing adjustment of p-values). Once the samples were grouped according to their experimental conditions, the Top 250 option was used to obtain the statistically significant genes in each dataset. To examine P. placenta gene expression more broadly, we also retrieved the top 1000 differentially expressed genes using the “Save all results” function and analyzed these results further. GEO2R performs the statistical analysis and the ranking of differentially expressed genes (Top 250) internally through the limma package; the statistically significant genes were ranked by their adjusted p-value (cutoff 0.05).

The statistically significant genes thus obtained were examined using the submitter-supplied annotations (Protein-ID and best BlastP hit). Gene- and protein-level annotations of P. placenta MAD-698R were retrieved from MycoCosm (fungal genome repository). Custom Linux-based scripts were written to retrieve the KOG, GO and InterPro annotations of P. placenta MAD-698R v1 for the differentially expressed gene lists. We also used other analysis options available in JGI MycoCosm, such as Gene Ontology (GO), EuKaryotic Orthologous Groups (KOG) and CAZy, to analyze the results. The differentially expressed gene lists of the respective experimental conditions were compared using Venny 2.1 and jvenn.
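The multiple-testing correction and ranking step described above can be sketched in a few lines. This is only an illustration of the Benjamini & Hochberg adjustment followed by a cutoff and a top-N selection; it does not reproduce limma's moderated statistics or GEO2R's internal code. The function names, gene labels and p-values below are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def top_significant(genes, p_values, alpha=0.05, top_n=250):
    """Rank genes by adjusted p-value and keep at most top_n below alpha."""
    adjusted = benjamini_hochberg(p_values)
    ranked = sorted(zip(genes, adjusted), key=lambda pair: pair[1])
    return [(gene, q) for gene, q in ranked if q <= alpha][:top_n]


# Hypothetical input: four genes with raw p-values.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
raw_p = [0.01, 0.04, 0.03, 0.20]

adjusted = benjamini_hochberg(raw_p)
hits = top_significant(genes, raw_p)
```

With these made-up values only `gene_a` stays below the 0.05 cutoff after adjustment, which shows why ranking on adjusted rather than raw p-values can shorten a gene list considerably.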
For the GSE84529 dataset, we retrieved sample-level FPKM (fragments per kilobase of transcript per million mapped reads) values, p-values and log fold change values from the supplementary files. The gene list was sorted by p-value, and the top 1000 genes were retrieved and compared among the conditions. The data analysis was similar to that described in our previous studies.

The datasets were also analyzed in parallel using GeneSpring® v.14.8. Gene expression datasets were imported with the “Import NCBI GEO experiment” option after saving the GEO sample files to a local folder. The experiments were created as generic single color with the following preprocessing settings: “Threshold value set at 1.0” and “Normalization using shift 75th percentile”; sample values were log base 2 transformed and baselined to the median of all samples. The experimental conditions, retrieved from the corresponding GEO experiment and the literature, were used to group the samples. The data were filtered using “Probesets by expression”, with the filter applied to normalized data by percentile (upper percentile 100.0, lower percentile 20.0). Depending on the experimental conditions, a one-way ANOVA or a moderated t-test was performed. For the GSE84529 dataset, however, the samples were retrieved and the experiment was created without any preprocessing steps. We performed fold change analysis on the grouped samples using the FPKM values, and transcripts with a fold change >2.0 were retained for the analysis.

The detailed step-by-step workflow used for the metadata analysis of the P. placenta gene expression datasets is shown in the accompanying figure.

The rapid cellulolytic capacity of the brown rot fungus P. placenta is most likely due to its potent cellulolytic enzymes and Fenton reactions. Apart from these enzymes, P. placenta also secretes a large set of enzymes that degrade and detoxify aromatic compounds. Statistical analysis of the P. placenta gene expression datasets, based on the provided experimental conditions, yielded 5174, 7519, 6390 and 10,754 differentially expressed transcripts in the GSE12540, GSE29656, GSE69004 and GSE84529 datasets, respectively (see figure). […]
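The GeneSpring preprocessing and the FPKM fold change filter described above can be approximated as follows. This is a rough sketch, not GeneSpring's actual implementation: it assumes that thresholding raises values below 1.0 to 1.0 before the log2 transform, that the 75th percentile shift is applied per sample, and that baselining subtracts each gene's median across all samples. The pseudocount in the fold change step is an added assumption to avoid division by zero, and all names and numbers are hypothetical.

```python
import math
from statistics import median, quantiles  # quantiles requires Python 3.8+


def normalize(raw, threshold=1.0):
    """raw maps gene -> list of raw signal values, one per sample."""
    genes = list(raw)
    n_samples = len(raw[genes[0]])
    # Step 1: raise values below the threshold to it, then log2-transform.
    logged = {g: [math.log2(max(v, threshold)) for v in raw[g]] for g in genes}
    # Step 2: shift each sample so that its 75th percentile becomes zero.
    shifts = [
        quantiles([logged[g][s] for g in genes], n=4, method="inclusive")[2]
        for s in range(n_samples)
    ]
    normalized = {}
    for g in genes:
        shifted = [logged[g][s] - shifts[s] for s in range(n_samples)]
        # Step 3: baseline each gene to its median across all samples.
        center = median(shifted)
        normalized[g] = [v - center for v in shifted]
    return normalized


def fold_change_hits(group_a, group_b, cutoff=2.0, pseudocount=1.0):
    """Return genes whose mean FPKM ratio exceeds cutoff in either direction."""
    hits = {}
    for gene in group_a:
        mean_a = sum(group_a[gene]) / len(group_a[gene])
        mean_b = sum(group_b[gene]) / len(group_b[gene])
        # The pseudocount is an assumption, not part of the source protocol.
        ratio = (mean_a + pseudocount) / (mean_b + pseudocount)
        if max(ratio, 1.0 / ratio) > cutoff:
            hits[gene] = ratio
    return hits


# Hypothetical microarray-style signals: four genes, three samples.
raw = {
    "g1": [0.5, 2.0, 8.0],
    "g2": [4.0, 4.0, 4.0],
    "g3": [16.0, 32.0, 64.0],
    "g4": [1.0, 1.0, 2.0],
}
normalized = normalize(raw)

# Hypothetical FPKM values for two conditions, two replicates each.
fpkm_a = {"gene_x": [30.0, 34.0], "gene_y": [5.0, 7.0], "gene_z": [1.0, 3.0]}
fpkm_b = {"gene_x": [7.0, 7.0], "gene_y": [6.0, 6.0], "gene_z": [10.0, 12.0]}
hits = fold_change_hits(fpkm_a, fpkm_b)
```

Here `gene_x` (higher in condition A) and `gene_z` (higher in condition B) pass the >2.0 cutoff, while `gene_y` does not. Treating up- and down-regulation symmetrically is a design choice of this sketch.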

Pipeline specifications

Software tools: GEO2R, limma, BLASTP, VENNY, jvenn, GeneSpring GX
Databases: MycoCosm, CAZy
Application: Transcription analysis
Organisms: Postia placenta
Chemicals: Hydrogen Peroxide, Iron