Computational protocol: Effects of Mecp2 loss of function in embryonic cortical neurons: a bioinformatics strategy to sort out non-neuronal cells variability from transcriptome profiling

Protocol publication

[…] Raw reads were processed with Trimmomatic [] to remove low-quality nucleotides and adapter sequences. The minimum Phred quality score for bases was set to 35, and only reads at least 25 bp long after trimming were retained. The high-quality reads were aligned against the Mus musculus reference genome sequence (GRCm38) with TopHat [] version 2.0.9. The resulting alignment files, together with the GRCm38 annotation file, were used as input for HTSeq-count [] version 0.5.4p2 to calculate gene expression values (read counts). Genes with very low read counts, or with counts that were too variable among replicates, were removed with HTSFilter []. Differential expression analysis was performed in R with the TCC package [] in combination with edgeR []. The coordinates of the intergenic region between Irak1 (ENSMUSG00000031392) and Mecp2 (ENSMUSG00000031393), together with the number of mapped reads in each sample, were obtained with bedtools (v2.17.0).
Expression profiles of the differentially expressed genes were used to perform k-means clustering with MeV [], and 7 clusters were identified using Pearson correlation as the distance metric. Gene Ontology (GO) and functional annotation enrichment analysis was carried out using the Database for Annotation, Visualization and Integrated Discovery (DAVID) bioinformatics resources 6.7 (http://david.abcc.ncifcrf.gov/), maintained by the National Institute of Allergy and Infectious Diseases (NIAID), NIH. We used both the DAVID web interface and a custom-made Python script implementing the DAVID web services in our in-house Transcriptator software []. Transcriptator produces automatic, reproducible GO and functional annotation results for lists of differentially expressed genes derived from transcriptomic data. Currently, DAVID provides annotation for 40 different categories.
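The k-means step above groups genes by the shape of their expression profiles rather than their magnitude, because Pearson correlation ignores scale and offset. The following is a minimal from-scratch Python sketch of k-means with a 1 - r (Pearson) distance, for illustration only; it is not MeV's implementation. The deterministic farthest-point initialisation, the 100-iteration cap, the function names (`pearson_distance`, `kmeans_pearson`) and the toy profiles are all assumptions. In the protocol the input would be the expression profiles of the differentially expressed genes, with `k=7`.

```python
import math
from typing import List, Sequence

Profile = Sequence[float]


def pearson(x: Profile, y: Profile) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no shape; treat it as uncorrelated
    return cov / (sx * sy)


def pearson_distance(x: Profile, y: Profile) -> float:
    """1 - r: 0 for identical shapes, 2 for perfectly anti-correlated shapes."""
    return 1.0 - pearson(x, y)


def kmeans_pearson(profiles: List[Profile], k: int, max_iter: int = 100) -> List[int]:
    """Assign each profile to one of k clusters by Pearson distance."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    # Assumed initialisation: start from the first profile, then repeatedly add
    # the profile farthest from every centroid chosen so far (deterministic).
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(
            profiles,
            key=lambda p: min(pearson_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: pearson_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: a centroid becomes the mean of its member profiles;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy profiles: three rising and three falling, at different scales.
profiles = [
    [1, 2, 3, 4], [2, 4, 6, 8], [10, 20, 30, 40],
    [4, 3, 2, 1], [8, 6, 4, 2], [40, 30, 20, 10],
]
print(kmeans_pearson(profiles, k=2))  # [0, 0, 0, 1, 1, 1]
```

With a 1 - r distance, a gene whose expression is a scaled copy of another's falls into the same cluster, which suits questions about co-regulation rather than absolute abundance.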
In our computational pipeline we include GO terms, protein-protein interactions, protein functional domains, biological pathways, general sequence features and gene functional summaries. Using the DAVID web application, we exploited both the functional annotation enrichment and the functional annotation clustering tools to obtain clusters of differentially expressed genes based on common functionalities. For GO and functional annotation enrichment, we set stringent thresholds of EASE score (p-value) ≤ 0.05 and Count ≥ 5; the EASE score is a modified Fisher exact test p-value that assesses enrichment more conservatively.
For neuron-specific genes, we carried out functional annotation clustering with the DAVID algorithm, which assumes that similar annotations should have similar gene members. It combines kappa statistics, which measure the degree of gene overlap between two annotations, with fuzzy heuristic clustering, which groups similar annotations according to their kappa values. In simpler terms, the more annotation terms genes share, the more likely they are to be grouped together [, ]. For the functional annotation clustering algorithm in the DAVID web application, we selected stringent cut-offs of EASE score ≤ 0.05 and enrichment score ≥ 1. We also carried out GO annotation analysis with AmiGO 2 (version 2.2.0 amigo2b) using the GO database release 2015-06-06. The jvenn software was used to create Venn diagrams []. […]
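DAVID describes the EASE score as a Fisher exact test p-value computed after removing one gene from the list hits, which penalises terms supported by only a handful of genes. Below is a minimal Python sketch, assuming the gene list is treated as a sample drawn without replacement from the annotated population (a one-tailed hypergeometric test in which `pop_hits` includes the list hits); DAVID's exact contingency table and background definition may differ. The function names are hypothetical; the thresholds mirror those stated in the text (EASE ≤ 0.05, Count ≥ 5).

```python
from math import comb


def fisher_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-tailed Fisher exact (hypergeometric) p-value, P(X >= k).

    X is the number of term-annotated genes in a list of n genes drawn
    without replacement from a population of N genes, K of them annotated.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)


def ease_score(list_hits: int, list_total: int, pop_hits: int, pop_total: int) -> float:
    """EASE score: the Fisher p-value after removing one gene from the list hits."""
    return fisher_upper_tail(list_hits - 1, list_total, pop_hits, pop_total)


def passes_filter(list_hits: int, list_total: int, pop_hits: int, pop_total: int,
                  max_ease: float = 0.05, min_count: int = 5) -> bool:
    """Thresholds from the text: EASE score <= 0.05 and Count >= 5."""
    if list_hits < min_count:
        return False
    return ease_score(list_hits, list_total, pop_hits, pop_total) <= max_ease


# Hypothetical counts: 3 of 300 list genes vs 40 of 30,000 population genes.
print(fisher_upper_tail(3, 300, 40, 30000))  # about 0.008
print(ease_score(3, 300, 40, 30000))         # about 0.06
```

In this hypothetical case the plain Fisher p-value falls below 0.05 but the EASE score does not, which is what makes the EASE score the more conservative filter.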

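The kappa-based similarity behind DAVID's functional annotation clustering can be illustrated as Cohen's kappa on the gene memberships of two terms: 1 means the terms annotate exactly the same genes, 0 means overlap no better than chance, and negative values mean less overlap than chance. The Python sketch below assumes kappa is computed over the genes of the input list and stops before the fuzzy heuristic clustering step, which is not reproduced here. `enrichment_score` assumes that the cluster enrichment score is the geometric mean of the member terms' p-values on a -log10 scale, as we understand DAVID defines it; under that assumption the ≥ 1 cut-off corresponds to a geometric-mean p-value ≤ 0.1. Term names, gene identifiers and function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple


def kappa(term_a: Set[str], term_b: Set[str], genes: Iterable[str]) -> float:
    """Cohen's kappa for the agreement between two terms' gene memberships."""
    genes = list(genes)
    n = len(genes)
    both = sum(1 for g in genes if g in term_a and g in term_b)
    only_a = sum(1 for g in genes if g in term_a and g not in term_b)
    only_b = sum(1 for g in genes if g in term_b and g not in term_a)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n  # fraction of genes on which the terms agree
    p_a = (both + only_a) / n        # fraction of genes annotated with term A
    p_b = (both + only_b) / n        # fraction of genes annotated with term B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        return 1.0  # both terms cover all genes, or none: identical memberships
    return (observed - expected) / (1 - expected)


def pairwise_kappa(terms: Dict[str, Set[str]], genes: Iterable[str]) -> Dict[Tuple[str, str], float]:
    """Kappa for every unordered pair of terms, keyed by sorted name pairs."""
    genes = list(genes)
    names = sorted(terms)
    return {
        (a, b): kappa(terms[a], terms[b], genes)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def enrichment_score(p_values: List[float]) -> float:
    """-log10 of the geometric mean of a cluster's member-term p-values."""
    return -sum(math.log10(p) for p in p_values) / len(p_values)


# Hypothetical terms and genes.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
terms = {
    "termA": {"g1", "g2", "g3"},
    "termB": {"g1", "g2", "g3"},
    "termC": {"g4", "g5"},
}
for pair, value in pairwise_kappa(terms, genes).items():
    print(pair, round(value, 3))
print(enrichment_score([0.01, 0.1]))  # 1.5 (up to floating-point rounding)
```

Terms with high pairwise kappa (here `termA` and `termB`) are the ones a clustering step would merge into a single functional group.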
Pipeline specifications

Software tools: Trimmomatic, TopHat, HTSeq, HTSFilter, edgeR, BEDTools, DAVID, Transcriptator, jvenn
Applications: RNA-seq analysis, Transcription analysis
Organisms: Mus musculus, Homo sapiens
Diseases: Nervous System Diseases, Rett Syndrome