Computational protocol: Laser Capture and Deep Sequencing Reveals the Transcriptomic Programmes Regulating the Onset of Pancreas and Liver Differentiation in Human Embryos

Protocol publication

[…] Paired-end sequencing (50 bp) was carried out using an Illumina HiSeq 2000 at the Wellcome Trust Centre for Human Genetics, Oxford, UK. Reads were mapped to the GENCODE 15 transcriptome using TopHat version 1.4.1. Gene-level transcript abundance (read counts and RPKM) was estimated by an algorithm implemented in the Partek Genomics Suite (version 6.6 [6.12.1227]; Partek Inc., St. Louis, MO, USA) (A and S2B). After filtering mitochondrial genes, ribosomal RNAs, and two other multi-locality RNAs (Metazoa_SRP and 7SK), the number of mapped reads per gene varied from zero (for >50% of genes) to >90,000 (e.g., APOB in liver). Differential expression was examined in the R/Bioconductor package edgeR (version 3.0.8) using a generalized linear model (count = tissue + replicate), with differences in library size scaled by the default trimmed mean of M values (TMM) method. For comparison with pancreatic hPSC differentiation, RNA-seq data were retrieved from ArrayExpress, remapped, and quantified as above. Principal component analysis (PCA) was performed on the combined rank-normalized gene-level abundances from both datasets. The mouse RNA-seq dataset (GEO: GSE40823) was downloaded and remapped to the mm10 genome using STAR (version 2.4.2a), with gene-level read counts calculated according to the GENCODE M5 annotation. Human and mouse read counts were combined using biomaRt with gene ID mappings from Ensembl, then quantile normalized. Genes were filtered for one-to-one orthologs. [...] Gene sets enriched between the different LCA-RNA-seq datasets (edgeR false discovery rate <10⁻⁴) were assessed for GO term enrichment. Fisher's exact test was applied with the elimination algorithm as implemented in the topGO R package (version 2.12.0).
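As a rough illustration of the enrichment test named above, the following Python sketch computes the classic one-sided Fisher's exact p-value for over-representation of a single GO term. It is not topGO's code and it omits the elimination algorithm, which additionally discounts genes already credited to significant, more specific terms before their ancestors are tested. The function name, argument names, and toy counts are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_enrichment_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    n_universe  : genes in the background set
    n_annotated : background genes annotated to the GO term
    n_selected  : genes in the differentially expressed list
    n_overlap   : selected genes that carry the annotation

    Returns P(X >= n_overlap) under the hypergeometric null.
    """
    max_overlap = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the hypergeometric probabilities of the observed overlap
    # and every more extreme (larger) overlap.
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (made up): 20 background genes, 5 annotated to the term,
# 5 genes selected, 3 of which are annotated.
p_value = fisher_enrichment_pvalue(20, 5, 5, 3)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice the background would be every gene that passed filtering, and the test would be repeated across GO terms with the term hierarchy taken into account, as the elimination algorithm does.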
Additional data and annotations were obtained from other Bioconductor R packages (org.Hs.eg.db [2.9.0], GO.db [2.9.0], AnnotationDbi [1.22.6]). Gene-level PCA projected loadings were used to test for GO gene set enrichment using one-sided Wilcoxon rank-sum tests, applied separately to each end of the loading distribution. This was implemented within the topGO framework and used the elimination algorithm to traverse the GO ontologies. [...] The 1,000 genes most differentially expressed in dorsal pancreas (logFC > 0) or hepatic cords (logFC < 0) were loaded into Cytoscape (version 3.2.1) and used as queries to the iRegulon plug-in (version 1.3, build 1024). The default iRegulon parameters search for enrichment of either known motifs or experimental TF binding data within 10 kb of the transcription start sites. Pancreatic analysis was constrained to motif discovery to overcome the relative lack of pancreatic binding data in iRegulon compared with data for hepatocytes. The putative regulators returned from the iRegulon analysis were filtered according to expression in the LCA-RNA-seq datasets (B). […]

Pipeline specifications

Software tools: TopHat, Partek Genomics Suite, edgeR, STAR, topGO, org.Hs.eg.db, AnnotationDbi, iRegulon
Databases: ArrayExpress, GENCODE, GO.db
Applications: Genome annotation, RNA-seq analysis
Organisms: Homo sapiens, Caenorhabditis elegans, Mus musculus