Computational protocol: Identification of Endoglin as an epigenetically regulated tumour-suppressor gene in lung cancer

[…] Genome-wide screening was carried out at Genentech, Inc. Cell lines were profiled using the Infinium HumanMethylation450 BeadChip Kit (Illumina Inc., San Diego, CA, USA) as previously described. Array data have been submitted to the Gene Expression Omnibus database (accession number GSE36216). A single probe mapped to the short CpG island located within the Endoglin gene promoter region. Beta values for probes mapping to CpG island 'shore areas' (regions flanking CpG islands, up to 2 kb upstream and downstream) were also analysed, as mounting evidence suggests that shore areas may also directly affect gene expression.

For expression analysis, NSCLC cell line RNA was used to generate RNA-seq libraries using the TruSeq RNA Sample Preparation Kit (Illumina Inc.). Processed libraries were sequenced on the Illumina HiSeq 2000 platform according to the manufacturer's recommendations (Illumina Inc.). In all, 75 × 75 base pair (bp) paired-end data were collected, and RNA-seq reads were aligned to the human genome version GRCh37 using GSNAP. All short-read sequencing data were evaluated for quality control using the R Bioconductor ShortRead package. To normalise for coverage depth, Reads Per Kilobase of transcript per Million mapped reads (RPKM) values were generated. RPKM expression values for Endoglin were then visualised in Excel (Microsoft, Redmond, WA, USA) and compared against methylation beta values at the Infinium probe site within the Endoglin gene.

[...] To determine the expression levels of 20 EMT-related genes, cell line cDNA was analysed using a TaqMan Low Density Array (TLDA, Applied Biosystems) at Genentech. Total cDNA (200 ng) was amplified using TaqMan Universal PCR Master Mix (Applied Biosystems) on a TLDA card. All samples were assayed in triplicate. Two custom-designed reference genes, AL-1377271 and VPS-33B, and a commercial reference gene assay, 18S (Hs99999901_s1, Applied Biosystems), were also included in the panel.
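As an illustration of the RPKM normalisation used for the RNA-seq data above, the following minimal sketch applies the standard RPKM definition to a single transcript. The function name, argument names and example numbers are hypothetical and are not taken from the authors' pipeline, which is not described at this level of detail.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count / (transcript_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical values: 500 reads on a 2,000 bp transcript
    # in a library of 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by both transcript length and library size is what makes values comparable across genes of different lengths and across cell lines sequenced to different depths.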
A mean of the Ct values for the three reference genes was calculated for each sample, and the expression levels of EMT target genes were determined using the 2^(−ΔCt) method. ΔCt values were used to cluster cell lines according to EMT gene expression levels using the Cluster v.3.0 and Treeview v.1.60 software (Eisen Lab, University of California, Berkeley, CA, USA;

[...] Associations between Endoglin methylation status and clinicopathological characteristics were analysed using the chi-squared test, Fisher's exact test or Kendall's tau-b statistic, where appropriate. Kaplan–Meier analysis was used to compare survival of methylated and unmethylated populations. Multivariate Cox regression analysis was carried out to estimate hazard ratios for the effect of Endoglin methylation, together with potential confounding variables, on overall survival. Differences in functional traits of cell lines were analysed by one-way or two-way analysis of variance, as appropriate. In all cases, P-values <0.05 were considered significant. Analyses were carried out using IBM SPSS Statistics 21 software (IBM Corporation, Armonk, NY, USA) or GraphPad Prism 6 software (GraphPad Software, Inc., La Jolla, CA, USA). […]
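The relative quantification described above for the TLDA data (mean Ct of the three reference genes, then 2 raised to the power of minus ΔCt) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the Ct values are hypothetical, and each Ct passed in is assumed to be already averaged over the technical triplicates.

```python
from statistics import mean


def relative_expression(target_ct, reference_cts):
    """Return (delta_ct, relative_expression) for one target gene in one sample.

    delta_ct = target Ct minus the arithmetic mean of the reference-gene Cts;
    relative expression = 2 ** (-delta_ct).
    """
    if not reference_cts:
        raise ValueError("at least one reference Ct is required")
    delta_ct = target_ct - mean(reference_cts)
    return delta_ct, 2.0 ** (-delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: one EMT target gene and three reference genes.
    d_ct, rel = relative_expression(25.0, [20.0, 21.0, 22.0])
    print(d_ct, rel)  # 4.0 0.0625
```

A lower ΔCt corresponds to higher expression. The protocol feeds the ΔCt values, rather than the exponentiated values, into the clustering step.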

Pipeline specifications

Software tools: TreeViewX, SPSS
Applications: Miscellaneous, Phylogenetics
Diseases: Lung Neoplasms, Neoplasms