Computational protocol: Measuring error rates in genomic perturbation screens: gold standards for human functional genomics

[…] We used TopHat v1.4.1 to align RNA-seq reads to the hg19 human transcriptome defined in the GENCODE v14 GTF file, using default TopHat parameters. We used Cufflinks in quantitation-only mode with the same GTF file to generate FPKM values for each gene. FPKM values were filtered for protein-coding genes (as defined by HGNC) and log-transformed (adding 0.01 as a pseudocount). The mean log(FPKM) of technical or biological repeats was used where applicable (e.g. biological repeats in ENCODE, and technical repeats at the 2 × 50 and 1 × 75 read types for BodyMap). For ENCODE (GEO accession GSE30567) and BodyMap (EBI accession E-MTAB-513), constitutive, invariant genes were defined as genes with mean expression in each data set > 0 and standard deviation < the mean standard deviation across all protein-coding genes. Genes were required to be constitutive and invariant in both data sets. The reference set of putative nonessential genes was defined as protein-coding genes with FPKM < 0.1 in 15 of 16 BodyMap tissues and FPKM < 0.1 in 16 of 17 ENCODE cell lines. The set was filtered for genes that are assayed by the pooled shRNA library. […]

SNP analysis was performed at the University Health Network Microarray Center (Toronto, ON, CA) using the Illumina HumanOmni1 BeadChip (Illumina, San Diego, CA) according to the manufacturer's instructions. Normalized LogR ratio (LRR) and B allele frequency (BAF) signals for each probe were exported from the Illumina BeadStudio utility. Export files were then processed with the Genome Alteration Print (GAP) algorithm (Popova et al, ). Projections of LRR and BAF profiles were created, and pattern recognition was performed for each sample. Parameters were set as follows: germHomozyg.mBAF.thr > 0.97 and p_BAF = 0 (no normal contamination). Each pattern was visually inspected and corrected when the grid was off the segment center clusters. Output files produced by GAP were processed to obtain segments defined by copy number change only.
Briefly, adjacent segments with identical absolute copy number were merged, and their LRR values were averaged. Gene-level absolute copy number and LRR were obtained using the CNTools package. […]
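
The segment-merging step can be illustrated with a minimal Python sketch. This is not the authors' code: the `Segment` record and its field names are hypothetical, and the GAP output format is not reproduced here. The protocol does not say whether the LRR average was weighted (for example by probe count), so an unweighted mean over the merged segments is assumed.

```python
from dataclasses import dataclass
from itertools import groupby
from statistics import mean


@dataclass
class Segment:
    # Hypothetical record; field names are illustrative, not GAP's own.
    chrom: str
    start: int
    end: int
    copy_number: int  # absolute copy number assigned to the segment
    lrr: float        # segment-level LogR ratio


def merge_segments(segments):
    """Merge runs of adjacent segments sharing chromosome and copy number."""
    # Order segments by chromosome, then by start position.
    ordered = sorted(segments, key=lambda s: (s.chrom, s.start))
    merged = []
    # groupby only groups consecutive items, so a run ends as soon as the
    # chromosome or the absolute copy number changes.
    for (chrom, cn), run in groupby(ordered, key=lambda s: (s.chrom, s.copy_number)):
        run = list(run)
        merged.append(
            Segment(
                chrom=chrom,
                start=run[0].start,
                end=run[-1].end,
                copy_number=cn,
                # Assumption: plain unweighted mean of the segment LRR values.
                lrr=mean([s.lrr for s in run]),
            )
        )
    return merged


# Toy input: the first two chr1 segments share copy number 2 and are merged;
# the later copy-number-2 segment is kept separate because a copy-number-3
# segment lies between them.
example = [
    Segment("chr1", 1, 100, 2, 0.02),
    Segment("chr1", 101, 200, 2, -0.02),
    Segment("chr1", 201, 300, 3, 0.35),
    Segment("chr1", 301, 400, 2, 0.00),
    Segment("chr2", 1, 500, 2, 0.01),
]
merged = merge_segments(example)
for seg in merged:
    print(seg)
```

Only consecutive segments are merged, so the output contains segments defined by copy number change alone. A probe-count-weighted mean would be a reasonable alternative if per-segment probe counts are available.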

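The expression-based reference sets defined at the start of this protocol can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original analysis code. The protocol does not give the log base (base 2 is assumed) or the standard deviation estimator (the sample standard deviation is assumed). "15 of 16" and "16 of 17" are read as "at least that many samples". All function names and the toy gene names are hypothetical, and the per-sample values are assumed to be already averaged over repeats.

```python
import math
from statistics import mean, stdev

PSEUDOCOUNT = 0.01  # pseudocount stated in the protocol
LOW_FPKM = 0.1      # "not expressed" threshold stated in the protocol


def log_fpkm(fpkm, base=2):
    # Assumption: the protocol does not state the log base; base 2 is used.
    return math.log(fpkm + PSEUDOCOUNT, base)


def constitutive_invariant(fpkm):
    """fpkm: {gene: [FPKM per sample]} for one data set.

    Keeps genes whose mean log expression is > 0 and whose standard
    deviation is below the mean standard deviation across all genes.
    """
    logged = {g: [log_fpkm(x) for x in v] for g, v in fpkm.items()}
    # Assumption: sample standard deviation (n - 1 denominator).
    sds = {g: stdev(v) for g, v in logged.items()}
    sd_cutoff = mean(sds.values())
    return {g for g, v in logged.items() if mean(v) > 0 and sds[g] < sd_cutoff}


def rarely_expressed(fpkm, min_low):
    """Keep genes with FPKM < LOW_FPKM in at least min_low samples."""
    return {g for g, v in fpkm.items() if sum(x < LOW_FPKM for x in v) >= min_low}


# Toy data with three samples per data set (the real data sets have 16
# BodyMap tissues and 17 ENCODE cell lines).
bodymap = {
    "GENE_A": [50.0, 60.0, 55.0],
    "GENE_B": [0.0, 0.05, 0.0],
    "GENE_C": [0.0, 200.0, 5.0],
}
encode = {
    "GENE_A": [40.0, 45.0, 42.0],
    "GENE_B": [0.0, 0.0, 0.2],
    "GENE_C": [1.0, 300.0, 0.0],
}

# Genes must pass in both data sets, hence the set intersection.
constitutive = constitutive_invariant(bodymap) & constitutive_invariant(encode)

# With the real data the thresholds would be min_low=15 (BodyMap) and
# min_low=16 (ENCODE); here one exception out of three samples is allowed.
nonessential = rarely_expressed(bodymap, min_low=2) & rarely_expressed(encode, min_low=2)

print(sorted(constitutive), sorted(nonessential))
```

The final filter described in the protocol, restricting the nonessential set to genes assayed by the pooled shRNA library, would be one more set intersection with the library's gene list.
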
Pipeline specifications

Software tools: GAP, CNTools
Application: aCGH data analysis
Organisms: Homo sapiens
Diseases: Neoplasms