Computational protocol: A Comprehensive Characterization of Genome-Wide Copy Number Aberrations in Colorectal Cancer Reveals Novel Oncogenes and Patterns of Alterations

[…] Copy number data were analyzed with the Nexus Copy Number 6.0 software (Biodiscovery, Inc., CA, USA). The raw copy number data for each probe provided by Affymetrix were smoothed by a quadratic correction provided by NEXUS and centered using diploid regions. CNA frequency comparisons among sample groups (e.g. MSS versus MSI; stage II versus stage III) were performed using the NEXUS default thresholds of >15% difference and significance p<0.01 (Fisher’s exact test). To generate copy number segments and minimal common regions (MCRs), we applied a modified version of the Circular Binary Segmentation (CBS) algorithm called “Rank Segmentation” in NEXUS. The p-value cutoff for CBS was 1.0E-6, and segments were assigned to 1 of 5 bins: amplified (>3.8 copies), gained (2.3 to 3.8 copies), unchanged (1.7 to 2.3 copies), deleted (0.5 to 1.7 copies) or homozygously deleted (<0.5 copies). For MCR frequency significance testing, we used a p-value cutoff of <0.01 from the statistical Significance Testing for Aberrant Copy number (STAC) method. Hierarchical clustering of CNAs was also performed in NEXUS (complete linkage, sex chromosomes ignored). To detect focal amplifications, we applied GISTIC (Genomic Identification of Significant Targets in Cancer) version 2.0 using a Q-value cutoff of <0.25. Genes reported in GISTIC2 amplification peaks were further examined for enrichment in biological pathways, using the canonical pathway database provided by MSigDB. Pathway gene sets with fewer than 10 or more than 500 members were excluded. Fisher’s exact test was used to assess whether those genes were over-represented. The FDR was calculated based on 100 permutations in which random gene sets of the same size were tested. We also used Fisher’s exact test to determine whether the frequencies of certain CNAs differed among patient groups (stage II vs. III, MSI vs. MSS, etc.). Survival analysis was performed using the Kaplan–Meier method with a p-value (log-rank test) cutoff of <0.01. 
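Two of the steps above are simple enough to illustrate directly: the assignment of a segment to one of the five copy number bins, and the over-representation test for a pathway gene set. The following is a minimal sketch and not the NEXUS or GISTIC implementation. The function names are hypothetical, the handling of values that fall exactly on a bin boundary (2.3, 1.7, 0.5, 3.8) is an assumption because the text lists overlapping ranges, and the one-sided form of Fisher's exact test is assumed from the word "over-represented".

```python
from math import comb


def classify_segment(copies: float) -> str:
    """Map an estimated copy number to one of the five bins in the text.

    Assumption: boundary values go to the bin closer to the diploid state
    (e.g. exactly 2.3 copies is treated as "unchanged").
    """
    if copies > 3.8:
        return "amplified"
    if copies > 2.3:
        return "gained"
    if copies >= 1.7:
        return "unchanged"
    if copies >= 0.5:
        return "deleted"
    return "homozygously deleted"


def enrichment_p_value(universe: int, pathway: int, selected: int, overlap: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    universe: number of genes considered in total
    pathway:  number of those genes in the pathway gene set
    selected: number of genes in the amplification peaks
    overlap:  number of selected genes that are also in the pathway
    Returns P(overlap or more shared genes) under random selection.
    """
    total = comb(universe, selected)
    upper = min(pathway, selected)
    tail = sum(
        comb(pathway, i) * comb(universe - pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def pathway_size_ok(n_members: int) -> bool:
    """Size filter from the text: keep gene sets with 10 to 500 members."""
    return 10 <= n_members <= 500
```

For example, `classify_segment(3.0)` falls in the "gained" bin, and `enrichment_p_value(10, 5, 5, 5)` gives 1/252, the probability that all five selected genes land in a five-gene pathway out of ten genes by chance.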
For analysis of CNA/CNA correlations, the Pearson correlation was computed at the gene level for all pairs of genes, as described previously. To derive gene-level summaries from the copy number data, we assigned to each gene the copy number value of the segment(s) overlapping it; when multiple segments fell within the gene boundary, we averaged the copy numbers from those segments. All genome-based data reported in this manuscript are based on NCBI build 36 (hg18) of the human genome. […]
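The gene-level summary and the pairwise correlation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names and the `(start, end, copies)` segment tuples are hypothetical, segments are assumed to lie on the same chromosome as the gene, and the average over overlapping segments is taken as an unweighted mean because the text does not mention weighting by overlap length.

```python
import math


def gene_copy_number(gene_start, gene_end, segments):
    """Average the copy numbers of all segments overlapping a gene.

    segments: iterable of (start, end, copies) tuples on the gene's chromosome.
    Returns None when no segment overlaps the gene.
    """
    overlapping = [
        copies
        for start, end, copies in segments
        if start <= gene_end and end >= gene_start
    ]
    if not overlapping:
        return None
    return sum(overlapping) / len(overlapping)


def pearson(x, y):
    """Pearson correlation of two equal-length vectors of gene-level values
    (one value per sample). Returns NaN if either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)
```

A gene spanning positions 100 to 200 that is covered by two segments with 2.0 and 3.0 copies would receive a gene-level value of 2.5; `pearson` would then be applied to the per-sample values of each pair of genes.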

Pipeline specifications

Software tools: DNA copy, GISTIC
Application: aCGH data analysis
Organisms: Homo sapiens
Diseases: Neoplasms, Colorectal Neoplasms