Computational protocol: Combined genomic and phenotype screening reveals secretory factor SPINK1 as an invasion and survival factor associated with patient prognosis in breast cancer

[…] In all cases, raw data (CEL files) were pre-processed and normalized using the R software package (R Development Core Team, 2010) and library files provided via the Bioconductor project (Gentleman et al, ). To preserve a consistent normalization strategy across all cohorts, raw data were MAS5.0-normalized on a per-cohort basis using the justMAS function in the simpleaffy library from Bioconductor (no background correction, target intensity of 600). The array platforms used were the HG-U133A, HG-U133plus2 and HG-U133A2 gene chips. To ensure equal information content from each chip type, only probe sets common to all chip types were used in subsequent analysis, which yielded 22,268 probe sets shared by all microarrays in all cohorts. Cross-cohort batch effects were corrected using the ComBat empirical Bayes method (Johnson et al, ). Of the initial 2116 tumour profiles, 2034 represent primary invasive breast cancers with no exposure to neoadjuvant therapy prior to array analysis. Of these, 1954 cases are annotated with distant metastasis-free survival (DMFS) time and event. Other clinical annotations, such as treatment type, estrogen receptor status, nodal status, tumour size, histologic grade and patient age, are available for the majority of cases. […]
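
Two of the steps described above, restricting the analysis to probe sets shared by every chip type and scaling each array to a fixed target intensity, can be illustrated with a minimal sketch. This is not the authors' code: the original analysis was run in R with simpleaffy's justMAS, whereas the Python below only mimics the ideas on toy data. The function names, the probe-set labels, and the 2% trimming fraction are assumptions made for illustration. The trimming fraction follows the usual description of MAS5.0 scaling and is not stated in the protocol.

```python
from typing import Dict, Iterable, List


def common_probe_sets(platforms: Dict[str, Iterable[str]]) -> List[str]:
    """Return the sorted probe-set IDs that occur on every platform."""
    if not platforms:
        return []
    # Intersect the ID sets of all chip types.
    shared = set.intersection(*(set(ids) for ids in platforms.values()))
    return sorted(shared)


def scale_to_target(signals: List[float],
                    target: float = 600.0,
                    trim: float = 0.02) -> List[float]:
    """Rescale one array so that its trimmed mean equals `target`.

    `trim` is the fraction of values dropped from each tail before the
    mean is computed (assumed 2% here, as in common MAS5.0 descriptions).
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    trimmed_mean = sum(kept) / len(kept)
    factor = target / trimmed_mean
    return [value * factor for value in signals]


# Toy example with hypothetical probe-set labels (not real Affymetrix IDs).
platforms = {
    "HG-U133A": ["probe_a", "probe_b", "probe_c"],
    "HG-U133plus2": ["probe_a", "probe_b", "probe_c", "probe_d"],
    "HG-U133A2": ["probe_c", "probe_b", "probe_a"],
}
shared = common_probe_sets(platforms)

# With trim=0.0 the plain mean (200) is mapped onto the target of 600.
scaled = scale_to_target([100.0, 200.0, 300.0], target=600.0, trim=0.0)
```

In the protocol, the intersection step is what yields the 22,268 shared probe sets. The scaling shown here corresponds only to the target-intensity part of MAS5.0, not to its signal summarization, and it does not reproduce the ComBat batch correction.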

Pipeline specifications

Software tools: Simpleaffy, ComBat
Application: Gene expression microarray analysis
Organisms: Homo sapiens
Diseases: Breast Neoplasms, Neoplasms
Chemicals: Estrogens