Computational protocol: The effect of androgen receptor expression on clinical characterization of metastatic breast cancer

[…] After trimming poor-quality bases from the whole-transcriptome sequencing FASTQ files, the reads were aligned to the human reference genome hg19 with TopHat (version 2.0.6), and reference-guided transcript assembly was performed with Cufflinks (version 2.1.1). Alignment quality was verified with SAMtools (version 0.1.19). Transcript abundance was estimated with a count-based method using htseq-count. Gene counts were used as input for TMM (trimmed mean of M-values) normalization in the R package edgeR, and the normalized counts were transformed to log2 counts per million (logCPM) with voom from the R package limma to account for the higher variability at low expression levels. Genes with zero read counts across all samples were removed to improve the power of the statistical tests. [...]

Poor-quality reads were filtered out, and the remaining reads were aligned to the human reference genome (hg19) using the Burrows-Wheeler Aligner (BWA, version 0.7.5a). Sequence Alignment/Map (SAM) files were converted into Binary Alignment/Map (BAM) files with SAMtools (version 0.1.19). Polymerase chain reaction (PCR) duplicates were removed from the BAM files with Picard (version 1.93) and SAMtools before variant calling. The Genome Analysis Toolkit (GATK, version 2.4.7) was used for base quality recalibration and local realignment. Single nucleotide variants (SNVs) and indels were called using MuTect (version 1.1.4) and VarScan2 (version 2.3.5) with default parameter settings. Copy number variations were detected using CONTRA (version 2.0.4).
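For the transcriptome branch of the protocol, the zero-count filter and the log2-CPM transformation can be illustrated with a minimal Python sketch. It is not the authors' R code: it omits TMM estimation entirely (the `norm_factors` argument is a hypothetical placeholder for factors computed elsewhere, e.g. by edgeR) and it does not reproduce voom's precision weights. It only applies the offset formula log2((count + 0.5) / (library size + 1) × 10^6) that voom uses for its logCPM values. The gene names and counts are invented for illustration.

```python
import math


def drop_all_zero_genes(counts):
    """Remove genes whose read count is zero in every sample."""
    return {gene: row for gene, row in counts.items() if any(c > 0 for c in row)}


def log_cpm(counts, norm_factors=None):
    """Return log2 counts per million using voom-style 0.5 / +1 offsets.

    counts: dict mapping gene -> list of raw counts, one per sample.
    norm_factors: optional per-sample scaling factors (hypothetical stand-in
    for TMM factors computed elsewhere); defaults to 1.0 for every sample.
    """
    n_samples = len(next(iter(counts.values())))
    if norm_factors is None:
        norm_factors = [1.0] * n_samples
    # Effective library size = column sum scaled by the normalization factor.
    lib_sizes = [
        sum(row[j] for row in counts.values()) * norm_factors[j]
        for j in range(n_samples)
    ]
    return {
        gene: [
            math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for gene, row in counts.items()
    }


# Toy gene-by-sample count table (invented values, three samples).
raw_counts = {
    "AR": [120, 30, 0],
    "ESR1": [500, 450, 610],
    "GENE_X": [0, 0, 0],  # zero in every sample, so it is dropped
}
filtered = drop_all_zero_genes(raw_counts)
logcpm = log_cpm(filtered)
```

The 0.5 offset keeps the logarithm finite for zero counts in individual samples, which is why only genes that are zero in all samples need to be removed beforehand.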
Variants were annotated using ANNOVAR with gene and chromosomal information, exonic function (synonymous, nonsynonymous, stop gain, nonframeshift or frameshift indel), amino acid change, and allele frequency in public databases such as the 1000 Genomes Project (February 2012 version) and dbSNP (versions 132 and 137). Variants located in exonic regions with sufficient coverage (depth of coverage ≥8) and variant allele frequency (VAF ≥0.1) were selected for further statistical analyses. Synonymous variants were filtered out. Read alignments were manually inspected using the Integrative Genomics Viewer (IGV). Fisher's exact test was used to analyze mutations and polymorphic variants separately, to identify variants enriched in patients with a favorable outcome. P-values <0.05 were considered statistically significant. All statistical analyses, plots and heatmaps were produced using R version 3.0.2 […]
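The filtering thresholds and the enrichment test described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R analysis: the dictionary keys (`region`, `exonic_func`, `depth`, `alt_reads`) are hypothetical, the label strings only mimic ANNOVAR-style annotations, all variant records and patient counts are invented, and the test is assumed to be a two-sided Fisher's exact test on a 2x2 table (mutated vs. wild-type, favorable vs. unfavorable outcome).

```python
from math import comb


def passes_filters(variant, min_depth=8, min_vaf=0.1):
    """Keep exonic, non-synonymous variants with depth >= 8 and VAF >= 0.1."""
    depth = variant["depth"]
    vaf = variant["alt_reads"] / depth if depth else 0.0
    return (
        variant["region"] == "exonic"
        and variant["exonic_func"] != "synonymous SNV"
        and depth >= min_depth
        and vaf >= min_vaf
    )


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)
    )


# Invented variant records with hypothetical field names.
variants = [
    {"id": "v1", "region": "exonic", "exonic_func": "nonsynonymous SNV", "depth": 20, "alt_reads": 5},
    {"id": "v2", "region": "exonic", "exonic_func": "synonymous SNV", "depth": 30, "alt_reads": 10},
    {"id": "v3", "region": "exonic", "exonic_func": "nonsynonymous SNV", "depth": 6, "alt_reads": 3},
    {"id": "v4", "region": "intronic", "exonic_func": "", "depth": 40, "alt_reads": 20},
    {"id": "v5", "region": "exonic", "exonic_func": "stopgain", "depth": 50, "alt_reads": 4},
]
kept = [v for v in variants if passes_filters(v)]

# Invented 2x2 table: rows = favorable / unfavorable outcome,
# columns = patients carrying the variant / patients without it.
p_value = fisher_exact_two_sided(6, 2, 1, 7)
is_significant = p_value < 0.05
```

The synonymous check uses exact string equality on purpose: a substring test for "synonymous" would also match "nonsynonymous" and silently discard the variants of interest.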

Pipeline specifications

Software tools: BWA, SAMtools, Picard, GATK, MuTect, VarScan, ANNOVAR, IGV
Databases: dbSNP
Application: WES analysis
Organisms: Homo sapiens
Diseases: Breast Neoplasms
Chemicals: Estrogens, Progesterone, Sirolimus