Computational protocol: Integrative analysis of genomic alterations in triple-negative breast cancer in association with homologous recombination deficiency

Protocol publication

[…] Paired-end WGS reads were aligned to the human reference genome (hg19) using the Burrows-Wheeler Aligner (BWA, http://bio-bwa.sourceforge.net/) []. SNVs, indels, and SVs were called using our in-house program as described previously [,] with some modifications.
To predict somatic SNVs and indels, the filters described previously were applied. SNVs and indels were selected when the frequency of the non-reference allele was at least 5% in the tumor genome. In our somatic mutation calling, we first compared variants within each matched pair (tumor and normal samples from the same patient) and removed personal germline variants. Next, we compared the remaining variants against all normal samples grouped together, a so-called “normal panel”, and removed false-positive variants caused by sequencing errors. This strategy is very effective for removing false positives because sequencing errors occur in a sequence-specific manner at a certain frequency rather than randomly.
Fifty base-pair paired-end reads were used for rearrangement analysis because they contain longer spacers than 125 bp paired-end reads. Therefore, 125 bp paired-end reads were separated to generate 50 bp paired-end reads. To detect structural variations, we used paired-end reads for which both ends aligned uniquely to the human reference genome, but with improper spacing, orientation, or both.
First, paired-end reads were selected based on the following filtering conditions: (i) the sequence read had a mapping quality score greater than 37; (ii) the sequence read aligned with two or fewer mismatches.
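The two read-level filtering conditions above can be sketched in Python as follows. This is a minimal illustration and not the authors' in-house program: the `ReadEnd` type, the function names, and the requirement that both ends of a pair pass the filter are assumptions made for this example. Only the two thresholds (mapping quality greater than 37, two or fewer mismatches) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadEnd:
    """One end of a paired-end read (hypothetical container for this sketch)."""
    mapq: int        # mapping quality score reported by the aligner
    mismatches: int  # number of mismatches against the reference


def end_passes(end: ReadEnd, mapq_above: int = 37, max_mismatches: int = 2) -> bool:
    """Condition (i): mapping quality strictly greater than `mapq_above`.
    Condition (ii): at most `max_mismatches` mismatches."""
    return end.mapq > mapq_above and end.mismatches <= max_mismatches


def pair_passes(first: ReadEnd, second: ReadEnd) -> bool:
    """Assumption: a pair is kept only if both of its ends pass the filter."""
    return end_passes(first) and end_passes(second)
```

For example, `pair_passes(ReadEnd(60, 1), ReadEnd(40, 2))` keeps the pair, whereas a pair with one end at mapping quality 37 or with three mismatches is dropped.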
Rearrangements were then identified using the following analytical conditions: (i) forward and reverse clusters, which included paired-end reads, were constructed from the end sequences aligned in the forward and reverse directions, respectively; (ii) two reads were allocated to the same cluster if their end positions were no more than 400 bp apart; (iii) paired-end reads were selected if one end sequence fell within the forward cluster and the other fell within the reverse cluster (we hereafter refer to this pair of forward and reverse clusters as paired-clusters); (iv) for the tumor genome, rearrangements predicted from paired-clusters that included at least six pairs of end reads were selected; (v) rearrangements detected in the tumor genome, but not present in the panel of non-tumor genomes (all non-tumor genomes grouped together), were selected as somatically acquired rearrangements. [...]
Paired-end WES reads were aligned to the human reference genome (hg19) using BWA [], Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) [], and NovoAlign (http://www.novocraft.com/products/novoalign/) independently. Somatic mutations were called using MuTect (http://www.broadinstitute.org/cancer/cga/mutect) [], SomaticIndelDetector (http://www.broadinstitute.org/cancer/cga/node/87) [], and VarScan (http://varscan.sourceforge.net) []. Mutations were discarded if (1) the read depth was <20 or the variant allele frequency (VAF) was <0.1, (2) they were supported by only one strand of the genome, or (3) they were present in the “1000 genomes” database (http://www.1000genomes.org) or in normal human genomes from our in-house database. Gene mutations were annotated using SnpEff (http://snpeff.sourceforge.net) []. Copy number (CN) status was analyzed using our in-house pipeline, which calculates the log R ratio using normal and tumor VAFs based on dbSNPs of the 1000 genomes database. [...]
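The three WES discard rules above can be expressed as a single predicate. The sketch below is illustrative only and is not the authors' pipeline. The `Call` record and its field names are hypothetical. Reading "supported by only one strand" as "variant-supporting reads seen on only one strand" is an assumption, as is representing database membership by precomputed boolean flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A candidate somatic mutation (hypothetical record for this sketch)."""
    depth: int                 # read depth at the variant site
    vaf: float                 # variant allele frequency, 0.0 to 1.0
    alt_fwd: int               # variant-supporting reads on the forward strand
    alt_rev: int               # variant-supporting reads on the reverse strand
    in_1000g: bool             # present in the "1000 genomes" database
    in_inhouse_normals: bool   # present in the in-house normal genomes


def keep_call(c: Call, min_depth: int = 20, min_vaf: float = 0.1) -> bool:
    """Return True if the call survives all three discard rules."""
    # Rule (1): discard if read depth < 20 or VAF < 0.1.
    if c.depth < min_depth or c.vaf < min_vaf:
        return False
    # Rule (2): discard if support comes from only one strand (assumed reading).
    if c.alt_fwd == 0 or c.alt_rev == 0:
        return False
    # Rule (3): discard if seen in 1000 genomes or in in-house normal genomes.
    if c.in_1000g or c.in_inhouse_normals:
        return False
    return True
```

A call with depth 50, VAF 0.25, and variant reads on both strands that is absent from both databases is kept. Changing any single field to violate one rule causes it to be discarded.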
For expression profiling with RNA-seq data, paired-end reads were aligned to the hg19 human genome assembly using TopHat2 (https://ccb.jhu.edu/software/tophat/index.shtml) []. The expression level of each RefSeq gene was calculated from mapped read counts using Cufflinks (http://cufflinks.cbcb.umd.edu) []. […]

Pipeline specifications

Software tools BWA, Bowtie2, NovoAlign, MuTect, SomaticIndelDetector, VarScan, SnpEff
Applications WGS analysis, WES analysis
Organisms Mus musculus, Homo sapiens
Diseases Breast Neoplasms, Neoplasms
Chemicals Estrogens, Poly Adenosine Diphosphate Ribose, Progesterone