Computational protocol: The Shc1 adaptor simultaneously balances Stat1 and Stat3 activity to promote breast cancer immune suppression

Protocol publication

[…] Sequencing reads were trimmed using Trimmomatic v0.32, removing low-quality bases at the ends of reads (phred33 quality <30) and clipping the first four bases in addition to Illumina adaptor sequences using palindrome mode. Sliding-window quality trimming was then performed, cutting once the average quality of a window of four bases fell below 30. Reads shorter than 30 bp after trimming were discarded. The resulting high-quality RNAseq reads were aligned to the mouse reference genome build mm10 using STAR v2.3.0e. Uniquely mapped reads were quantified using featureCounts v1.4.4 and the UCSC gene annotation set. Integrative Genomics Viewer was used for visualization. Multiple quality control metrics were obtained using FASTQC v0.11.2, SAMtools, BEDtools and custom scripts.
RNAseq gene expression analysis. Global expression changes were assessed by unsupervised hierarchical clustering of samples and principal component analysis (PCA). To this end, expression levels were estimated using exonic reads mapping uniquely within the maximal genomic locus of each gene and its known isoforms. Normalization (median of ratios) and variance-stabilizing transformation of the data were performed using DESeq2. Pearson's correlation was used as the distance metric for hierarchical clustering and average linkage as the agglomeration method. Bootstrapped hierarchical clustering was computed using the R package pvclust. Differential expression analysis to identify expression changes with respect to wild-type (WT) ShcA controls was performed using DESeq2. Genes with statistically significant (adjusted P-value<0.05) and large (fold change>2) expression changes that were expressed above a threshold (average normalized expression across samples >100) were selected to derive gene signatures associated with each genotype. Human leukocyte antigen genes, genes with no known function and genes with no human orthologues were removed from downstream analyses.
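The selection criteria for signature genes can be illustrated with a minimal Python sketch. It assumes the DESeq2 results have been exported as rows with the hypothetical keys `gene`, `log2fc`, `padj` and `base_mean` (DESeq2's mean of normalized counts across samples, used here as the "average normalized expression"). It also assumes that "fold change>2" applies in both directions, which the text does not state explicitly. This is not the authors' script.

```python
from math import log2


def select_signature_genes(results, padj_cutoff=0.05, fold_cutoff=2.0,
                           min_mean_expr=100.0):
    """Return genes passing the significance, effect-size and expression filters.

    `results` is a list of dicts with the (hypothetical) keys
    'gene', 'log2fc', 'padj' and 'base_mean'.
    """
    min_abs_log2fc = log2(fold_cutoff)  # fold change > 2  <=>  |log2FC| > 1
    selected = []
    for row in results:
        padj = row["padj"]
        if padj is None:
            # Genes without an adjusted P-value cannot pass the filter.
            continue
        significant = padj < padj_cutoff
        # Assumption: up- and down-regulated genes are both retained.
        large_change = abs(row["log2fc"]) > min_abs_log2fc
        expressed = row["base_mean"] > min_mean_expr
        if significant and large_change and expressed:
            selected.append(row["gene"])
    return selected


# Toy input; values are invented for illustration only.
example_results = [
    {"gene": "GeneA", "log2fc": 2.3, "padj": 0.001, "base_mean": 850.0},
    {"gene": "GeneB", "log2fc": -1.8, "padj": 0.01, "base_mean": 400.0},
    {"gene": "GeneC", "log2fc": 0.4, "padj": 0.001, "base_mean": 900.0},
    {"gene": "GeneD", "log2fc": 3.0, "padj": 0.20, "base_mean": 700.0},
    {"gene": "GeneE", "log2fc": 2.5, "padj": 0.001, "base_mean": 50.0},
    {"gene": "GeneF", "log2fc": 2.5, "padj": None, "base_mean": 300.0},
]
signature = select_signature_genes(example_results)
```

In this toy input, only genes that meet all three criteria at once are retained; the removal of genes without human orthologues or known function would be a separate filtering step.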
To acquire the ShcA-regulated gene signatures, we first compared genes that are differentially expressed between the following groups: (1) ShcA-WT versus Shc2F and (2) ShcA-WT versus Shc313F. We then compared both lists of differentially expressed genes to identify: (a) genes that are commonly differentially expressed in all Shc2F cell lines relative to the rest (Shc2F-like), (b) genes that are commonly differentially expressed in all Shc313F cell lines relative to the rest (Shc313F-like) and (c) genes that are commonly differentially expressed in both Shc2F and Shc313F cells relative to ShcA-WT cells.
STAT1 and STAT3 gene signatures, on the other hand, were derived from previously reported validated targets. In addition, we required that the mRNA levels of these targets across patient samples displayed a Spearman's correlation R>0.1 with STAT1 and STAT3 mRNA levels, respectively.
All gene signatures were projected across 1,215 human breast cancers from the TCGA data set using ssGSEA as described before. Briefly, a score is defined to represent the degree of enrichment of a given gene set in a sample: gene expression values for each sample are rank-normalized and an enrichment score is produced using the empirical cumulative distribution functions (ECDF) of genes, with the final score computed by integrating the difference between a weighted ECDF of genes in the signature and the ECDF of the remaining genes. This calculation is repeated for each signature and each sample in the data set. To compute ssGSEA scores, we used the GenePattern software implementation from the Broad Institute, ssGSEAProjection (v6). We first verified that the ssGSEA scores for reduced gene signatures (containing only genes that have human orthologues) are highly correlated with the ShcA genotype in mice. Spearman's correlations between each signature and expression values of specific genes (GZMB, CD8A and PD-L1) were then computed.
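The enrichment score described above can be sketched for a single sample as follows. This is a simplified illustration of the ssGSEA idea, not the GenePattern ssGSEAProjection implementation. The function name, the input layout (a dict mapping gene to expression value) and the weighting exponent `alpha` are assumptions introduced here; the exponent value is not taken from the text.

```python
def ssgsea_score(expression, gene_set, alpha=0.75):
    """Simplified single-sample enrichment score for one gene set.

    `expression` maps gene -> expression value for one sample.
    `gene_set` is a set of signature genes.
    `alpha` is an assumed weighting exponent applied to the ranks.
    """
    # Rank-normalize: the lowest value gets rank 1, the highest gets rank N.
    # Ties are broken by input order (sorted() is stable).
    ascending = sorted(expression, key=lambda g: expression[g])
    rank = {gene: i + 1 for i, gene in enumerate(ascending)}

    members = [g for g in ascending if g in gene_set]
    n_total = len(ascending)
    n_in = len(members)
    if n_in == 0 or n_in == n_total:
        raise ValueError("gene set must overlap the sample partially")

    weight_total = sum(rank[g] ** alpha for g in members)
    step_out = 1.0 / (n_total - n_in)

    ecdf_in = 0.0   # weighted ECDF of genes in the signature
    ecdf_out = 0.0  # unweighted ECDF of the remaining genes
    score = 0.0
    # Walk from the most to the least expressed gene and sum
    # (a discrete integral of) the difference between the two ECDFs.
    for gene in reversed(ascending):
        if gene in gene_set:
            ecdf_in += rank[gene] ** alpha / weight_total
        else:
            ecdf_out += step_out
        score += ecdf_in - ecdf_out
    return score


# Toy sample; values are invented for illustration only.
sample = {"A": 10.0, "B": 9.0, "C": 8.0, "D": 1.0, "E": 0.5, "F": 0.1}
score_top = ssgsea_score(sample, {"A", "B"})     # signature at the top
score_bottom = ssgsea_score(sample, {"E", "F"})  # signature at the bottom
```

A signature whose genes sit near the top of the ranked list yields a positive score, and one whose genes sit near the bottom yields a negative score. Repeating the call over every signature and every sample gives the score matrix that is then correlated with individual genes.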
For visualization purposes, patients were rank-ordered and stratified into quartiles, and the mean expression value for each gene and each quartile was computed. […]
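The quartile summary can be sketched as below. The text does not state which variable patients were ranked by; this sketch assumes ranking by a signature score, and the function name and input layout are hypothetical.

```python
def quartile_means(scores, expression):
    """Mean expression of each gene within quartiles of ranked patients.

    `scores` maps patient -> signature score (assumed ranking variable).
    `expression` maps gene -> {patient -> expression value}.
    Returns gene -> list of four means, lowest-score quartile first.
    """
    ranked = sorted(scores, key=lambda p: scores[p])
    n = len(ranked)
    if n < 4:
        raise ValueError("need at least four patients to form quartiles")

    # Split the ranked patients into four groups of near-equal size.
    quartiles = [ranked[(q * n) // 4:((q + 1) * n) // 4] for q in range(4)]

    means = {}
    for gene, values in expression.items():
        means[gene] = [
            sum(values[p] for p in group) / len(group) for group in quartiles
        ]
    return means


# Toy data; values are invented for illustration only.
toy_scores = {f"P{i}": float(i) for i in range(1, 9)}
toy_expression = {"GZMB": {f"P{i}": 2.0 * i for i in range(1, 9)}}
toy_means = quartile_means(toy_scores, toy_expression)
```

Each gene ends up with four values, one per quartile, which is the form needed for a quartile-by-gene heat map or bar plot.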

Pipeline specifications