Computational protocol: SYK expression level distinguishes control from BRCA1-mutated lymphocytes

Protocol publication

[…] Poly(A)-selected RNA was sequenced using the Illumina TruSeq protocol on the HiSeq 2500 sequencing machine. Quality control checks on the raw sequence data were carried out using the FastQC tool. Then the Trim Galore! tool, which is based on cutadapt, was used for adapter trimming and for removing low-quality bases from the ends of reads. Clean reads were mapped to the human genome (hg38) using TopHat2. Next, the number of reads mapping to each human gene (as annotated in Ensembl release 77) was counted using the “union” mode of the htseq-count script. Differential expression analysis was performed using the edgeR and limma packages from the Bioconductor framework. Briefly, features with <1 read per million in at least five samples were removed. The remaining gene counts were normalized using the trimmed mean of M values method, followed by voom transformation. Linear models, as implemented by the limma package, were used to find differentially expressed genes. Because FDR correction was too stringent for this data set, genes with a P-value <0.001 were considered differentially expressed. Gene set enrichment and pathway analysis were done using GeneAnalytics. [...] The expression values of the BRCA1, BRCA2, and SYK genes across breast tissues taken from healthy donors and from patients with breast cancer (available from the GTEx and ICGC databases, respectively) were visualized using the UCSC Xena genome browser. In each database, samples were divided into two groups based on the expression of the BRCA1 or BRCA2 gene, using the median value as the cutoff point. The chart view option in Xena was used to view the distribution of SYK expression across the two groups and to perform Welch’s t-test. Mean-centered values of the normalized SYK expression were downloaded, and box plots were generated using the R language. […]
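
Two steps of the protocol above, the counts-per-million filter and the median split followed by Welch's t-test, can be illustrated with a minimal Python sketch. This is not the authors' code: the study used edgeR/limma and the UCSC Xena chart view. All function names and the toy data are hypothetical. The sketch assumes the common edgeR-style reading of the filter (keep a gene only if it reaches at least 1 read per million in at least five samples), assumes that samples exactly at the median fall into the low group, and returns only the t statistic and the Welch-Satterthwaite degrees of freedom, leaving the P-value to a statistics library.

```python
from math import sqrt
from statistics import mean, median, variance


def cpm(counts):
    """Convert raw counts (gene -> list of per-sample counts) to counts per million."""
    n_samples = len(next(iter(counts.values())))
    # Library size of a sample = total reads counted across all genes in that sample.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0, min_samples=5):
    """Keep genes with CPM >= min_cpm in at least min_samples samples (assumed reading)."""
    values = cpm(counts)
    return {
        gene: counts[gene]
        for gene, row in values.items()
        if sum(v >= min_cpm for v in row) >= min_samples
    }


def median_split(grouping_expr, target_expr):
    """Split target values into low/high groups by the median of the grouping gene."""
    cutoff = median(grouping_expr)
    # Assumption: ties at the median are assigned to the low group.
    low = [t for g, t in zip(grouping_expr, target_expr) if g <= cutoff]
    high = [t for g, t in zip(grouping_expr, target_expr) if g > cutoff]
    return low, high


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Toy counts for six samples; the values are invented for illustration only.
    toy_counts = {
        "GENE_HIGH": [900] * 6,
        "GENE_MID": [100] * 6,
        "GENE_RARE": [0] * 6,
    }
    print(sorted(filter_low_expression(toy_counts)))

    # Hypothetical BRCA1 (grouping) and SYK (target) expression values.
    low, high = median_split([1, 2, 3, 4], [10, 20, 30, 40])
    print(welch_t(low, high))
```

In practice the grouping vector would hold BRCA1 or BRCA2 expression per sample and the target vector the matching SYK expression, both taken from the same Xena data set.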

Pipeline specifications

Software tools: FastQC, Trim Galore!, cutadapt, TopHat, HTSeq, edgeR, limma, voom, GeneAnalytics, UCSC Xena
Databases: GTEx
Applications: WES analysis, Genome data visualization
Organisms: Homo sapiens
Diseases: Breast Neoplasms, Ovarian Neoplasms, Hereditary Breast and Ovarian Cancer Syndrome
Chemicals: Tyrosine