Computational protocol: Comprehensive genomic profiling of IgM multiple myeloma identifies IRF4 as a prognostic marker

[…] Genomic DNA (1 μg) from the bone marrow and matching blood samples was sheared with a Covaris S220 (Covaris, MA, USA) and used for library construction with SureSelect XT Human All Exon v5 and the SureSelect XT reagent kit, HSQ (Agilent Technologies, Santa Clara, CA, USA), according to the manufacturer's protocols. After multiplexing, the libraries were sequenced on the HiSeq 2500 sequencing platform (Illumina, USA) using the 100 bp paired-end mode of the TruSeq Rapid PE Cluster kit and TruSeq Rapid SBS kit (Illumina). Sequencing reads were aligned to the UCSC hg19 reference genome using Burrows-Wheeler Aligner (BWA), version 0.6.2, with default settings. PCR duplicates were marked with Picard-tools-1.8, data cleanup was carried out with GATK, and variants were identified with GATK-2.2.9. Point mutations were then identified with MuTect and VarScan 2 using the paired samples. A Perl script and ANNOVAR were used to annotate variants. [...]
The library for whole transcriptome sequencing was constructed using the TruSeq RNA sample preparation v2 kit (Illumina). The transcriptome library was sequenced using the 100 bp paired-end mode of the TruSeq Rapid PE Cluster kit and TruSeq Rapid SBS kit (Illumina). Reads from the FASTQ files were mapped to the GRCh37.75 human reference genome using STAR version 2.4.0. The output files in BAM format were analyzed with RSEM version 1.2.18 to quantify transcript abundance in transcripts per million (TPM). Coding genes were selected (20,652), and low-expression genes were filtered out by requiring that the total TPM across all samples be > 20.42 (the mean TPM value). Clustering was performed by Principal Component Analysis (PCA). We identified differentially expressed genes (DEGs) using the Bioconductor package ‘DESeq’ in R and performed gene ontology (GO) analysis using ‘DAVID’.
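The low-expression filter described above can be illustrated with a minimal sketch. It assumes that "total TPM" means the sum of a gene's TPM values over all samples; the function name `filter_low_expression` and the toy values are hypothetical and are not taken from the study.

```python
def filter_low_expression(tpm, threshold):
    """Keep genes whose TPM summed over all samples exceeds `threshold`.

    tpm: dict mapping gene name -> list of TPM values, one per sample.
    Assumption: "total TPM" is read as the per-gene sum across samples.
    """
    return {gene: values for gene, values in tpm.items() if sum(values) > threshold}


# Hypothetical toy matrix (illustrative values only, not data from the study).
tpm = {
    "IRF4": [35.0, 50.2, 41.7],
    "GENE_A": [0.0, 0.3, 0.1],
    "GENE_B": [8.0, 9.5, 7.2],
}

# 20.42 is the cutoff reported in the text.
kept = filter_low_expression(tpm, threshold=20.42)
print(sorted(kept))
```

In this toy input, GENE_A is dropped because its summed TPM falls below the cutoff, while the other two genes are retained for downstream steps such as PCA.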
We used two GEO datasets (GSE9782 and GSE24080) to evaluate the prognostic significance of IRF4 expression. […]
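The text does not state how prognostic significance was tested. One common approach, shown here purely as an assumption and not as the authors' method, is to split patients at the median expression of the gene and compare survival between the two groups with a log-rank test. The helper names, patient IDs, and all values below are hypothetical.

```python
import math
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' if its expression is above the cohort median, else 'low'."""
    cutoff = median(expression.values())
    return {s: ("high" if v > cutoff else "low") for s, v in expression.items()}


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time per sample
    events : 1 if the event (e.g. death) was observed, 0 if censored
    groups : 'high' or 'low' per sample
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    observed = expected = variance = 0.0
    # Walk through each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        dying = [g for ti, e, g in zip(times, events, groups) if ti == t and e]
        n, n_high = len(at_risk), at_risk.count("high")
        d, d_high = len(dying), dying.count("high")
        observed += d_high
        expected += d * n_high / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy cohort (illustrative values only, not GEO data).
expression = {"P1": 9.1, "P2": 8.7, "P3": 8.9, "P4": 9.4,
              "P5": 5.2, "P6": 4.8, "P7": 5.5, "P8": 5.0}
months = {"P1": 1, "P2": 2, "P3": 3, "P4": 4,
          "P5": 10, "P6": 11, "P7": 12, "P8": 13}
died = {p: 1 for p in expression}

labels = split_by_median(expression)
patients = list(expression)
chi2, p_value = logrank_test([months[p] for p in patients],
                             [died[p] for p in patients],
                             [labels[p] for p in patients])
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```

A median split is only one possible cutoff; other thresholds or a continuous Cox model would be equally reasonable choices under different assumptions.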

Pipeline specifications

Software tools: STAR, RSEM, DESeq, DAVID
Application: RNA-seq analysis
Organisms: Homo sapiens
Diseases: Multiple Myeloma