Computational protocol: RNA-Seq Analysis of the Expression of Genes Encoding Cell Wall Degrading Enzymes during Infection of Lupin (Lupinus angustifolius) by Phytophthora parasitica

Similar protocols

Protocol publication

[…] Total RNA was extracted with a Qiagen RNeasy Midi Kit using shredder columns from a Qiagen Plant RNeasy Kit following the manufacturer’s instructions. gDNA was removed by on-column digestion with DNase (Qiagen) at twice the concentration recommended by the manufacturer. The concentration of total RNA was determined by spectrometry and confirmed by running approximately 1 μg on an RNase-free 1.5% agarose gel in TAE (40 mM Tris, 1 mM EDTA, 20 mM acetic acid) and staining with Red Safe nucleic acid stain (iNtRON Biotechnology, Kyunggi-do, Korea). Three of the four biological replicates for each time point were selected for transcriptome sequencing after considering their ratio of P. parasitica:lupin gDNA (hereafter referred to as the pathogen load), and the quality and concentration of their total RNA.RNA library preparation and sequencing were conducted at the Australian Genome Research Facility (Parkville, Victoria). The three biological replicates for each time point were multiplexed and run in a single Illumina HiSeq 2000 lane, generating 50-bp single-end reads using the Illumina Consensus Assessment of Sequence and Variation (CASAVA) pipeline (v1.8.2). Sequence quality was assessed using FastQC (version 0.10.1, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Quality scores for all samples were >30 () and no trimming was required. The RNA-Seq data have been deposited in the NCBI Sequence (Short) Read Archive with the SRA accession number SRP061812. [...] Reads from all samples, including the mock-inoculated roots, were aligned to version 2 of the P. parasitica genome (phytophthora_parasitica_inra-310_2_supercontigs.fasta) using BioKanga (http://code.google.com/p/biokanga/; v 1.12.1) with default parameters, which includes exhaustive search and allowance for five mismatches across the 50-bp reads. BioKanga has been deliberately developed for non-model genome analysis and incorporates extensive optimisation to deal with increased uncertainty when aligning to non-reference genomes. Reads that mapped to only a single location were placed in a “unique location” data set and used to analyze expression patterns of individual genes. In a “multiple location” data set, reads that mapped to more than one location were retained after random assignment to one position. In both cases, reads that aligned to regions delimitated by the coordinates of the P. parasitica predicted transcripts, which exclude 5’ and 3’ untranslated regions and introns, gave rise to the raw count values. Some reads mapped to transcripts that had splice variants (designated T0, T1 and T2 in the P. parasitica genome project data). In these cases, the T0 transcript variant was used.The DESeq approach [] and associated R program (http://bioconductor.org/packages/release/bioc/html/DESeq.html) was used to normalise the mapped reads across the unique or multiple location data sets, estimate dispersion or variance for each gene using the replicates within the experimental design and determine statistical support for differential expression. Normalised reads per kilobase of predicted transcript (NRPK) were then calculated. The effectiveness of the normalisation algorithm in adjusting for pathogen load was assessed by examining the NRPK values for five P. parasitica genes shown by qPCR to be constitutively expressed under a range of conditions []. These genes were peptidyl prolyl isomerase 2 (PPTG_02092), phospholipase A2 (PPTG_08636), 40S ribosomal protein S3A (WS021, PPTG_07764), ubiquitin-conjugating enzyme (PPTG_08273), and WS041.Analysis of the NRPK RNA-Seq data was based on the median value of the three biological replicates for each time point. In consideration of the digital nature of RNASeq, a measure of the overall level of transcription of each CWDE gene was obtained by summing the median NRPK values for each of the 30–60 hpi time points. Genes were deemed to be expressed if the median of the raw counts at any time point was ≥20 and the total of the median NRPK values across the time-course was ≥5. In three cases, the annotation of the P. parasitica genome incorrectly split genes into two transcripts []. For these transcripts, the NRPKs of the two transcripts have been added together.For each transcript, differential expression between time points was assessed by transforming the normalised unique location data set to log base 2 (log2) and calculating the difference between the highest and lowest median, non-zero log2 values. Because only non-zero values were included, this provided a conservative measure of fold changes. It is generally accepted that genes are differentially expressed if their log2 value is ≥ 2 [,].The pattern of expression of CWDE families was determined using NRPK values from the multiple location data set. Gene expression profiles were categorised as early (≤36 hpi), middle (42 and 48 hpi) or late (54 and 60 hpi) according to the time at which transcript levels peaked. At each time point, the NRPKs of transcripts for proteins targeting a predicted substrate were summed to give an indication of the total transcript level for each CWDE family. Genes encoding CAZyme proteins that are not involved in the degradation of cell wall polysaccharides or glycoproteins were not included in the current analysis although the raw data for these genes can be found in . They included amylase, invertase and trehalase. Genes with a total NRPK value across the time-course of <50 were not included in the detailed analyses of CAZyme expression patterns. […]

Pipeline specifications

Software tools BaseSpace, FastQC, BioKanga, DESeq
Application RNA-seq analysis
Organisms Phytophthora parasitica
Diseases Infection
Chemicals Carbohydrates