Computational protocol: Drought stress tolerance strategies revealed by RNA-Seq in two sorghum genotypes with contrasting WUE

Protocol publication

[…] The RNA-seq reads generated by the Illumina Genome Analyzer were first filtered to remove adapter sequences, reads in which unknown bases exceeded 10%, and low-quality reads. The remaining reads, the so-called “clean reads”, were used for downstream bioinformatics analysis. Clean reads were aligned to the reference sequence (ftp://ftp.ensemblgenomes.org/pub/plants/release-20/fasta/sorghum_bicolor/dna/) using SOAPaligner/SOAP2, allowing no more than 5 mismatches per alignment. A quality control step followed, in which the distribution of reads over the reference genes was analysed. Gene coverage was calculated as the percentage of a gene covered by reads, that is, the ratio of the number of bases in a gene covered by uniquely mapped reads to the total number of bases in the coding region of that gene. Expression level was calculated using the RPKM (Reads Per Kilobase of transcript per Million mapped reads) method [], according to the following formula:

$$\mathrm{RPKM} = \frac{10^{6}\,C}{N L / 10^{3}}$$

where C is the number of uniquely mapped reads determined from the high-quality category, L is the cDNA length of the longest splice variant of a given gene, and N is the total number of mappable reads, determined as the sum of the high-quality reads and the highly repetitive reads. This normalization removes the influence of differing gene lengths and sequencing discrepancies on the calculated expression level. The RPKM values were then log2-transformed. [...] A strict algorithm was developed to identify differentially expressed genes between two samples, and false positive and false negative errors were controlled using the Benjamini and Yekutieli [] FDR method. We used FDR ≤ 0.001 and an absolute value of Log2Ratio ≥ 2 as the threshold for judging the significance of differences in gene expression.
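
The RPKM normalization and the DEG thresholds described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names `rpkm` and `is_deg` and all read counts are hypothetical, and the FDR value is assumed to have been computed elsewhere (the Benjamini and Yekutieli correction itself is not reproduced here).

```python
import math


def rpkm(c, n, l):
    """RPKM = 10^6 * C / (N * L / 10^3).

    c: uniquely mapped reads for the gene (C)
    n: total number of mappable reads in the sample (N)
    l: cDNA length in bases of the longest splice variant (L)
    """
    return 1e6 * c / (n * l / 1e3)


def is_deg(log2_ratio, fdr, fdr_cutoff=0.001, ratio_cutoff=2.0):
    """Apply the stated thresholds: FDR <= 0.001 and |Log2Ratio| >= 2."""
    return fdr <= fdr_cutoff and abs(log2_ratio) >= ratio_cutoff


# Hypothetical counts for one 2000-base gene in two samples of 20 million reads.
control = rpkm(c=500, n=20_000_000, l=2000)
stressed = rpkm(c=2000, n=20_000_000, l=2000)

# Log2Ratio between the two samples; a four-fold change gives exactly 2.
log2_ratio = math.log2(stressed / control)

# The FDR value below is an assumed input, not derived from these counts.
print(control, stressed, log2_ratio, is_deg(log2_ratio, fdr=0.0005))
```

Under these assumed counts the gene sits exactly on the Log2Ratio threshold and is called differentially expressed only because the assumed FDR is also below 0.001; both conditions must hold.
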
Gene Ontology (GO) enrichment was performed with the agriGO software [] using the hypergeometric statistical test and Hochberg (FDR) multiple-test adjustment. Pathway enrichment analysis of DEGs was performed using the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/). This analysis identifies metabolic or signal transduction pathways that are significantly enriched in DEGs compared with the whole-genome background. A strict algorithm was used for the analysis:

$$P = 1 - \sum_{i=0}^{m-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

where N is the number of all genes with KEGG annotation, n is the number of DEGs in N, M is the number of all genes annotated to a specific pathway, and m is the number of DEGs in M. Pathways with Q value ≤ 0.05 were considered significantly enriched in DEGs. [...] The assembled transcripts were compared with the annotated genomic transcripts of the reference sequence to discover novel transcribed regions. Three requirements had to be met: the transcript must lie at least 200 bp from any annotated gene, its length must exceed 180 bp, and its sequencing depth must be no less than 2. The Coding Potential Calculator (CPC: http://cpc.cbi.pku.edu.cn/ ) was used to assess protein-coding potential. TopHat software [] was used to detect alternative splicing events (ASE). […]
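
The pathway enrichment P value above is the upper tail of a hypergeometric distribution, and a minimal Python sketch may make the computation concrete. The function name `pathway_p_value` and the example counts are hypothetical; m is taken here to be the number of DEGs annotated to the pathway; and the multiple-testing step that turns P values into Q values is not shown.

```python
from fractions import Fraction
from math import comb


def pathway_p_value(N, n, M, m):
    """P = 1 - sum_{i=0}^{m-1} C(M, i) * C(N - M, n - i) / C(N, n).

    N: all genes with KEGG annotation
    n: DEGs among those N genes
    M: genes annotated to the pathway of interest
    m: DEGs annotated to that pathway (assumed meaning of m)
    """
    total = comb(N, n)
    # Probability mass of observing fewer than m pathway DEGs, as an exact integer sum.
    lower_tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m))
    # Exact rational arithmetic avoids rounding error before the final conversion.
    return float(1 - Fraction(lower_tail, total))


# Hypothetical toy example: 10 annotated genes, 3 DEGs, a pathway of 4 genes
# containing 2 of the DEGs.
p = pathway_p_value(N=10, n=3, M=4, m=2)
print(p)
```

For genome-scale inputs the same function works unchanged, since Python integers are arbitrary precision; a library routine such as a hypergeometric survival function would be a reasonable alternative when speed matters.
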

Pipeline specifications

Software tools: SOAPaligner, agriGO, CPC, TopHat
Databases: KEGG
Applications: RNA-seq analysis, Transcription analysis
Organisms: Sorghum bicolor
Chemicals: Carbon, Glutathione