Computational protocol: RNA-seq based identification and mutant validation of gene targets related to ethanol resistance in cyanobacterial Synechocystis sp. PCC 6803

Protocol publication

[…] Sequence reads were pre-processed using the FASTX Toolkit (version 0.0.13) to remove low-quality bases and reads shorter than 20 bp. The quality-filtered reads were then mapped to non-coding RNA (ncRNA) sequences using Bowtie (version 2.0.0) with default settings. Genome sequences (including ncRNA sequences) and annotation information for Synechocystis sp. PCC 6803 were downloaded from NCBI and from the Comprehensive Microbial Resource (CMR) of TIGR (downloaded on April 22, 2012) []. Reads that mapped to ncRNA sequences were excluded from further analysis; for paired-end Illumina reads, both mates were removed if either mate mapped to rRNA. The remaining reads were mapped to the Synechocystis sp. PCC 6803 genome using Bowtie (version 2.0.0) with default parameters. Gene expression was quantified as Reads Per Kilobase of gene per Million mapped reads (RPKM) using the following formula []:
$$\mathrm{RPKM} = \frac{\text{transcription\_reads}}{\text{transcription\_length} \times \text{total\_mapped\_reads\_in\_run}} \times 10^{9}$$
where transcription_reads is the number of reads mapped to a given gene, transcription_length is the gene length in bp, and total_mapped_reads_in_run is the total number of mapped reads in a given measurement. For each time point, two biological replicates of ethanol-treated samples and their controls were analyzed, and gene expression ratios were calculated from the RPKM values; genes showing a fold change of at least 1.5 in both biological replicates were considered differentially regulated. [...]
The Gibbs Motif Sampler software from the Biometrics Laboratory of the Wadsworth Center was used to identify matrix models describing DNA sequence motifs present upstream of genes responsive to ethanol treatment [,].
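The RPKM normalization and the two-replicate 1.5-fold rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the expression ratio is assumed to be treated RPKM divided by control RPKM, a gene is assumed to need a change in the same direction in both replicates, and the small pseudocount that avoids division by zero is an assumption the protocol does not specify.

```python
from typing import Dict, List


def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    # RPKM = reads / (gene length in bp * total mapped reads) * 10^9,
    # i.e. reads per kilobase of gene per million mapped reads.
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return gene_reads / (gene_length_bp * total_mapped_reads) * 1e9


def expression_ratio(treated_rpkm: float, control_rpkm: float, pseudocount: float = 0.01) -> float:
    # ASSUMPTION: ratio = treated / control, with a small pseudocount so that
    # genes with zero reads in the control do not cause division by zero.
    return (treated_rpkm + pseudocount) / (control_rpkm + pseudocount)


def call_regulated(replicate_ratios: List[Dict[str, float]], fold: float = 1.5) -> Dict[str, str]:
    # ASSUMPTION: a gene must pass the cutoff in every replicate and in the same
    # direction: "up" if every ratio >= fold, "down" if every ratio <= 1 / fold.
    shared = set.intersection(*(set(r) for r in replicate_ratios))
    calls: Dict[str, str] = {}
    for gene in sorted(shared):
        ratios = [r[gene] for r in replicate_ratios]
        if all(x >= fold for x in ratios):
            calls[gene] = "up"
        elif all(x <= 1 / fold for x in ratios):
            calls[gene] = "down"
    return calls


# Toy usage with made-up ratios for two biological replicates.
rep1 = {"geneA": 2.0, "geneB": 0.5, "geneC": 1.6}
rep2 = {"geneA": 1.8, "geneB": 0.6, "geneC": 1.2}
print(call_regulated([rep1, rep2]))  # {'geneA': 'up', 'geneB': 'down'}
```

Requiring every replicate to pass the cutoff in the same direction is the conservative reading of the rule as stated; geneC is not called because only one replicate exceeds 1.5-fold.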
Regions of approximately 500 bp upstream of the translational start sites of ethanol-responsive genes were extracted from the NCBI genome database using the Regulatory Sequence Analysis Tools (RSAT) []. Both strands of each sequence were searched, and possible motif locations were identified using the motif matrix score obtained from the Gibbs Motif Sampler. The multilevel consensus sequence of each motif was then used to generate a sequence logo, a graphical representation of a nucleic acid multiple sequence alignment []. […]
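The upstream-region extraction and two-strand matrix scan described above can be sketched as follows. This is an illustration under stated assumptions, not RSAT or the Gibbs Motif Sampler: the helper names are hypothetical, gene coordinates are assumed to be 1-based inclusive as in NCBI annotation, the genome is treated as linear (wrap-around at the origin of the circular chromosome is ignored), and the motif is assumed to be available as per-position base counts that are converted to a log2-odds matrix against a background base composition. The Gibbs Motif Sampler's own scoring may differ.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str, length: int = 500) -> str:
    # Hypothetical helper. start <= end are 1-based inclusive gene coordinates.
    # '+' genes begin translation at `start`; '-' genes begin at `end`, and their
    # upstream region lies at higher coordinates, read on the reverse strand.
    # ASSUMPTION: linear genome; regions are truncated at the sequence ends.
    if strand == "+":
        return genome[max(0, start - 1 - length):start - 1]
    return reverse_complement(genome[end:min(len(genome), end + length)])


def log_odds_matrix(counts: List[Dict[str, float]], background: Dict[str, float],
                    pseudocount: float = 0.5) -> List[Dict[str, float]]:
    # Convert per-position base counts to log2(p(base) / background(base)).
    # ASSUMPTION: counts come from the aligned motif sites reported by the sampler.
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background[b])
                       for b in BASES})
    return matrix


def scan_both_strands(seq: str, matrix: List[Dict[str, float]],
                      threshold: float) -> List[Tuple[int, str, float]]:
    # Score every window on both strands; return (forward-strand 0-based start,
    # strand, score) for windows scoring at or above `threshold`.
    width = len(matrix)
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[b] for col, b in zip(matrix, window))
            if score >= threshold:
                # Map a reverse-strand window back to its forward-strand start.
                pos = i if strand == "+" else len(s) - i - width
                hits.append((pos, strand, score))
    return hits
```

The score threshold is a free parameter here; the cutoff applied in the original analysis is not given in this excerpt, and a natural choice would be relative to the matrix's maximum attainable score.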

Pipeline specifications

Software tools: The Gibbs Motif Sampler, RSAT, WebLogo
Application: Genome data visualization
Organisms: Synechocystis sp. PCC 6803
Chemicals: Ethanol