Computational protocol: A survey on cellular RNA editing activity in response to Candida albicans infections

Protocol publication

[…] For RNA-seq data processing and mapping, our pipeline was adapted from previous work [–, ], with small adjustments made so that it would work with our collected RNA-seq datasets. In brief, the Burrows-Wheeler Aligner (BWA) [] was used to map RNA-seq reads to the reference genomes (the human hg19 and mouse mm9 reference genomes; bwa aln -t 4, bwa samse -n 4). PCR duplicate reads were removed with the MarkDuplicates tool from Picard (version picard-1.127). Unmapped reads and those with a mapping quality score lower than 20 were removed with SAMtools (version 0.1.19; samtools view -bS -F 4 -q 20) [].

[...] The protocol we used to call A-to-I RNA editing sites was derived from one described previously. In brief, the RNA-seq reads mapped to the reference genomes were subjected to variant calling with the GATK analysis tool []. The called variant sites were filtered using stringent parameters as described [, ]. Briefly, we required variants, in both the human and the mouse data, to be supported by at least three mismatched reads at the editing site to reduce false positives. A-to-G and T-to-C mismatches were combined and counted as A-to-I editing sites. To remove likely false-positive RNA editing events caused by SNPs, human SNPs (NCBI Build 141) and mouse SNPs (NCBI Build 128) were downloaded with the UCSC Table Browser data retrieval tool [] and used to filter the human and mouse RNA editing data, respectively.

[...] We used CAVA [] (which provides additional clinical information, such as disease association, for base variants in human genes) to annotate the A-to-I RNA editing sites from human cell lines, and ANNOVAR [] to annotate the RNA editing sites from mouse tongue and kidney tissues (CAVA was not developed to work with mouse genes).
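
The site-level filters described above (A-to-G or T-to-C mismatch, at least three supporting reads, not a known SNP) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Variant` record, the `filter_editing_sites` function and the example coordinates are hypothetical, and it assumes the variant calls and SNP positions have already been parsed from the GATK output and the UCSC download. T-to-C is kept alongside A-to-G because, on unstranded data, it presumably reflects A-to-G editing on the opposite strand.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical, already-parsed variant call (not a GATK data structure)."""
    chrom: str
    pos: int        # 1-based position on the reference genome
    ref: str        # reference base
    alt: str        # mismatched base observed in the reads
    alt_reads: int  # number of reads supporting the mismatch


# Mismatch types counted as A-to-I editing in the protocol.
EDITING_MISMATCHES = {("A", "G"), ("T", "C")}


def filter_editing_sites(
    variants: Iterable[Variant],
    known_snps: Set[Tuple[str, int]],
    min_alt_reads: int = 3,
) -> List[Variant]:
    """Keep candidate A-to-I sites that pass the three filters in the text."""
    kept = []
    for v in variants:
        if (v.ref.upper(), v.alt.upper()) not in EDITING_MISMATCHES:
            continue  # not an A-to-G or T-to-C mismatch
        if v.alt_reads < min_alt_reads:
            continue  # fewer than three mismatched reads
        if (v.chrom, v.pos) in known_snps:
            continue  # overlaps a known SNP position
        kept.append(v)
    return kept


# Made-up example calls, for illustration only.
calls = [
    Variant("chr1", 100, "A", "G", 5),
    Variant("chr1", 200, "T", "C", 2),  # too few supporting reads
    Variant("chr2", 300, "A", "G", 4),  # listed as a known SNP below
    Variant("chr2", 400, "C", "T", 9),  # not an A-to-I type mismatch
    Variant("chr3", 500, "T", "C", 3),
]
snps = {("chr2", 300)}
candidates = filter_editing_sites(calls, snps)
print([(v.chrom, v.pos) for v in candidates])
```

In this toy input only the sites at chr1:100 and chr3:500 pass all three filters.
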
Sequence patterns around A-to-I RNA editing sites in human and mouse were delineated in two steps: 1) extracting the up- and downstream sequences (15 bases on each side) flanking the editing sites with bedtools getfasta []; 2) visualizing the sequence context around the RNA editing sites with WebLogo 3 (weblogo -A dna -c classic --units probability --first-index -10) [].

[...] Gene expression levels in RPKM (Reads Per Kilobase per Million mapped reads) were estimated from the RNA-seq mapping results as described []. Briefly, HISAT2 [] was used to map reads to the reference genomes, HTSeq [] was used to count mapped reads for expressed genes, and edgeR [] was used to perform differential gene expression analysis. Differentially expressed genes were defined as those with a fold change greater than 2 and a false discovery rate (FDR) smaller than 0.05. All RNA-seq reads were first trimmed with Trimmomatic-0.32 [] (parameters: HEADCROP = 10, SLIDINGWINDOW = 4:20 and MINLEN = 36). In addition, duplicated reads were removed with Picard.

[...] The R package VennDiagram [] was used to calculate and draw the overlap between our identified RNA editing sites and those included in the databases DARNED, RADAR and REDIportal. The R package ggplot2 [] was used for plotting other figures. To test for differences in the distribution of A-to-I RNA editing data, the nonparametric Kruskal-Wallis rank sum test was performed. For correlation analysis between ADAR gene expression and normalized RNA editing levels, Spearman’s rank correlation coefficient was computed in R. […]
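
As an illustration of the expression step described above, the following Python sketch computes RPKM from a read count and applies the stated thresholds (fold change greater than 2, FDR smaller than 0.05). It is a sketch under assumptions, not the authors' implementation: the function names and example numbers are hypothetical, the fold change is assumed to be supplied on a log2 scale (as edgeR reports it), and the threshold is assumed to apply to both up- and down-regulation.

```python
import math


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase per Million mapped reads for one gene."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # reads / (length in kb) / (library size in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_differentially_expressed(
    log2_fold_change: float,
    fdr: float,
    min_fold_change: float = 2.0,
    max_fdr: float = 0.05,
) -> bool:
    """Apply the fold-change and FDR cut-offs from the text.

    Assumption: the cut-off is symmetric, so a gene that is more than
    two-fold down-regulated is also flagged.
    """
    return abs(log2_fold_change) > math.log2(min_fold_change) and fdr < max_fdr


# Made-up numbers: 500 reads on a 2 kb gene in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
print(is_differentially_expressed(1.5, 0.01))
print(is_differentially_expressed(0.5, 0.01))
```

The symmetric threshold is a design choice made here for illustration; the source states only "fold-change greater than 2" and does not say how down-regulated genes were handled.
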

Pipeline specifications

Software tools: BWA, Picard, SAMtools, GATK, ANNOVAR, BEDTools, WebLogo, HISAT2, HTSeq, edgeR, Trimmomatic, ggplot2
Databases: REDIportal
Applications: Miscellaneous, RNA-seq analysis, Genome data visualization
Organisms: Candida albicans, Homo sapiens, Mus musculus
Diseases: Candidiasis, Infection, Virus Diseases, HIV Infections
Chemicals: Adenosine