Computational protocol: High-Throughput Chemical Screening for Antivirulence Developmental Phenotypes in Trypanosoma brucei

Protocol publication

[…] RNA samples for analysis by high-throughput sequencing of RNA transcripts (RNA-Seq) were harvested from CAT-PAD1 3′ UTR GUS-Const 3′ UTR cells after treatment with 50 μM DDD00015314 or 0.5% (vol/vol) DMSO, in duplicate, for 24 h. Library construction and sequencing were carried out by BGI-Hong Kong using an Illumina TruSeq RNA Sample Preparation Kit. The quality of the raw sequence data was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Paired-end sequences were aligned to the Trypanosoma brucei brucei genome (obtained from ftp.sanger.ac.uk/pub4/pathogens/Trypanosoma/brucei/Latest_Whole_Genome_Sequence/Tb927_WGS_24_08_2012/chromosomes/) using Bowtie2 (version 2.0.2; preset, --very-sensitive-local). The outputs were filtered to remove alignments carrying the XS optional field of the Sequence Alignment/Map (SAM) format, which Bowtie2 reports only when it finds more than one alignment for a read; this yielded an alignment data set that included only reads that map to a single location in the reference genome. These data were subsequently sorted and indexed using SAMtools. The annotated Trypanosoma brucei brucei genome was viewed using Artemis software (http://www.sanger.ac.uk/resources/software/artemis/ and http://ukpmc.ac.uk/abstract/MED/11120685), and coding sequence (CDS) region coordinates were extracted and converted to BED format for use with BEDTools. BEDTools (version 2.15.0; multicov subcommand with the -bams option) was used to generate coverage for each CDS in each sample replicate. RPKM-like values, analogous to reads per kilobase per million reads mapped (RPKM), were calculated by dividing coverage by CDS size, using only reads that mapped to a single location in the reference genome. Statistical analyses of the two sample groups were undertaken in the R environment (www.R-project.org) using Bioconductor (www.bioconductor.org) packages. Differential expression was explored using linear models and empirical Bayes methods, as implemented in the limma package. RPKM-like values were offset by 1, log transformed, and quantile normalized prior to group-wise comparison.
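The unique-read filter and the RPKM-like transformation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code (the original analysis used SAMtools, BEDTools, and limma in R). All function names are hypothetical. The per-million scaling follows the standard RPKM formula, and the log base (2) and the choice of library-size denominator are assumptions, since the text does not state them. The quantile normalization shown here breaks ties by input order, whereas limma averages tied ranks.

```python
import math


def drop_multimappers(sam_lines):
    """Keep SAM header lines and alignments that carry no XS:i optional field.

    Bowtie2 adds XS:i only when it found more than one alignment for a read.
    Unaligned records are not treated specially in this sketch.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)
            continue
        # Optional fields start at the 12th tab-separated column.
        optional_fields = line.rstrip("\n").split("\t")[11:]
        if not any(field.startswith("XS:i:") for field in optional_fields):
            kept.append(line)
    return kept


def rpkm_like(counts, cds_lengths_bp, total_unique_reads):
    """Standard RPKM formula applied to uniquely mapped read counts.

    total_unique_reads is an assumed denominator (uniquely mapped reads
    in the sample); the source does not specify which total was used.
    """
    return [
        count * 1e9 / (length * total_unique_reads)
        for count, length in zip(counts, cds_lengths_bp)
    ]


def log2_offset(values, offset=1.0):
    """Add an offset (1 in the protocol) and take log2 (base assumed)."""
    return [math.log2(value + offset) for value in values]


def quantile_normalize(columns):
    """Quantile-normalize equal-length sample columns.

    Each value is replaced by the mean, across samples, of the values
    sharing its rank. Ties are broken by input order (no tie averaging).
    """
    n = len(columns[0])
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    rank_means = [
        sum(col[order[rank]] for col, order in zip(columns, orders)) / len(columns)
        for rank in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy example with made-up counts for three CDSs in two samples.
cds_lengths = [1000, 2000, 500]
sample_counts = {"dmso": [10, 40, 5], "drug": [20, 20, 10]}
logged = [
    log2_offset(rpkm_like(counts, cds_lengths, total_unique_reads=sum(counts)))
    for counts in sample_counts.values()
]
normalized = quantile_normalize(logged)
```

After normalization every sample column contains the same set of values, assigned according to each gene's rank within its own sample, which is what makes the group-wise comparison insensitive to sample-wide shifts in distribution.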
After the comparison, data were filtered to remove loci whose normalized mean RPKM-like values were below the 20% quantile for all samples. Meta-analysis of the log2 expression data from Capewell et al. and Jensen et al. required identification of overlapping gene sets, which is achievable using systematic gene names. However, because the sequence and annotation of the Trypanosoma brucei brucei 927 genome have been improved many times, some loci had been renamed, with the previous names retained in their annotation. Using simple parsing scripts, a table of current and previous gene names was generated, thereby facilitating data set comparisons. […]
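The gene-name reconciliation described above might look like the following Python sketch. It is not the authors' parsing script. It assumes the annotation has already been reduced to pairs of a current identifier and its list of previous identifiers; the function names and the placeholder gene identifiers are hypothetical and do not correspond to real loci.

```python
def build_alias_table(annotation_rows):
    """Map every known name (current or previous) to the current name.

    annotation_rows: iterable of (current_id, [previous_ids]) pairs.
    """
    alias_to_current = {}
    for current_id, previous_ids in annotation_rows:
        alias_to_current[current_id] = current_id
        for old_id in previous_ids:
            # Keep the first mapping seen if an old name appears twice.
            alias_to_current.setdefault(old_id, current_id)
    return alias_to_current


def to_current_ids(gene_ids, alias_to_current):
    """Translate a data set's gene names to current names, dropping unknowns."""
    return {alias_to_current[g] for g in gene_ids if g in alias_to_current}


# Placeholder identifiers for illustration only.
annotation = [
    ("geneA_v2", ["geneA_v1"]),
    ("geneB_v2", []),
    ("geneC_v2", ["geneC_v1", "geneC_v0"]),
]
aliases = build_alias_table(annotation)

older_data_set = ["geneA_v1", "geneC_v0", "geneX"]    # uses previous names
newer_data_set = ["geneA_v2", "geneB_v2", "geneC_v2"]  # uses current names

shared_genes = (
    to_current_ids(older_data_set, aliases)
    & to_current_ids(newer_data_set, aliases)
)
```

Translating both data sets to current names before intersecting them means that a locus is matched regardless of which naming generation each study used; names absent from the annotation are dropped rather than guessed.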

Pipeline specifications

Software tools: FastQC, Bowtie2, SAMtools, BEDTools, limma
Application: RNA-seq analysis
Organisms: Trypanosoma brucei, Homo sapiens