Computational protocol: Tsetse fly (Glossina pallidipes) midgut responses to Trypanosoma brucei challenge

Protocol publication

[…] Low-quality reads, reads shorter than 100 base pairs and adapter sequences were removed by Illumina build software (Illumina, Hayward, CA, USA) during sequence clean-up. The resultant raw RNA-Seq reads from each treatment were stored as BAM files of interleaved FASTQ-formatted sequences for downstream analysis. Sequence quality in each file was assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), and data were filtered for quality using SamToFastq (http://broadinstitute.github.io/picard/). All filtered reads were aligned to the protein-coding genes of G. pallidipes at VectorBase []. The differential expression (DE; differences in expression of transcripts between RNA-Seq libraries) profiles of the transcripts were determined using the RNA-Seq analysis module in CLC Genomics Workbench version 8.0 (CLC Bio, Aarhus, Denmark) as described []. The profiles were normalized using Kal’s test [] and compared between challenged and control midguts or carcasses (24 or 48 hpc), respectively. To minimize false positives, transcripts were considered DE between treatments only if they met all of the following criteria: at least a two-fold change, a false discovery rate (FDR)-corrected P < 0.05, at least five reads per kilobase of transcript per million mapped reads (RPKM), a proxy of gene expression [], and support from at least 100 unique read mappings. The most abundant transcripts were defined as those within the 90th percentile of this selection and supported by at least 5000 reads. Fold changes were calculated as the ratio of RPKM values between each treatment and its respective control, normalized to the number of reads in each library.
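The DE criteria above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the CLC Genomics Workbench implementation. The `Transcript` record and its field names are hypothetical, and two points the text leaves open are treated as assumptions: the two-fold change is applied in both directions (induced or suppressed), and the RPKM threshold is satisfied if either library reaches it.

```python
from dataclasses import dataclass


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


@dataclass
class Transcript:
    """Hypothetical per-transcript summary; field names are illustrative."""
    name: str
    rpkm_challenged: float
    rpkm_control: float
    fdr_p: float        # FDR-corrected P value
    unique_reads: int   # unique read mappings supporting the transcript


def fold_change(t: Transcript) -> float:
    """Ratio of challenged to control RPKM."""
    if t.rpkm_control == 0:
        # Assumption: undefined ratios are reported as infinite (or 1.0 if both are zero).
        return float("inf") if t.rpkm_challenged > 0 else 1.0
    return t.rpkm_challenged / t.rpkm_control


def is_de(t: Transcript) -> bool:
    """Apply the four thresholds stated in the protocol text."""
    fc = fold_change(t)
    two_fold = fc >= 2.0 or fc <= 0.5                       # assumed: either direction
    expressed = max(t.rpkm_challenged, t.rpkm_control) >= 5  # assumed: either library
    return two_fold and t.fdr_p < 0.05 and expressed and t.unique_reads >= 100


if __name__ == "__main__":
    example = Transcript("hypothetical_gene", 20.0, 5.0, 0.01, 150)
    print(example.name, fold_change(example), is_de(example))
```

Requiring all four conditions at once is what makes the filter conservative: a large fold change on a weakly expressed or poorly supported transcript is rejected.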
Enrichment analysis was conducted to determine enrichment of transcripts within and between the two midgut temporal samples and their respective carcasses (spatial).
We validated the DE profiles of ten randomly selected genes by real-time quantitative PCR (RT-qPCR) in midguts obtained at 48 hpc and their respective controls. These analyses were conducted using independent biological replicates obtained from dissected midgut and carcass tissues generated under the same experimental conditions as described for the transcriptome samples. Total RNA (1 μg) was reverse transcribed using the iScript™ cDNA synthesis kit (BIO-RAD, Hercules, USA), according to the manufacturer’s protocol. Transcript expression was evaluated by RT-qPCR using the gene-specific primers and amplification conditions described in (Additional file : Table S1). Expression levels were analyzed with CFX Manager Software version 3.1 (Bio-Rad) and normalized to the G. pallidipes housekeeping gene glyceraldehyde 3-phosphate dehydrogenase (gapdh) (VectorBase accession number GPAI033271). Fold changes in transcript expression were established by comparing expression levels in challenged (treatment) midguts relative to unchallenged (control) midguts. Pearson correlation analysis was conducted between the fold changes obtained from RT-qPCR and those obtained from the RNA-seq data to estimate our false positive rate. [...]
To identify functions and processes that may be altered by DE putative products, gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and WikiPathways pathway enrichment analyses were conducted using the web-based gene set analysis toolkit (WebGestalt; Vanderbilt University, TN, USA; http://www.webgestalt.org/) []. Drosophila melanogaster genes were used as a proxy for G. pallidipes: the D. melanogaster homologs of the differentially induced or suppressed G. pallidipes genes were submitted for analysis.
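The RT-qPCR validation step described above compares two sets of fold changes with a Pearson correlation. A minimal sketch of that coefficient follows, using only the standard library. The numbers in the usage example are invented for illustration and are not data from the study; whether the authors correlated raw or log-transformed fold changes is not stated, so the function simply takes whatever values it is given.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Illustrative fold changes only (one value per validated gene).
    qpcr_fold_changes = [2.1, 3.4, 0.4, 5.0, 1.8]
    rnaseq_fold_changes = [2.5, 3.0, 0.5, 6.2, 2.0]
    print(round(pearson_r(qpcr_fold_changes, rnaseq_fold_changes), 3))
```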
A hypergeometric test, Benjamini & Hochberg multiple-test adjustment [] and a P < 0.05 cut-off were employed to identify significant functions and pathways. Additional functional annotation of the DE gene sets was performed using BLASTx [] to compare nucleotide sequences against the non-redundant protein database at the National Center for Biotechnology Information (NCBI), and against the GO and InterPro databases using Blast2GO™ software [, ]. An e-value cut-off of 0.001 was used for the BLAST and annotation steps, while mapping was carried out with default settings. Drosophila melanogaster transcripts encoding putative immune-specific and associated proteins were acquired from FlyBase [] as previously described [, ] and were used to identify their potential homologs among the DE transcripts by tBLASTx [] homology searches. Heatmaps of gut DE transcripts at 24 and 48 hpc were generated by comparing fold changes of the respective RPKM values using the ComplexHeatmap Bioconductor R package [] with the “maximum” distance and “ward.D” clustering methods. Orthology groups containing G. pallidipes-specific gene expansions, as determined by the Ensembl Compara pipeline [], were retrieved from VectorBase []. These genes were functionally annotated as previously described using BLASTx and Blast2GO™ software. The DE profile of the orthologs was analyzed between the 24 and 48 hpc datasets using the ComplexHeatmap Bioconductor R package [], where only orthologs supported by at least 100 reads and more than 1 RPKM were considered. […]
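The enrichment statistics named in the pathway analysis above (a hypergeometric test followed by Benjamini & Hochberg adjustment) can be sketched as follows. This is not WebGestalt's code; it is a standard-library illustration of the two calculations, and the function and parameter names are hypothetical.

```python
from math import comb
from typing import List, Sequence


def hypergeom_enrichment_p(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k): probability of seeing at least k annotated genes in a DE set.

    population: genes in the reference set
    annotated:  reference genes belonging to the category (e.g. a GO term)
    drawn:      genes in the DE set
    k:          DE genes that belong to the category
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # math.comb returns 0 when the lower index exceeds the upper, so
    # impossible configurations contribute nothing to the sum.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative counts only: 8 of 40 DE genes fall in a category that
    # holds 100 of 5000 reference genes.
    p = hypergeom_enrichment_p(8, 5000, 100, 40)
    print(p, benjamini_hochberg([p, 0.2, 0.03]))
```

Categories whose adjusted value falls below 0.05 would then be reported as significant, matching the cut-off stated in the text.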

Pipeline specifications

Software tools: FastQC, Picard, CFX Manager, WikiPathways, WebGestalt, BLASTX, Blast2GO, TBLASTX, Complex heatmaps
Databases: FlyBase, VectorBase, KEGG
Applications: RNA-seq analysis, Nucleotide sequence alignment, Transcriptome data visualization
Organisms: Drosophila melanogaster, Trypanosoma brucei, Toxoplasma gondii, Homo sapiens
Diseases: Infection