Computational protocol: Characterization of the abomasal transcriptome for mechanisms of resistance to gastrointestinal nematodes in cattle

Similar protocols

Protocol publication

[…] 23 632 bovine genes in the Bovine Official Gene Set version 2 (OGS2.0) [] were first mapped to the bovine reference genome (Btau4.0) using Genomic Short-read Nucleotide Alignment Program or GSNAP []. The best mapping position of each gene (≤ 10 kb intron span, ≥ 95% identity, ≥ 90% coverage, and minimum tail length of 5% of coding sequences) was extracted. Accordingly, 20 809 of the 23 632 bovine genes were uniquely mapped. After removing ambiguously mapped genes, 18 834 genes were used for RNA-seq data analysis.Raw sequence reads were then checked using several layers of quality control filtering to remove low-quality reads. Raw reads with ≥ 2 ambiguous nucleotides (N) were discarded. Trimming removed approximately 1% of input raw reads and led to the retention of 99% of raw reads. The input reads after cleansing were mapped to the reference genome with gene coordinates using Bowtie (v0.12.7), an ultrafast and memory-efficient short-read aligner using a Burrows-Wheeler index []. Approximately 68.5% of trimmed reads mapped to the bovine genome (mean ± SD = 68.54% ± 2.48%). Only reads with one unique best match in the reference genome were used for subsequent analyses. The read depth of each gene was computed based on the coordinates of mapped reads and gene locations in the reference genome and was normalized using a method that corrects for biases introduced by RNA composition and differences in the total numbers of uniquely mapped reads in each sample []. Only genes having ≥ 20 uniquely mapped reads (mean of all 6 samples) were further analyzed. The R package edgeR was used to test the null hypothesis that expression of a given gene is not different between the two groups []. The normalized read counts were also analyzed using the DEGseq algorithm []. The DEGseq built-in function "samWrapper" that is recommended for testing RNA-seq data with biological replication was used to detect differential expression. Candidate genes were first sorted based on P value (P < 0.05) and fold change (2-fold as a cutoff). Genes identified as candidates for differential expression were further filtered with a false discovery rate (FDR) of < 5% to account for multiple testing. Differentially-expressed genes in the transcriptome were further analyzed using Gene Ontology (GO) analysis (GOseq). Over-representation of certain GO terms was determined based on Fisher's exact test. A multiple correction control (permutation to control false discovery rate []) was implemented to set up the threshold to obtain the lists of significantly over-represented GO terms. The candidate genes were analyzed using IPA v9.0 for pathways (Ingenuity Systems, Redwood City, CA, USA).Tophat (v1.2.0) was used to map input reads to the reference genome [] and identify potential splice junctions or splicing variants. Only reads with one unique match in the reference genome were used for subsequent analyses. The maximum allowed intron size was 5kb (a conservative parameter to avoid a high false discovery rate). At each potential splice junction, spanning reads were counted. Potential splice junctions were compared to annotated splice sites. To identify differential splice junctions between two groups, normalized read counts of splice junctions were required to be eight times different between resistant and susceptible groups. An unpaired t-test was performed on normalized sequence read counts. Splice junctions at P ≤ 0.05 were considered candidates junctions that were differentially regulated between resistant and susceptible groups. […]

Pipeline specifications

Software tools GSNAP, Bowtie, edgeR, DEGseq, GOseq, TopHat
Application RNA-seq analysis
Organisms Bos taurus
Diseases Gastrointestinal Diseases
Chemicals Arachidonic Acid