Computational protocol: The Carcinogenic Liver Fluke, Clonorchis sinensis: New Assembly, Reannotation and Analysis of the Genome and Characterization of Tissue Transcriptomes

[…] RNA-Seq data from four tissues (sucker, muscle, ovary and testis) were filtered with the FASTX-Toolkit using the following criteria: 1) reads containing sequencing adaptors were removed; 2) nucleotides with a quality score lower than 20 were trimmed from the end of each read; 3) reads shorter than 50 bp were discarded; and 4) artifact reads were removed.

For each tissue, the clean RNA-Seq reads were assembled with Velvet, and the resulting contigs were clustered with TGICL. ESTs from our previous study and from a study by Korean researchers were clustered with TGICL, and 454 sequencing data were assembled with the Newbler Assembler. All of the above assemblies were then clustered with CAP3, and putative full-length coding sequences (CDS) for gene annotation were predicted with OrfPredictor.

RNA-Seq reads were mapped to the genome with TopHat, and the expression level of each gene, given as fragments per kilobase of exon model per million mapped fragments (FPKM), was estimated with Cuffdiff based on our gene models using default parameters. An FPKM threshold of 1.0 in at least one of the four tissues was used to define expressed genes. Genes expressed significantly differently in at least one of four pairwise comparisons (testis vs. ovary, testis vs. muscle, ovary vs. muscle, and sucker vs. muscle) were considered differentially expressed genes (DEGs). The expression values of the DEGs were normalized by their total expression across the four tissues and clustered with the gplots package in R. Gene Ontology (GO) enrichment analysis was then performed with BiNGO using a hypergeometric test with Benjamini-Hochberg false discovery rate (FDR) correction at a significance level of 0.01.

Gene structures, alternative splicing events and transcriptome profiles were visualized with GBrowse.
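The quantitative steps above can be illustrated with a short Python sketch. It is not the authors' code: Cuffdiff and BiNGO perform these calculations internally, and Cuffdiff's FPKM estimates involve additional statistical modelling that is omitted here. The sketch shows (1) the basic FPKM formula, (2) the "FPKM of 1.0 in at least one tissue" filter (treating the threshold as inclusive is an assumption), (3) normalization of each gene's values by its total across the four tissues (one plausible reading of the normalization step), and (4) a one-sided hypergeometric enrichment test with Benjamini-Hochberg adjustment, the statistics named for the BiNGO analysis. All function names and inputs are hypothetical.

```python
from __future__ import annotations

from math import comb


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Basic FPKM: fragments per kilobase of exon model per million mapped fragments."""
    # fragments / (length / 1e3) / (total / 1e6) == fragments * 1e9 / (length * total)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def expressed_genes(fpkm_table: dict[str, list[float]], threshold: float = 1.0) -> list[str]:
    """Keep genes whose FPKM reaches the threshold in at least one tissue.

    Assumption: the threshold is inclusive (>=); the source only says "1.0".
    """
    return [gene for gene, values in fpkm_table.items() if max(values) >= threshold]


def normalize_by_total(values: list[float]) -> list[float]:
    """Divide each tissue's value by the gene's summed expression over all tissues.

    Assumed interpretation of "normalized by total expression in the four tissues".
    """
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)
    return [v / total for v in values]


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    population = N background genes, successes = K genes annotated to the GO term,
    draws = n DEGs tested, k = DEGs annotated to the GO term.
    """
    upper = min(successes, draws)
    total_ways = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total_ways


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg step-up adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a GO term annotated to K of N background genes, with k of the n DEGs carrying that term, `hypergeom_sf(k, N, K, n)` gives the enrichment p-value. The p-values of all tested terms are then passed together to `benjamini_hochberg`, and terms whose adjusted value falls below the chosen cutoff (0.01 in the text above) would be reported as enriched.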
Users can log in to the website of, which allows them to navigate by scaffold coordinates or by gene or transcript IDs.

To understand how adult liver flukes obtain sufficient energy, KEGG reference pathways were used to analyze the energy metabolism gene network by comparing the genome and transcriptome of C. sinensis with the genomes of S. japonicum and S. mansoni. […]

Pipeline specifications

Software tools: Velvet, TGICL, Newbler, CAP3, OrfPredictor, TopHat, Cufflinks, gplots, BiNGO, GBrowse
Databases: KEGG
Applications: RNA-seq analysis, Transcription analysis
Organisms: Clonorchis sinensis, Fasciola hepatica, Cyclina sinensis, Ilex paraguariensis
Diseases: Clonorchiasis, Parasitic Diseases
Chemicals: Cholesterol, Fatty Acids, Glucose, Oxygen, Citric Acid