Computational protocol: Transcriptomic Profiles of Brain Provide Insights into Molecular Mechanism of Feed Conversion Efficiency in Crucian Carp (Carassius auratus)

Protocol publication

[…] To ensure high-quality data, raw reads in FASTQ format were first processed with in-house Perl scripts, which removed reads containing sequencing adapters and trimmed nucleotides with a quality value below 20 from both ends of each read. In this step, clean reads were obtained by discarding reads containing adapters, reads containing poly-N, and low-quality reads from the raw data. At the same time, the Q30, GC content, and sequence duplication level of the clean data were calculated. The clean data of this article are publicly available in the NCBI Sequence Read Archive (SRA) under accession number PRJNA433432.
All clean reads of the six libraries were jointly assembled into contigs using Trinity software. Since there is no reference genome of Carassius auratus, a k-mer value cutoff of 25 was used after removing redundant nucleotide sequences with Tgicl (v2.1). Then, unigenes were generated by connecting the contigs (longer than 200 bases) to obtain sequences that could not be extended on either end, and maximum-length non-redundant unigenes were acquired by further splicing and assembling using the TGICL clustering software (J. Craig Venter Institute, Rockville, MD, USA).
Finally, unigenes were aligned against the Nr (NCBI non-redundant protein sequences), Swissprot (a manually annotated and reviewed protein sequence database), COG (Clusters of Orthologous Groups of proteins), and KEGG (Kyoto Encyclopedia of Genes and Genomes) protein databases using BlastX with an E-value < 10^-5. GO (Gene Ontology) annotation of these unigenes was performed using Blast2GO, based on the results of the NCBI Nr database annotation. Blastn was used to align these unigenes with the Nr database to search for proteins with the highest sequence similarity to the given unigenes and to annotate their protein functions at the same time. [...] 
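The read-cleaning step described above can be illustrated with a minimal Python sketch. The authors' in-house Perl scripts are not available, so this is not their implementation: the function names, the Phred+33 encoding, the exact adapter-matching rule, and the 10% poly-N threshold are all assumptions made for illustration. Only the quality cutoff of 20 at both read ends and the Q30 and GC-content summaries come from the text.

```python
from typing import List, Optional, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding


def phred_scores(quality: str) -> List[int]:
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def trim_ends(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Trim bases with quality below min_q from both ends of a read."""
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def clean_read(seq: str, qual: str, adapter: str,
               max_n_fraction: float = 0.1) -> Optional[Tuple[str, str]]:
    """Return the trimmed read, or None if the read should be discarded.

    max_n_fraction is a hypothetical poly-N threshold; the source does not
    state the value that was actually used.
    """
    if adapter and adapter in seq:       # discard reads containing the adapter
        return None
    seq, qual = trim_ends(seq, qual)
    if not seq:                          # nothing left after trimming
        return None
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return None                      # discard poly-N reads
    return seq, qual


def q30_fraction(quals: List[str]) -> float:
    """Fraction of bases with Phred score >= 30 across all reads."""
    total = sum(len(q) for q in quals)
    high = sum(1 for q in quals for s in phred_scores(q) if s >= 30)
    return high / total if total else 0.0


def gc_content(seqs: List[str]) -> float:
    """Fraction of G and C bases across all reads."""
    total = sum(len(s) for s in seqs)
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in seqs)
    return gc / total if total else 0.0
```

In practice, a dedicated trimming tool would usually replace hand-written code like this; the sketch only makes the filtering logic explicit.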
Gene expression levels were estimated for each sample using the RSEM (RNA-Seq by Expectation Maximization) software package. Read counts were normalized to fragments per kilobase of exon model per million mapped reads (FPKM) for each unigene, and expression was compared between the two groups (Low_vs_High). Differentially expressed genes (DEGs) between the two groups were identified with the DEGseq package (samples with three biological replicates), applying the MA-plot-based method with Random Sampling model (MARS). In this study, DEGs with significantly different expression abundance between the two groups were selected using the following filter criteria: p-value < 0.01 and an absolute value of log2 Ratio ≥ 1, meaning that each DEG should differ at least two-fold in expression between the two groups. To determine the potential functions and metabolic pathways of these DEGs, COG, GO and KEGG enrichments were further analyzed. COG annotation of the DEGs was performed using Blastall software. GO enrichment analysis (p-value ≤ 0.05) of the DEGs was implemented with the topGO R package, based on the Kolmogorov–Smirnov test. Based on the hypergeometric distribution model, we used KOBAS software to test the statistical enrichment of DEGs in KEGG pathways, and the enrichment p-values were adjusted using the Benjamini and Hochberg method. […]
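The arithmetic behind the FPKM normalization, the DEG filter, and the KEGG enrichment test can be sketched in plain Python. This is an illustration under stated assumptions and not the RSEM, DEGseq, or KOBAS implementation: all function names are hypothetical, the DEGseq MARS p-value is taken as a given input, and the enrichment test is written as a one-sided hypergeometric tail probability followed by the standard Benjamini-Hochberg step-up adjustment.

```python
from math import comb
from typing import List, Sequence


def fpkm(fragments: float, length_bp: int, total_mapped: float) -> float:
    """Fragments per kilobase of transcript per million mapped reads."""
    return fragments * 1e9 / (length_bp * total_mapped)


def is_deg(p_value: float, log2_ratio: float) -> bool:
    """Filter from the text: p-value < 0.01 and |log2 Ratio| >= 1."""
    return p_value < 0.01 and abs(log2_ratio) >= 1


def hypergeom_enrichment_p(k: int, population: int,
                           in_pathway: int, n_degs: int) -> float:
    """P(X >= k): probability of seeing at least k pathway genes among
    n_degs DEGs drawn from `population` annotated genes, of which
    `in_pathway` belong to the pathway."""
    upper = min(in_pathway, n_degs)
    numerator = sum(
        comb(in_pathway, i) * comb(population - in_pathway, n_degs - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, n_degs)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

For example, a unigene with p-value 0.005 and log2 Ratio of -1.3 passes `is_deg`, while one with log2 Ratio of 0.8 does not, whatever its p-value.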

Pipeline specifications

Software tools: Trinity, TGICL, BLASTX, Blast2GO, BLASTN, RSEM, DEGseq, TopGO, KOBAS
Databases: KEGG
Applications: RNA-seq analysis, Transcription analysis
Organisms: Carassius carassius, Carassius auratus, Danio rerio