Computational protocol: Transcriptome Analysis of Artificial Hybrid Pufferfish Jiyan-1 and Its Parental Species: Implications for Pufferfish Heterosis


Protocol publication

[…] Read alignment was performed with BioScope™ software (Life Technologies) using default parameters. The Takifugu rubripes genome (v4.64), downloaded from Ensembl, served as the reference. Reads were mapped with the classic "seed-and-extend" strategy using the mapping scheme "25.2.0:20". For the 50-base reads, each seed is 25 bases long with up to two mismatches allowed, and the seed starts at position 0 or 20. The alignment score was calculated as $\mathrm{score} = \mathit{len} - \mathit{nm}\,(1 + \mathit{mp}) - \mathit{jp}$, where len is the alignment length (in colors, for SOLiD reads), nm is the number of mismatches, mp is the mismatch penalty, and jp is the penalty for an alignment spanning a junction. After alignment, only unique alignments, or alignments sufficiently better than any suboptimal hit, were reported.

For abundance analysis, a custom Perl script was used to extract gene locus coordinates from the T. rubripes gene set file downloaded from Ensembl. Gene locus coverage was estimated with the coverageBed tool from BEDTools. Cufflinks was used to calculate transcript abundances and identify the DT from the mapping results. Variation in transcript abundance was expressed as the fold change ln(x/Jiyan-1), calculated separately for each parent, so that every transcript has two values (x = tiger puffer or tawny puffer).

BioMart was used to retrieve GO identifiers and assign GO terms for all tiger puffer (T. rubripes) sequences. For the remaining sequences, a local BLASTP search against the NCBI non-redundant (nr) database was performed first (E-value < 1E−5). The alignment results were then parsed with Blast2GO to assign GO terms, using E-value < 1E−5, annotation cutoff > 55, and GO weight > 5. GO enrichment was tested with a one-tailed Fisher's exact test, with the filter value set to 0.01. DTHPco were used as the test set, and all identified transcripts as the reference set.
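The three quantitative rules above (the alignment score, the per-parent log fold change, and the one-tailed Fisher's exact test for GO over-representation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the BioScope, Cufflinks, or Blast2GO implementation: the function names and the penalty values are hypothetical, the score assumes each matched color contributes +1 (so a mismatch forfeits that point and also pays mp), and the Fisher test is written as the hypergeometric upper tail, which assumes the test set is a subset of the reference set, as it is here because DTHPco are drawn from all identified transcripts.

```python
import math


def alignment_score(length: int, mismatches: int, mismatch_penalty: float,
                    junction_penalty: float = 0.0) -> float:
    """score = len - nm * (1 + mp) - jp.

    Assumption: every matched color scores +1, so each mismatch loses
    that +1 and additionally pays the mismatch penalty mp.
    """
    return length - mismatches * (1 + mismatch_penalty) - junction_penalty


def log_fold_changes(hybrid: float, parents: dict[str, float]) -> dict[str, float]:
    """Natural-log fold change ln(x / Jiyan-1), one value per parent x."""
    if hybrid <= 0 or any(x <= 0 for x in parents.values()):
        raise ValueError("abundances must be positive for ln(x / hybrid)")
    return {name: math.log(x / hybrid) for name, x in parents.items()}


def go_enrichment_p(hits_in_test: int, test_size: int,
                    hits_in_reference: int, reference_size: int) -> float:
    """One-tailed (over-representation) Fisher's exact test p-value.

    Computed as P(X >= hits_in_test) for X ~ Hypergeometric, where the
    test set is assumed to be a subset of the reference set.
    """
    upper = min(test_size, hits_in_reference)
    total = math.comb(reference_size, test_size)
    tail = sum(
        math.comb(hits_in_reference, i)
        * math.comb(reference_size - hits_in_reference, test_size - i)
        for i in range(hits_in_test, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical values: a 50-color read, two mismatches, mp = 2, no junction.
    print(alignment_score(50, 2, 2))  # 44.0
    # Hypothetical abundances for one transcript.
    print(log_fold_changes(10.0, {"tiger puffer": 20.0, "tawny puffer": 5.0}))
    # 3 of 3 test transcripts carry a term held by 4 of 10 reference transcripts.
    p = go_enrichment_p(hits_in_test=3, test_size=3,
                        hits_in_reference=4, reference_size=10)
    print(p, p < 0.01)
```

The protocol does not say whether the 0.01 filter is applied to raw p-values or to multiple-testing-corrected values, so comparing the raw tail probability against 0.01, as in the last line, is itself an assumption.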
GO terms significantly over-represented in the test set were reported. DTHPco were mapped to the KEGG database with the built-in function of Blast2GO for pathway retrieval. All mapping and retrieval steps used default settings.

Novel transcripts were predicted with the Reference Annotation Based Transcript (RABT) assembly mode of Cufflinks, in which the mapping results (BAM files) were assembled with the aid of the reference annotation. A potentially novel transcript isoform was defined as one sharing at least one splice junction with a reference transcript. The three predicted transcript annotation files were compared with the T. rubripes annotation and with each other using cuffcompare. Potentially novel isoforms with a low evidence count were discarded, and the remaining ones were summarized with a custom Perl script. Sequences of the predicted transcripts were extracted with the gffread tool, and sequence alignment was performed with BLAST+. Protein sequences were obtained by 3-frame translation, and only sequences of 50 to 3,000 aa without a premature stop codon were annotated with InterProScan. […]
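The isoform definition and the protein filter above can be sketched as follows. This is a hedged illustration, not the authors' Perl scripts. Assumptions: splice junctions are represented hypothetically as (donor, acceptor) coordinate pairs already known to lie on the same chromosome and strand; "novel" additionally excludes transcripts whose intron chain is identical to a reference; translation uses the standard genetic code on the forward strand only; and the 50 to 3,000 aa bounds are counted without a terminal stop codon. The junction rule matches the wording of cuffcompare's class code `j`, but cuffcompare also assigns other class codes, which this sketch ignores.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

Junction = tuple[int, int]  # hypothetical (donor, acceptor) coordinates


def is_potentially_novel(candidate: set[Junction],
                         references: list[set[Junction]]) -> bool:
    """Shares at least one splice junction with a reference transcript,
    but does not reproduce any reference intron chain exactly."""
    if any(candidate == ref for ref in references):
        return False  # identical intron chain: a known isoform
    return any(candidate & ref for ref in references)


def translate_frame(seq: str, frame: int) -> str:
    """Translate one forward-strand frame (0, 1 or 2); unknown codons become X."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def passes_protein_filter(protein: str, min_len: int = 50,
                          max_len: int = 3000) -> bool:
    """Length within [min_len, max_len] and no stop codon before the end."""
    core = protein[:-1] if protein.endswith("*") else protein
    return min_len <= len(core) <= max_len and "*" not in core


def proteins_for_interproscan(transcript: str) -> list[str]:
    """3-frame translation, keeping only frames that pass the filter."""
    kept = []
    for frame in range(3):
        protein = translate_frame(transcript, frame)
        if passes_protein_filter(protein):
            kept.append(protein.rstrip("*"))  # drop the single terminal stop
    return kept


if __name__ == "__main__":
    ref_chains = [{(100, 200), (300, 400)}]
    print(is_potentially_novel({(100, 200), (350, 450)}, ref_chains))  # True
    print(proteins_for_interproscan("ATG" + "GCC" * 59 + "TAA")[0][:5])  # MAAAA
```

Because the filter is applied to each whole frame, any stop codon inside the translated sequence (including one in a UTR) rejects that frame; a pipeline that first extracts ORFs would behave differently.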

Pipeline specifications

Software tools: SOLiD BioScope Software, BEDTools, Cufflinks, BioMart, BLASTP, Blast2GO, GFF utilities, InterProScan
Application: Genome annotation
Organisms: Takifugu rubripes, Takifugu flavidus
Diseases: Graft vs Host Disease