Computational protocol: Transcriptome analysis reveals long intergenic non-coding RNAs involved in skeletal muscle growth and development in pig

[…] The raw RNA-seq reads were first mapped to the pig reference genome (Sus scrofa 10.2, http://ftp.ensembl.org/pub/release-75/fasta/sus_scrofa/dna/) with TopHat 2.0.13 using default parameters. The Ensembl gene annotation in Gene Transfer Format (GTF) was supplied through TopHat's “-G” option to guide read mapping. The mapped reads were assembled with Cufflinks 2.1.1 using default parameters. Cufflinks uses a probabilistic model to assemble a minimal set of isoforms and quantify their expression, providing a maximum-likelihood explanation of the expression data at each locus. The three assembled transcript files (GTF format), one per group, were then merged into a single transcriptome with the Cuffmerge utility from the Cufflinks package. A lincRNA detection pipeline was then used to filter the merged assembly. [...] We used the following steps to identify lincRNAs from the pig leg muscle transcriptome:
(1) only transcripts assigned the class code ‘u’ by Cuffmerge, which denotes intergenic transcripts, were retained;
(2) single-exon transcripts and transcripts shorter than 200 bp were removed;
(3) the coding potential of each transcript was assessed on both strands with the Coding Potential Calculator (CPC), and only transcripts with a CPC score < 0 on both strands were retained;
(4) transcripts containing a known protein domain were removed: each transcript sequence was translated in all six reading frames with Transeq (http://www.ebi.ac.uk/Tools/st/emboss_transeq/), and transcripts for which any of the six protein sequences had a significant hit (E-value < 1e-5) in the Pfam database (http://pfam.xfam.org/search) were discarded;
(5) to minimize false positives, only transcripts with detectable expression in all three groups were retained. [...]
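The five filtering criteria can be expressed as a single predicate per transcript. The sketch below is a minimal illustration, not the authors' pipeline: it assumes each merged transcript has already been summarized into a hypothetical `Transcript` record (class code, exon lengths, CPC score per strand, smallest Pfam E-value across the six frames, and per-group expression), and it treats "detectable expression" as a value greater than zero, which is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical per-transcript summary; all field names are illustrative."""
    transcript_id: str
    class_code: str                  # Cuffmerge class code; "u" = intergenic
    exon_lengths: list[int]          # length of each exon in bp
    cpc_forward: float               # CPC score on the forward strand
    cpc_reverse: float               # CPC score on the reverse strand
    pfam_best_evalue: float | None   # smallest Pfam E-value over six frames; None = no hit
    expression: dict[str, float]     # expression value per group (assumed, e.g. FPKM)


def is_candidate_lincrna(t: Transcript, groups: list[str], evalue_cutoff: float = 1e-5) -> bool:
    # (1) keep only intergenic transcripts
    if t.class_code != "u":
        return False
    # (2) drop single-exon transcripts and transcripts shorter than 200 bp
    if len(t.exon_lengths) < 2 or sum(t.exon_lengths) < 200:
        return False
    # (3) require a negative CPC score on both strands
    if not (t.cpc_forward < 0 and t.cpc_reverse < 0):
        return False
    # (4) drop transcripts with a significant Pfam hit in any reading frame
    if t.pfam_best_evalue is not None and t.pfam_best_evalue < evalue_cutoff:
        return False
    # (5) require expression above zero in every group (assumed threshold)
    return all(t.expression.get(g, 0.0) > 0 for g in groups)


groups = ["group1", "group2", "group3"]  # hypothetical group labels
ok = Transcript("TCONS_1", "u", [150, 300], -0.8, -1.1, None,
                {"group1": 1.2, "group2": 0.5, "group3": 2.0})
print(is_candidate_lincrna(ok, groups))  # True
```

Keeping each criterion as a separate early return makes it easy to count how many transcripts each step removes.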
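Step (4) relies on six-frame translation. As a rough stand-in for Transeq (not its actual implementation), the sketch below translates a nucleotide sequence in three frames on each strand using the standard genetic code (NCBI table 1). The frame labels (+1 to +3 and -1 to -3, by offset on the forward and reverse-complement strands) are an assumed convention and may not match Transeq's frame numbering. The resulting peptides would then be searched against Pfam, which is not shown here.

```python
from __future__ import annotations

from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG codon order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq: str) -> str:
    # Codons with ambiguous bases (e.g. N) become 'X'; a trailing partial codon is dropped.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translate(seq: str) -> dict[str, str]:
    seq = seq.upper().replace("U", "T")  # accept RNA or lowercase input
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translate("ATGGCCTAA")
print(frames["+1"], frames["-1"])  # MA* LGH
```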
The raw MeDIP-seq reads were first mapped to the pig reference genome (Sus scrofa 10.2, http://ftp.ensembl.org/pub/release-75/fasta/sus_scrofa/dna/) with Bowtie2 2.2.3 using default parameters. HTSeq-count was then used to calculate the methylation level of each lincRNA gene, with the normalized read count taken as its methylation level. The promoter region was defined as the 2 kb upstream of the transcription start site (TSS) of each lincRNA gene. [...] The assembly provided the genomic position of each transcript. We defined the PTGs of a lincRNA as the protein-coding genes transcribed within 10 kb of it, and all PTGs were identified with BEDTools 2.17.0. We then performed DAVID (Database for Annotation, Visualization and Integrated Discovery) analysis by querying each PTG against the DAVID database. Because annotation of the porcine genome is limited, all PTGs were first converted to their human homologs using Ensembl BioMart (http://www.ensembl.org/biomart/martview/8143dcf2a64771c957b28d28832759b6). […]
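The promoter definition is strand-dependent: the TSS is the transcript's start coordinate on the plus strand and its end coordinate on the minus strand. The sketch below illustrates only that interval arithmetic and a simple overlap count; it is not HTSeq-count. It assumes 1-based inclusive coordinates on a single chromosome, treats the promoter as the 2 kb immediately upstream of (and excluding) the TSS, and normalizes to reads per million mapped reads. The source does not state the normalization method, so that choice is an assumption.

```python
from __future__ import annotations


def promoter_region(start: int, end: int, strand: str, upstream: int = 2000) -> tuple[int, int]:
    """Return the 1-based inclusive (start, end) of the `upstream` bp preceding the TSS."""
    if strand == "+":
        tss = start                                   # plus strand: TSS is the leftmost base
        return max(1, tss - upstream), tss - 1
    if strand == "-":
        tss = end                                     # minus strand: TSS is the rightmost base
        return tss + 1, tss + upstream
    raise ValueError(f"unknown strand: {strand!r}")


def count_overlapping_reads(region: tuple[int, int], reads: list[tuple[int, int]]) -> int:
    """Count aligned reads (1-based inclusive intervals) overlapping `region` by at least 1 bp."""
    reg_start, reg_end = region
    return sum(1 for r_start, r_end in reads if r_start <= reg_end and r_end >= reg_start)


def reads_per_million(count: int, total_mapped: int) -> float:
    """Scale a raw count to reads per million mapped reads (assumed normalization)."""
    return count * 1_000_000 / total_mapped


region = promoter_region(10_000, 15_000, "+")      # (8000, 9999)
reads = [(7_990, 8_005), (9_999, 10_050), (5_000, 5_100)]
n = count_overlapping_reads(region, reads)          # 2
print(reads_per_million(n, 4_000_000))              # 0.5
```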
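The PTG definition amounts to a window search around each lincRNA. The original analysis used BEDTools 2.17.0 (the exact subcommand is not stated); the Python sketch below only illustrates the logic. It assumes a hypothetical `Interval` record with 1-based inclusive coordinates, ignores strand, and measures distance as the number of bases strictly between two features (0 if they overlap or abut), keeping genes less than 10 kb away.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic feature with 1-based inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int


def gap(a: Interval, b: Interval) -> int:
    """Bases strictly between two features on the same chromosome (0 if overlapping or abutting)."""
    if a.end < b.start:
        return b.start - a.end - 1
    if b.end < a.start:
        return a.start - b.end - 1
    return 0


def find_ptgs(lincrnas: list[Interval], genes: list[Interval], max_gap: int = 10_000) -> dict[str, list[str]]:
    """Map each lincRNA to protein-coding genes less than `max_gap` bp away (strand ignored)."""
    genes_by_chrom: dict[str, list[Interval]] = defaultdict(list)
    for g in genes:
        genes_by_chrom[g.chrom].append(g)
    return {
        linc.name: sorted(g.name for g in genes_by_chrom.get(linc.chrom, []) if gap(linc, g) < max_gap)
        for linc in lincrnas
    }


lincs = [Interval("linc1", "chr1", 50_000, 52_000)]
genes = [
    Interval("GENE_A", "chr1", 40_000, 45_000),  # 4,999 bp away: kept
    Interval("GENE_B", "chr1", 62_001, 70_000),  # 10,000 bp away: excluded (not < 10 kb)
    Interval("GENE_C", "chr2", 50_000, 52_000),  # different chromosome: excluded
]
print(find_ptgs(lincs, genes))  # {'linc1': ['GENE_A']}
```

Grouping genes by chromosome first avoids comparing features on different chromosomes; for genome-scale inputs a sorted or interval-tree search, as BEDTools uses, would scale better than this linear scan.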
