Coding and noncoding region discrimination software tools | Transcription data analysis
With the availability of genome-wide transcription data and massive comparative sequencing, discriminating coding from noncoding RNAs and assessing the coding potential of evolutionarily conserved regions have become core analysis tasks.
Distinguishes protein-coding from non-coding RNAs. CPC employs a discriminative model based on four intrinsic sequence features. The CPC model is species-neutral, which makes it useful for the growing number of non-model organism transcriptomes, including those of organisms that are poorly annotated or lack a genome assembly. The web server is mobile-friendly and works on devices such as the iPad.
A method to determine whether a multi-species nucleotide sequence alignment is likely to represent a protein-coding region. It does not rely on homology to known protein sequences; instead, it examines evolutionary signatures characteristic of alignments of conserved coding regions, such as the high frequencies of synonymous codon substitutions and conservative amino acid substitutions, and the low frequencies of other missense and nonsense substitutions (CSF = Codon Substitution Frequencies).
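The idea of scoring codon substitutions by how typical they are of coding regions can be pictured with a small sketch. This is not the published CSF scoring scheme: the frequency tables, the function names, and the floor value are all hypothetical and chosen only for illustration. It sums a log-ratio over the codon substitutions observed between two aligned, in-frame sequences.

```python
import math

# Hypothetical substitution frequencies, invented for illustration only:
# how often codon pair (a, b) is seen in coding vs noncoding alignments.
CODING_FREQ = {
    ("GCT", "GCC"): 0.020,   # synonymous (Ala -> Ala)
    ("GCT", "TCT"): 0.004,   # missense (Ala -> Ser)
    ("TGG", "TGA"): 0.0001,  # nonsense (Trp -> stop)
}
NONCODING_FREQ = {
    ("GCT", "GCC"): 0.005,
    ("GCT", "TCT"): 0.005,
    ("TGG", "TGA"): 0.004,
}

def codon_pairs(ref, other):
    """Yield aligned codon pairs that differ, skipping gapped codons."""
    for i in range(0, min(len(ref), len(other)) - 2, 3):
        a, b = ref[i:i + 3], other[i:i + 3]
        if "-" in a or "-" in b or a == b:
            continue
        yield a, b

def csf_like_score(ref, other, floor=1e-4):
    """Sum of log(coding frequency / noncoding frequency) over substitutions.

    Pairs missing from a table fall back to `floor` (an assumption).
    """
    score = 0.0
    for pair in codon_pairs(ref, other):
        p_coding = CODING_FREQ.get(pair, floor)
        p_noncoding = NONCODING_FREQ.get(pair, floor)
        score += math.log(p_coding / p_noncoding)
    return score
```

With these toy tables, a synonymous change such as GCT to GCC raises the score, while a change that introduces a stop codon lowers it.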
A signature tool that profiles adjoining nucleotide triplets to distinguish protein-coding from non-coding sequences independently of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. Applied across species, CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data, demonstrated evolutionary divergence of genes between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan.
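As a rough illustration of what profiling adjoining nucleotide triplets can mean in practice, the sketch below counts how often each pair of consecutive triplets occurs in one reading frame. The function name and the `frame` parameter are hypothetical, and this is not CNCI's actual implementation, which builds a classifier on top of such profiles.

```python
from collections import Counter

def adjoining_triplet_counts(seq, frame=0):
    """Count pairs of consecutive, non-overlapping triplets in one frame."""
    seq = seq.upper()
    # Split the sequence into complete triplets starting at the given frame.
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Pair each triplet with its right-hand neighbour.
    return Counter(zip(triplets, triplets[1:]))

counts = adjoining_triplet_counts("ATGGCCGCCTAA")
```

Here `counts` maps pairs such as `("ATG", "GCC")` to their number of occurrences; a profile of this kind can be built without any reference annotation.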
Provides a prototype noncoding RNA genefinder, based on comparative genome sequence analysis. QRNA detects conserved RNA secondary structures, including both ncRNA genes and cis-regulatory RNA structures. It uses three different probabilistic models (for RNA-structure-constrained, coding-constrained, and position-independent evolution) to examine the pattern of mutations in a pairwise sequence alignment. The alignment is classified as RNA, coding, or other, according to the Bayesian posterior probability of each model. This program is freely available for download.
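The final classification step, choosing among three models by Bayesian posterior probability, can be sketched as follows. The log-likelihood values, the model labels, and the uniform prior are assumptions made for the example; computing the likelihoods themselves from a pairwise alignment is the hard part and is not shown.

```python
import math

def posterior(log_likelihoods, priors=None):
    """Posterior probability of each model given its log-likelihood."""
    names = list(log_likelihoods)
    if priors is None:
        # Assumption: uniform prior over the models.
        priors = {name: 1.0 / len(names) for name in names}
    joint = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(joint.values())
    total = sum(math.exp(v - top) for v in joint.values())
    return {n: math.exp(joint[n] - top) / total for n in names}

# Hypothetical log-likelihoods of one alignment under the three models.
post = posterior({"RNA": -100.0, "coding": -105.0, "other": -110.0})
label = max(post, key=post.get)
```

The alignment is assigned the label of the model with the highest posterior, here `"RNA"`.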
A program that detects coding regions in multiple sequence alignments and is optimized for emerging applications not covered by current protein gene-finding software. It is open-source software available for all major platforms.
An alignment-free tool that uses a computational pipeline based on an improved k-mer scheme and a support vector machine (SVM) algorithm to distinguish long non-coding RNAs (lncRNAs) from messenger RNAs (mRNAs) in the absence of genomic sequences or annotations. It is especially suitable for PacBio or 454 sequencing data and large-scale transcriptome data.
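A plain k-mer frequency vector gives a feel for the kind of alignment-free features such a pipeline can feed to an SVM. The sketch below is a generic version, not the tool's improved k-mer scheme; the function name and the default `k_max=3` are assumptions.

```python
from itertools import product

def kmer_frequencies(seq, k_max=3):
    """Concatenate relative k-mer frequencies for k = 1 .. k_max."""
    seq = seq.upper()
    features = []
    for k in range(1, k_max + 1):
        total = len(seq) - k + 1  # number of sliding windows of width k
        counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
        for i in range(max(total, 0)):
            kmer = seq[i:i + k]
            if kmer in counts:  # windows with N or other symbols are ignored
                counts[kmer] += 1
        features.extend(c / total if total > 0 else 0.0 for c in counts.values())
    return features

vec = kmer_frequencies("ACGTACGT")
```

For `k_max=3` the vector has 4 + 16 + 64 = 84 entries, one per possible k-mer, and it needs only the transcript sequence itself.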