Gene detection software tools | RNA sequencing data analysis
Locating the protein-coding genes in novel genomes is essential to understanding and exploiting genomic information, but accurately predicting all the genes remains difficult. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction.
Allows users to detect candidate coding regions within transcript sequences. TransDecoder is standalone software that starts from either a FASTA or a GFF file. The application can also scan open reading frames (ORFs) for homology to known proteins using a BlastP or Pfam search, and incorporate the results into the final selection so that ORFs with homology support are retained. Predictions can then be visualized in a genome browser such as IGV.
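The idea of scanning a transcript for candidate ORFs can be illustrated with a minimal Python sketch. It is not the tool's actual algorithm: it only looks at the three forward-strand frames, applies no scoring or homology evidence, and the function name `find_orfs` and the `min_codons` threshold are hypothetical choices made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG, end is exclusive and includes the
    stop codon, and min_codons counts coding codons (the stop is not counted).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Hypothetical toy transcript: one ORF (ATG AAA TGA) in the third frame.
print(find_orfs("CCATGAAATGATT"))
```

A real ORF caller would also scan the reverse strand, handle partial ORFs that run off the end of the transcript, and rank candidates by coding potential.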
Searches a protein database using a translated nucleotide query. BLASTX is a BLAST search application that compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. The application can also work in Blast2Sequences mode and, if desired, send BLAST searches over the network to the public NCBI server.
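The six-frame conceptual translation mentioned above can be sketched in a few lines of Python. This is only an illustration of the concept, assuming the standard genetic code and an unambiguous DNA alphabet; the function names and the frame labels (`+1` to `-3`) are choices made for this example, not part of any BLAST interface.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frame_translations(seq):
    """Return a dict mapping frame labels '+1'..'+3' and '-1'..'-3' to peptides."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames

# Hypothetical toy query; '*' marks a stop codon.
for label, peptide in six_frame_translations("ATGGCCTAA").items():
    print(label, peptide)
```

Each of the six peptides would then be compared against the protein database, which is why a translated search costs more than a direct protein query.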
Builds transcriptomes from RNA-seq data. Trinity is standalone software composed of three main components: (i) Inchworm, which first generates transcript contigs; (ii) Chrysalis, which clusters the contigs and constructs a complete de Bruijn graph for each cluster; and (iii) Butterfly, which processes the individual graphs in parallel to reconstruct the transcript sequences.
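For readers unfamiliar with de Bruijn graphs, the following Python sketch shows the basic data structure on toy reads. It is a simplified illustration under stated assumptions, not Trinity's implementation: there is no error correction, k-mer counting, clustering, or branch resolution, and both function names are hypothetical.

```python
from collections import defaultdict

def build_de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

def walk_unbranched_path(graph, start):
    """Extend a sequence from start while each node has exactly one successor."""
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:  # stop on cycles
            break
        path += next_node[-1]
        seen.add(next_node)
        node = next_node
    return path

# Two hypothetical overlapping reads that share the k-mer GGC.
graph = build_de_bruijn_graph(["ATGGC", "GGCGT"], k=3)
print(walk_unbranched_path(graph, "AT"))
```

In this toy case the two reads merge into a single unbranched path. Real transcriptome graphs contain branches from alternative splicing and sequencing errors, which is where a dedicated path-resolution step becomes necessary.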
Offers sensitive identification of existing genes. Prodigal is a gene-finding program for annotating either draft or finished microbial genome sequences. It was developed to predict translation initiation sites more accurately and to minimize the number of false-positive predictions. It can be useful for automated microbial annotation pipelines.
A pipeline for unsupervised RNA-seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a BAM file with spliced alignments of RNA-seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses the predicted genes for training and then integrates RNA-seq read information into the final gene predictions. In our experiments, we observed that BRAKER1 was more accurate than MAKER2 when RNA-seq was the sole source for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step.
A web server designed for identifying protein-coding regions in expressed sequence tag (EST)-derived sequences. For query sequences with a hit in BLASTX, the program predicts the coding regions based on the translation reading frames identified in the BLASTX alignments. For sequences without a hit, it predicts the most probable coding region based on the intrinsic signals of the query sequence. The output consists of the predicted peptide sequences in FASTA format, each with a definition line that includes the query ID, the translation reading frame, and the nucleotide positions where the coding region begins and ends. The predicted protein sequences can then be used as input for additional annotation tools, such as InterProScan for identifying protein families, domains and functional sites, the Conserved Domain Search service for detecting structural and functional domains, and SignalP for locating potential signal peptides.