Gene prediction software tools | Shotgun metagenomic sequencing data analysis
Environmental shotgun sequencing (or metagenomics) is widely used to survey the communities of microbial organisms that live in many diverse ecosystems, such as the human body. Finding the protein-coding genes within the sequences is an important step for assessing the functional capacity of a metagenome.
A command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers. Prokka uses parallel processing to decrease running time on multicore computers. The most time-consuming steps are BLAST+ and hmmscan, which both support multiple CPUs natively. However, Prokka is more efficient if it runs multiple single CPU threads on subsets of the data, which it achieves using GNU parallel.
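The idea of running several single-CPU workers on subsets of the data can be sketched as follows. This is only an illustration of the chunk-and-distribute pattern, not Prokka's own code: Prokka delegates this step to GNU parallel with separate processes, whereas the sketch uses a Python thread pool as a stand-in. The names `split_into_chunks`, `annotate_chunk` and `annotate_in_parallel` are hypothetical, and `annotate_chunk` is a placeholder for a real single-threaded search such as BLAST+ or hmmscan.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def annotate_chunk(proteins):
    """Placeholder worker (assumption): returns each sequence length.

    A real pipeline would run one single-threaded search here.
    """
    return [len(p) for p in proteins]


def annotate_in_parallel(proteins, n_workers):
    """Run one worker per chunk and merge results in the input order."""
    chunks = split_into_chunks(proteins, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(annotate_chunk, chunks)  # map preserves chunk order
        return [value for chunk in results for value in chunk]
```

Splitting the input first and giving each worker its own subset keeps every core busy even when a single search scales poorly across multiple CPUs.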
Determines protein-coding open reading frames (ORFs). GeneMarkS enables the retrieval of genome-wide features of transcription and translation mechanisms. It models a protein-coding sequence with a complex gene model, which serves as a building block of its model of a prokaryotic genome. The tool divides its atypical models into distinct bacterial and archaeal types, so that atypical genes can be characterized as bacterial or archaeal.
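To make the notion of an ORF concrete, the following minimal sketch scans the three forward reading frames of a DNA string for a start codon followed in frame by a stop codon. It is a naive scan and not the statistical model GeneMarkS uses; the function name `find_orfs`, the `min_codons` threshold and the restriction to `ATG` starts on the forward strand are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    `start` is the index of the first ATG in an open stretch, `end` is
    exclusive and includes the stop codon, and `min_codons` counts the
    start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next start
    return orfs
```

A real gene finder would also scan the reverse strand, consider alternative start codons, and score candidates with a sequence model rather than accept every ORF above a length cutoff.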
Searches nucleotide databases using a nucleotide query. BLASTN's key features are searching with short sequences and cross-species comparison. Users can optimize the search for: (i) highly similar sequences, (ii) more dissimilar sequences or (iii) somewhat similar sequences. This web application searches sequence sets in NCBI data sources.
Builds transcriptomes from RNA-seq data. Trinity is standalone software composed of three main components: (i) Inchworm, which first generates transcript contigs; (ii) Chrysalis, which clusters them and constructs a complete de Bruijn graph for each cluster; and (iii) Butterfly, which processes the individual graphs in parallel to reconstruct the transcript sequences.
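For readers unfamiliar with de Bruijn graphs, the sketch below builds one from a set of reads: every k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. It illustrates the data structure only and is not Trinity's implementation; the function name `build_de_bruijn` and the dictionary-of-sets representation are assumptions.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph as {prefix (k-1)-mer: set of suffix (k-1)-mers}."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph
```

Overlapping reads share k-mers, so their edges merge into a single path; walking such paths is what allows an assembler to reconstruct longer sequences from short reads.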
Allows users to perform gene prediction. GeneMark offers a web platform that provides an interface to analyze a sequence in conjunction with about 170 pre-computed species-specific statistical models. The application also supplies advanced features that let users set both the window size and the step size, and generate a graph using native or heuristic parameters. This program is part of the GeneMark Suite.
Provides sensitivity in identifying existing genes. Prodigal is a gene-finding program for annotating either draft or finished microbial genome sequences. It was developed to predict translation initiation sites more accurately and to minimize the number of false positive predictions. This method can be useful for automated microbial annotation pipelines.