Protein database search software tools | Shotgun metagenomic sequencing data analysis
In metagenomics studies, millions of DNA or cDNA reads are sequenced from environmental samples, and these are then analyzed in an attempt to determine the functional or taxonomic content of the samples. An important computational step is to determine the genes or coding sequences present, which is usually done by aligning the sequences against a reference database of protein sequences.
Finds genomic sequences that match a protein or DNA sequence submitted by the user. BLAT is a very fast sequence alignment tool, similar to BLAST, that is typically used to search for similar sequences within the same or closely related species. It was developed to align millions of expressed sequence tags and mouse whole-genome random reads to the human genome at high speed. BLAT is commonly used to look up the location of a sequence in the genome or to determine the exon structure of an mRNA; expert users can also run large batch jobs and adjust internal sensitivity parameters by installing the command-line version on a Linux server.
Searches a protein database using a translated nucleotide query. BLASTX is a BLAST search application that compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. This application can also work in Blast2Sequences mode and can send BLAST searches over the network to the public NCBI server if desired.
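The six-frame conceptual translation mentioned above can be illustrated with a minimal Python sketch. It assumes the standard genetic code and is not BLASTX's own implementation; the function name `six_frame_translations` is hypothetical, and codons containing ambiguous bases are simply translated as `X`.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame (+1..+3, -1..-3) to its conceptual translation."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])      # forward strand
        frames[-(offset + 1)] = translate(rc[offset:])    # reverse strand
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
        print(frame, protein)
```

Each of the six resulting protein strings would then be compared against the protein database, which is why a translated search costs more than a direct protein-protein search.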
A fast protein similarity search tool that utilizes a reduced amino acid alphabet and a suffix array to detect seeds of flexible length. RAPSearch2 is faster than the original RAPSearch, uses less memory, and supports multi-threading. RAPSearch2 requires up to 2 GB of memory when running in single-thread mode, or up to 3.5 GB of memory when running in 4-thread mode.
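To show why a reduced amino acid alphabet helps seed detection, here is a small Python sketch. It rests on several assumptions: the 10-group alphabet below is an illustrative grouping and not necessarily the one RAPSearch2 uses, seeds have a fixed length `k` rather than a flexible one, and a plain dictionary index stands in for the suffix array. The names `reduce_sequence` and `shared_seeds` are hypothetical.

```python
# Illustrative grouping of similar amino acids (an assumption, not
# necessarily RAPSearch2's alphabet). Each group is represented by its
# first letter.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(protein):
    """Rewrite a protein in the reduced alphabet; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in protein.upper())


def shared_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs whose reduced k-mers are identical."""
    rq, rs = reduce_sequence(query), reduce_sequence(subject)
    # Simple hash index of subject k-mers (a stand-in for a suffix array).
    index = {}
    for j in range(len(rs) - k + 1):
        index.setdefault(rs[j:j + k], []).append(j)
    hits = []
    for i in range(len(rq) - k + 1):
        for j in index.get(rq[i:i + k], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # The two peptides differ at most positions yet share reduced seeds.
    print(reduce_sequence("MKVLD"), reduce_sequence("IRLIE"))
    print(shared_seeds("MKVLD", "IRLIE", k=4))
```

Because similar residues collapse to one symbol, exact matching in the reduced alphabet can find seeds between sequences that are similar but not identical, which would be missed by exact matching on the full 20-letter alphabet.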
Provides an online platform for collecting, managing, analyzing, visualizing and sharing large and complex biomedical data sets. MOLGENIS Research is a life science data solution built on top of the MOLGENIS platform. It integrates apps to inspect, organize, and customize the data. Moreover, it allows the development of various apps for specific tasks, as well as uploading of data models and settings making the platform suitable for specific uses.
An approach to the problem of comparing DNA reads against a database of protein reference sequences that is applicable to very large datasets consisting of hundreds of millions or billions of reads. PAUDA requires <80 CPU hours to analyze a dataset of 246 million Illumina DNA reads from permafrost soil, for which a previous BLASTX analysis (on a subset of 176 million reads) reportedly required 800,000 CPU hours, and it produces the same clustering of samples by functional profiles.
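Because the two runs covered different numbers of reads, the comparison is easier to read on a per-read basis. The following back-of-the-envelope Python sketch uses only the figures quoted above and treats the "<80 CPU hours" bound as exactly 80, so the resulting ratio is a lower bound on the reported speedup rather than a measured value.

```python
# Figures quoted in the description above.
BLASTX_CPU_HOURS = 800_000
BLASTX_READS = 176_000_000
PAUDA_CPU_HOURS = 80          # upper bound ("<80 CPU hours")
PAUDA_READS = 246_000_000

SECONDS_PER_HOUR = 3600

# CPU seconds spent per read by each tool.
blastx_sec_per_read = BLASTX_CPU_HOURS * SECONDS_PER_HOUR / BLASTX_READS
pauda_sec_per_read = PAUDA_CPU_HOURS * SECONDS_PER_HOUR / PAUDA_READS

# Per-read speedup; a lower bound because PAUDA needed fewer than 80 hours.
speedup = blastx_sec_per_read / pauda_sec_per_read

if __name__ == "__main__":
    print(f"BLASTX: {blastx_sec_per_read:.2f} CPU s/read")
    print(f"PAUDA:  {pauda_sec_per_read:.5f} CPU s/read")
    print(f"Per-read speedup: at least {speedup:,.0f}x")
```

Comparing raw totals alone (800,000 versus 80 CPU hours) understates the difference, since PAUDA processed the full dataset while the BLASTX figure refers to a subset.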
An integrated system for bacterial genome annotation. AGMIAL has been developed with the following requirements in mind: maximize annotation automation, be able to work on draft sequences, keep the program modular and extensible, use informatics and bioinformatics standards when possible (e.g. Web Services, RDBMS, ...), and distribute the program under an open-source license.