Provides search and clustering algorithms that can be orders of magnitude faster than BLAST. USEARCH is a sequence analysis program that combines several algorithms in a single package. It searches a database for top global hits and provides several NGS read-processing features, such as dereplication, paired-read overlapping, quality filtering, FASTQ file statistics, and chimeric sequence filtering.
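As an illustration of what dereplication means, the following is a minimal Python sketch that collapses identical reads and counts their abundance. It is not USEARCH's implementation; the function name `dereplicate` and the example reads are hypothetical, and sequences are assumed to be compared case-insensitively over their full length.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical sequences into (sequence, count) pairs.

    Hypothetical helper for illustration only: sequences are upper-cased
    and compared over their full length.
    """
    counts = Counter(seq.upper() for seq in reads)
    # Most abundant first; ties are broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up example reads
reads = ["ACGT", "acgt", "TTGA", "ACGT", "TTGA", "GGCC"]
unique = dereplicate(reads)
for seq, count in unique:
    print(seq, count)
```

Sorting by decreasing abundance reflects the common practice of treating the most frequent unique sequences first in downstream clustering.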
A collection of command-line tools for preprocessing short-read FASTA/FASTQ files. Next-generation sequencing machines usually produce FASTA or FASTQ files containing many short-read sequences (possibly with quality information). The main processing step for such files is mapping (also known as aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs include Blat, SHRiMP, LastZ, and MAQ, among many others.
A widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by next-generation sequencing technologies, a new CD-HIT program, accelerated with a novel parallelization strategy and other techniques, has been developed to allow efficient clustering of such datasets.
Aims to simplify the manipulation of sequence files. OBITools is a set of Python programs specifically designed for analysing NGS data. It allows users to set up versatile data analysis pipelines that can be adjusted to the broad range of DNA metabarcoding applications. As input, the package automatically recognizes the most common sequence file formats.
Performs fast, accurate and specific classification of xenograft-derived sequence read data. The Xenome technique is based on a k-mer decomposition of the host and graft reference sequences. It has been evaluated on RNA-Seq data from human, mouse and human-in-mouse xenograft datasets. It can be used to efficiently and effectively partition the read set for subsequent processing by tools such as TopHat.
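The idea of classifying reads by k-mer decomposition can be sketched roughly as follows. This is a simplified illustration, not Xenome's actual algorithm: the function names, the toy reference sequences, the value of `k`, and the four class labels are all assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read, host_kmers, graft_kmers, k):
    """Label a read by which reference k-mer sets its k-mers hit.

    Simplified assumption: a single shared k-mer counts as evidence.
    """
    read_kmers = kmers(read, k)
    in_host = bool(read_kmers & host_kmers)
    in_graft = bool(read_kmers & graft_kmers)
    if in_host and in_graft:
        return "both"
    if in_graft:
        return "graft"
    if in_host:
        return "host"
    return "neither"


# Toy references (made up); real references would be whole genomes.
K = 4
host_index = kmers("AAAACCCCGGGG", K)
graft_index = kmers("TTTTGGGGCACA", K)

labels = {
    read: classify_read(read, host_index, graft_index, K)
    for read in ["AAAACC", "TTTTGG", "CGGGGC", "ATATAT"]
}
print(labels)
```

Partitioning reads by label in this way is what lets each subset be passed separately to a downstream aligner.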
A toolkit for processing and analysing RAD sequencing data. The tools are designed to process de novo RAD data, that is, data from species without a reference genome. RADtools integrates RADpools, for separating raw Illumina reads into separate pools; RADtags, for clustering the reads in each pool into candidate RAD tags for that pool; RADmarkers, for clustering tags across all pools into candidate loci with alleles; and RADMIDs, for designing a set of MIDs for use in RAD adapters. These RAD methods have great potential for creating genomic scaffolds to assist in genome assembly and for identifying thousands of sequence variants to aid in the detection of major as well as minor quantitative traits.
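Separating reads into pools by their MID (a short barcode at the start of each read) can be sketched as below. This is an illustrative Python example, not the RADpools implementation: the function `split_by_mid`, the MID sequences, and the pool names are hypothetical, and it assumes all MIDs have the same length and must match exactly.

```python
def split_by_mid(reads, mid_to_pool):
    """Assign each read to a pool by its leading MID and strip the MID.

    Assumptions for this sketch: all MIDs share one length, matching is
    exact, and unmatched reads are kept intact under "unassigned".
    """
    mid_len = len(next(iter(mid_to_pool)))
    pools = {pool: [] for pool in mid_to_pool.values()}
    pools["unassigned"] = []
    for read in reads:
        pool = mid_to_pool.get(read[:mid_len])
        if pool is None:
            pools["unassigned"].append(read)
        else:
            pools[pool].append(read[mid_len:])
    return pools


# Made-up MIDs and reads
mids = {"ACGT": "pool_A", "TGCA": "pool_B"}
raw_reads = ["ACGTTTTT", "TGCAGGGG", "ACGTCCCC", "NNNNAAAA"]
pools = split_by_mid(raw_reads, mids)
print(pools)
```

A production tool would typically also tolerate sequencing errors in the MID and check quality scores, which this sketch omits.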
Houses tools for researchers to process and analyze their own functional gene sequencing data. FGP offers a pipeline in which researchers can assemble a set of analysis tools to process a nucleotide sequence file, filter chimeric sequences, translate the nucleotide sequences, align and cluster the protein sequences, and optionally run the cluster file analysis tools. FGP allows libraries of sequence reads to be analyzed through either reference-based or unsupervised approaches after common initial processing steps. Reference-based approaches, such as the FrameBot frameshift correction and nearest neighbor tool offered by FGP, require a set of representative sequences, which can be compiled using the FunGene Repository (FGR).
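The translation step in such a pipeline can be illustrated with a minimal Python sketch using the standard genetic code. This is not FGP or FrameBot code: the function `translate` is hypothetical, it reads a single forward frame, and it performs no frameshift correction.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides, frame=0):
    """Translate a DNA sequence in one forward frame.

    Hypothetical helper: unknown codons (e.g. containing N) become "X",
    stop codons become "*", and trailing partial codons are ignored.
    """
    seq = nucleotides.upper()[frame:]
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


peptide = translate("ATGGCCTAA")
print(peptide)
```

Frameshift-aware tools go further by comparing reads against reference proteins, which is why the reference-based approaches described above need a set of representative sequences.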