Allows users to interact with high-throughput sequencing data. SAMtools manipulates alignments in the SAM, BAM and CRAM formats: it reads, writes, edits, indexes, views and converts them. It can downgrade the mapping quality of reads with excessive mismatches and applies base alignment quality (BAQ) to reduce the impact of alignment errors. It can also sort and merge alignments, remove polymerase chain reaction (PCR) duplicates and generate per-position information.
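Positional duplicate removal can be sketched in a few lines, in the spirit of `samtools rmdup`: reads that share a chromosome, 5' coordinate and strand are treated as PCR duplicates, and only the one with the highest mapping quality is kept. This is a minimal sketch, not SAMtools' code. The `Read` record is hypothetical, reads are assumed ungapped (so the aligned length gives the 5' end of reverse-strand reads) and paired-end handling is omitted.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record (not a SAMtools type)."""
    name: str
    chrom: str
    pos: int       # 0-based leftmost aligned reference position
    length: int    # aligned length on the reference (assumed ungapped)
    reverse: bool  # True if the read maps to the reverse strand
    mapq: int


def five_prime(read: Read) -> int:
    """Reference coordinate of the read's 5' end."""
    return read.pos + read.length - 1 if read.reverse else read.pos


def remove_duplicates(reads: list[Read]) -> list[Read]:
    """Keep one read per (chrom, 5' end, strand), preferring the highest MAPQ."""
    best: dict[tuple[str, int, bool], Read] = {}
    for read in reads:
        key = (read.chrom, five_prime(read), read.reverse)
        if key not in best or read.mapq > best[key].mapq:
            best[key] = read
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos))


reads = [
    Read("r1", "chr1", 100, 50, False, 30),
    Read("r2", "chr1", 100, 50, False, 60),  # duplicate of r1 with higher MAPQ
    Read("r3", "chr1", 51, 50, True, 20),    # 5' end at 100, but reverse strand
    Read("r4", "chr1", 200, 50, False, 10),
]
kept = remove_duplicates(reads)
```

Keying on the 5' end rather than the leftmost position matters for reverse-strand reads, whose fragment start is their rightmost aligned base.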
Aligns short reads and is geared toward mammalian resequencing. Bowtie is built on a Burrows-Wheeler index, the FM-index (full-text index in minute space). It works in two stages: an initial, ungapped seed-finding stage that exploits the speed and memory efficiency of the FM-index, and a gapped extension stage that uses dynamic programming accelerated by the single-instruction multiple-data (SIMD) parallel processing available on modern processors.
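The exact-match lookup that an FM-index supports, and that the seed-finding stage relies on, is backward search. Below is a minimal, unoptimized sketch, assuming a naive suffix-array construction and uncompressed Occ tables (production FM-indexes sample and compress these); it counts exact occurrences only and is not Bowtie's implementation.

```python
from collections import Counter


def bwt(text: str) -> str:
    """BWT via a naive suffix array; text must end with a unique '$' sentinel."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return "".join(text[i - 1] for i in sa)  # text[-1] is '$' for the suffix at 0


def fm_index(bwt_str: str):
    """Return C (count of smaller characters) and Occ (prefix rank) tables."""
    counts = Counter(bwt_str)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}  # occ[c][i] = c's in bwt[:i]
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ


def count(pattern: str, C, occ, n: int) -> int:
    """Backward search: number of occurrences of pattern in the indexed text."""
    lo, hi = 0, n  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


b = bwt("GATTACA$")
C, occ = fm_index(b)
```

Each pattern character narrows the suffix-array interval in constant time given the tables, so the search cost depends on the pattern length rather than the genome length.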
Permits users to perform gapped alignment. Bowtie2 divides the alignment algorithm into two broad stages: (1) an ungapped seed-finding stage that benefits from the FM-index and (2) a gapped extension stage that uses single-instruction multiple-data (SIMD) parallel processing. It also indexes the genome with an FM-index to keep its memory footprint small.
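The gapped extension stage is a dynamic-programming alignment. As a plain scalar illustration (no SIMD), here is a Smith-Waterman local alignment score with a linear gap penalty. The scoring values are hypothetical; Bowtie 2 itself uses affine gap penalties and quality-aware mismatch penalties, so this is a sketch of the technique rather than of Bowtie 2's scoring.

```python
def smith_waterman(query: str, ref: str, match: int = 2,
                   mismatch: int = -3, gap: int = -5) -> int:
    """Best local alignment score (scalar DP, linear gap penalty, one row kept)."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            # 0 restarts the local alignment; prev[j] and cur[j - 1] open gaps
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, aligning `ACGTACGT` against `ACGTTACGT` scores 11: eight matches (16) minus one gap (5). SIMD implementations compute many cells of this same recurrence in parallel.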
Gives access to many free software tools for sequence analysis. EMBOSS is aimed at the molecular biology community and supports the creation and release of software in an open-source spirit. It integrates a range of sequence analysis tools into a seamless whole and is free of charge and open source.
A software suite for comparing, manipulating and annotating genomic features in browser extensible data (BED) and general feature format (GFF) files. BEDTools also supports comparing sequence alignments in BAM format with both BED and GFF features. The tools are highly efficient and let users compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The individual BEDTools utilities can be combined with one another and with standard UNIX commands, which simplifies routine genomics tasks and lets pipelines quickly answer intricate questions about large genomic datasets.
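The core comparison behind interval tools like BEDTools is an overlap test on sorted, half-open intervals. A minimal sweep sketch, roughly what `bedtools intersect -a a.bed -b b.bed -wa -wb` reports as pairs; the function name and tuple layout are hypothetical and this is not BEDTools' implementation.

```python
from typing import Iterator

Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open like BED


def intersect(a: list[Interval], b: list[Interval]) -> Iterator[tuple[Interval, Interval]]:
    """Yield every overlapping (A, B) pair using a sweep over sorted intervals."""
    a, b = sorted(a), sorted(b)
    j = 0
    for chrom, start, end in a:
        # B intervals on earlier chromosomes, or ending at/before this A's start,
        # cannot overlap this or any later A (A is sorted), so skip them for good.
        while j < len(b) and (b[j][0] < chrom or (b[j][0] == chrom and b[j][2] <= start)):
            j += 1
        k = j
        while k < len(b) and b[k][0] == chrom and b[k][1] < end:
            if b[k][2] > start:  # half-open overlap test
                yield (chrom, start, end), b[k]
            k += 1


A = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
B = [("chr1", 150, 350), ("chr1", 400, 500), ("chr2", 10, 20)]
pairs = list(intersect(A, B))
```

Because BED coordinates are half-open, `("chr1", 300, 400)` and `("chr1", 400, 500)` touch but do not overlap.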
Allows users to compare their results with thousands of reference datasets and genome annotations in seconds. GIGGLE helps users identify novel and unexpected relationships between local datasets and the vast amount of publicly available genomics data. It uses a temporal indexing scheme to build a single index of the genomic intervals from thousands of annotation and genomic data files.
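The idea of one index over intervals from many files can be illustrated with a much-simplified structure: sweep the interval start and end points, record which intervals are active after each breakpoint, and answer a query by counting hits per source file. This is only a sketch under that assumption. Storing a full snapshot at every breakpoint spends far more space than GIGGLE's actual index, and the class and method names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


class SnapshotIndex:
    """Simplified interval index: one snapshot of active intervals per breakpoint."""

    def __init__(self):
        self.intervals = defaultdict(list)  # chrom -> [(start, end, file_id)]

    def add(self, file_id: str, chrom: str, start: int, end: int) -> None:
        if start >= end:
            raise ValueError("intervals must be non-empty and half-open")
        self.intervals[chrom].append((start, end, file_id))

    def build(self) -> None:
        self.points, self.snapshots = {}, {}
        for chrom, ivs in self.intervals.items():
            events = defaultdict(lambda: ([], []))  # point -> (starting ids, ending ids)
            for idx, (start, end, _) in enumerate(ivs):
                events[start][0].append(idx)
                events[end][1].append(idx)
            points = sorted(events)
            active, snaps = set(), []
            for p in points:
                starts, ends = events[p]
                active.difference_update(ends)
                active.update(starts)
                snaps.append(frozenset(active))  # intervals covering [p, next point)
            self.points[chrom], self.snapshots[chrom] = points, snaps

    def query(self, chrom: str, start: int, end: int) -> dict[str, int]:
        """Count, per source file, the indexed intervals overlapping [start, end)."""
        points = self.points.get(chrom, [])
        hits = set()
        k = max(bisect_right(points, start) - 1, 0)  # segment containing start
        while k < len(points) and points[k] < end:
            hits |= self.snapshots[chrom][k]
            k += 1
        return dict(Counter(self.intervals[chrom][i][2] for i in hits))


idx = SnapshotIndex()
idx.add("A", "chr1", 100, 200)
idx.add("A", "chr1", 150, 250)
idx.add("B", "chr1", 300, 400)
idx.add("B", "chr2", 0, 10)
idx.build()
```

A query such as `idx.query("chr1", 180, 320)` touches only the breakpoints inside the query window and returns per-file overlap counts, which is the kind of summary a large-scale comparison ranks.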
A high-performance, robust tool and library for working with SAM, BAM and CRAM sequence alignment files, the most common file formats for aligned next-generation sequencing (NGS) data. Sambamba is a faster alternative to samtools: it exploits multi-core processing to dramatically reduce processing time. Sequencing centers are adopting Sambamba not only for its speed but also for its additional functionality, including coverage analysis and powerful filtering.
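Coverage analysis (exposed in Sambamba through its `depth` subcommand) reduces to counting, for each base, the reads that span it. A minimal sketch using a difference array: add 1 where a read starts, subtract 1 where it ends, then take a running sum. It assumes reads are given as hypothetical half-open `(start, end)` reference spans and ignores deletions, splicing and read filtering.

```python
from itertools import accumulate


def depth(spans: list[tuple[int, int]], region_start: int, region_end: int) -> list[int]:
    """Per-base depth over [region_start, region_end) from half-open read spans."""
    diff = [0] * (region_end - region_start + 1)
    for start, end in spans:
        s, e = max(start, region_start), min(end, region_end)  # clip to the region
        if s < e:
            diff[s - region_start] += 1  # coverage rises where a read starts...
            diff[e - region_start] -= 1  # ...and falls just past where it ends
    return list(accumulate(diff[:-1]))
```

The difference array makes the cost linear in the number of reads plus the region length, instead of touching every covered base once per read.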
Analyzes high-throughput data from RNA structure probing and post-transcriptional modification mapping experiments. RNA Framework is a modular toolkit whose main features are (i) automatic creation of a reference transcriptome, (ii) automatic read preprocessing (adapter clipping and trimming) and mapping, (iii) scoring and data normalization, and (iv) accurate RNA folding prediction that incorporates structure probing data. Besides RNA structure analysis, it can analyze post-transcriptional modification mapping experiments (such as m1A-seq, m6A-seq, 2OMe-seq and Pseudo-seq).
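As an illustration of the normalization step, here is a sketch of 90% Winsorizing, one scheme commonly applied to structure probing reactivities: values are clamped to the 5th to 95th percentile range and then divided by the 95th percentile. This is an assumed example of the technique, not RNA Framework's code; in particular the nearest-rank percentile convention used here may differ from the toolkit's.

```python
def percentile(values: list[float], q: int) -> float:
    """Nearest-rank percentile for an integer q in [1, 100]."""
    ordered = sorted(values)
    rank = max(1, (q * len(ordered) + 99) // 100)  # ceil(q * n / 100) in integers
    return ordered[rank - 1]


def winsorize_90(reactivities: list[float]) -> list[float]:
    """Clamp to the 5th-95th percentile range, then scale by the 95th percentile."""
    low, high = percentile(reactivities, 5), percentile(reactivities, 95)
    if high <= 0:
        raise ValueError("95th percentile must be positive to normalize")
    return [min(max(r, low), high) / high for r in reactivities]
```

Clamping before scaling keeps a handful of extreme outliers from compressing every other reactivity toward zero.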
Examines epigenomic and transcriptomic next-generation sequencing (NGS) data. Octopus-toolkit handles data from antibody- or enzyme-mediated experiments as well as studies that quantify gene expression. It can accelerate the mining of public epigenomic and transcriptomic NGS data for basic biomedical research. The tool provides two modes: a private mode for processing the user's own data and a public mode for analyzing public NGS data by retrieving raw files from the GEO database.
Indexes reference sequences. SOAP2 uses a compressed index based on the Burrows-Wheeler transform (BWT). It can align single-end and paired-end reads and identify the best alignment hits. The tool maps short reads onto a reference sequence for large-scale resequencing projects and can compare the assembled sequence with the reference genome to find single nucleotide polymorphisms (SNPs). Because this version uses a BWT compressed index instead of a seed algorithm, it aligns faster and uses less memory than its predecessor.
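Finding SNPs by comparing the consensus with the reference can be sketched as a majority vote per position. The `pileup` input (observed read bases per 0-based reference position) and both thresholds are hypothetical. Real SNP callers used alongside SOAP model base qualities probabilistically rather than with fixed cut-offs.

```python
from collections import Counter


def call_snps(reference: str, pileup: dict[int, list[str]],
              min_depth: int = 3, min_fraction: float = 0.8) -> list[tuple[int, str, str]]:
    """Report (position, ref_base, consensus_base) where the consensus differs."""
    snps = []
    for pos in sorted(pileup):
        bases = pileup[pos]
        if len(bases) < min_depth:
            continue  # too little coverage to trust a consensus
        base, n = Counter(bases).most_common(1)[0]
        if base != reference[pos] and n / len(bases) >= min_fraction:
            snps.append((pos, reference[pos], base))
    return snps
```

With reference `ACGTACGT`, four `T`s and one `C` at position 1 give a consensus `T` that differs from the reference `C`, so `(1, "C", "T")` is reported.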
Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files and a Java API for creating new programs that read and write SAM files. Both the SAM text format and its binary counterpart (BAM) are supported. It is geared toward next-generation sequencing (NGS) data.
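To show what "reading SAM files" involves at the record level, here is a Python sketch (not Picard's Java API) that parses one alignment line into the 11 mandatory SAM columns plus optional tags and decodes three FLAG bits defined by the SAM specification: 0x4 (unmapped), 0x10 (reverse strand) and 0x400 (PCR or optical duplicate). Header lines starting with `@` are assumed to be filtered out beforehand.

```python
from dataclasses import dataclass, field


@dataclass
class SamRecord:
    """The 11 mandatory SAM columns plus any optional TAG:TYPE:VALUE fields."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position (0 if unavailable)
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: list[str] = field(default_factory=list)

    @property
    def is_unmapped(self) -> bool:
        return bool(self.flag & 0x4)

    @property
    def is_reverse(self) -> bool:
        return bool(self.flag & 0x10)

    @property
    def is_duplicate(self) -> bool:
        return bool(self.flag & 0x400)


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-delimited alignment line into a SamRecord."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    return SamRecord(
        cols[0], int(cols[1]), cols[2], int(cols[3]), int(cols[4]), cols[5],
        cols[6], int(cols[7]), int(cols[8]), cols[9], cols[10], cols[11:],
    )


rec = parse_sam_line("r1\t1040\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\n")
```

FLAG 1040 is 1024 + 16, so the example record is a reverse-strand duplicate.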
Indexes position-sorted files in TAB-delimited formats, such as GFF, BED, PSL, SAM and SQL exports, and quickly retrieves the features that overlap specified regions. Tabix needs only a few seek calls per query, compresses data in a gzip-compatible format and can access files directly over FTP/HTTP.
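Part of what keeps tabix queries to a few seeks is a linear index: the genome is tiled into 16 kbp windows, and each window records the smallest file offset of any feature overlapping it, so a query can seek straight past everything earlier. This sketch shows only that linear part (tabix also uses a binning index) and simulates file offsets with list positions instead of BGZF virtual offsets; the function names are hypothetical.

```python
WINDOW_SHIFT = 14  # 16 kbp windows, as in the tabix/BAI linear index


def build_linear_index(features: list[tuple[int, int]]) -> list[int]:
    """features: position-sorted, 0-based half-open (start, end); list index = offset."""
    linear: list = []
    for offset, (start, end) in enumerate(features):
        first, last = start >> WINDOW_SHIFT, (end - 1) >> WINDOW_SHIFT
        if len(linear) <= last:
            linear.extend([None] * (last + 1 - len(linear)))
        for w in range(first, last + 1):
            if linear[w] is None or offset < linear[w]:
                linear[w] = offset  # smallest offset of any feature touching window w
    # An empty window can safely reuse the previous window's (smaller) offset.
    for w in range(len(linear)):
        if linear[w] is None:
            linear[w] = linear[w - 1] if w > 0 else 0
    return linear


def query(features, linear, beg: int, end: int) -> list[tuple[int, int]]:
    """Return features overlapping [beg, end) after a single 'seek' into the list."""
    w = beg >> WINDOW_SHIFT
    if w >= len(linear):
        return []  # past the last window that holds any feature
    hits = []
    for start, f_end in features[linear[w]:]:
        if start >= end:
            break  # position-sorted input: nothing later can overlap
        if f_end > beg:
            hits.append((start, f_end))
    return hits


features = [(0, 100), (20000, 20100), (40000, 70000), (70000, 70010)]
linear = build_linear_index(features)
```

The position-sorted requirement is what makes the early `break` valid: once a feature starts at or past the query end, no later feature can overlap.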
A software suite for programmers and end users that facilitates research analysis and data management with BAM files. BamTools provides both the first publicly available C++ API for BAM files and a command-line toolkit. The BamTools C++ API has been integrated into a variety of applications; for example, it provides BAM support for several utilities in the BEDTools suite.
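A BAM file begins with a binary header whose layout is defined by the SAM/BAM specification: the magic `BAM\1`, the SAM header text length and text, then the number of reference sequences followed by each reference's name and length. The Python sketch below reads that header; it is not BamTools' C++ API. For simplicity it decompresses the whole input at once (a real reader streams BGZF blocks), and `make_bam_header` is a hypothetical helper that builds a header-only test blob.

```python
import gzip
import struct


def parse_bam_header(data: bytes) -> tuple[str, list[tuple[str, int]]]:
    """Return the SAM header text and (name, length) reference list of a BAM file."""
    raw = gzip.decompress(data)  # BGZF is concatenated gzip members, so gzip can read it
    if raw[:4] != b"BAM\x01":
        raise ValueError("missing BAM magic")
    (l_text,) = struct.unpack_from("<i", raw, 4)
    offset = 8
    text = raw[offset:offset + l_text].decode("ascii").rstrip("\x00")
    offset += l_text
    (n_ref,) = struct.unpack_from("<i", raw, offset)
    offset += 4
    refs = []
    for _ in range(n_ref):
        (l_name,) = struct.unpack_from("<i", raw, offset)
        offset += 4
        name = raw[offset:offset + l_name - 1].decode("ascii")  # l_name counts the NUL
        offset += l_name
        (l_ref,) = struct.unpack_from("<I", raw, offset)
        offset += 4
        refs.append((name, l_ref))
    return text, refs


def make_bam_header(text: str, refs: list[tuple[str, int]]) -> bytes:
    """Hypothetical helper that builds a header-only BAM-like blob for testing."""
    body = b"BAM\x01" + struct.pack("<i", len(text)) + text.encode("ascii")
    body += struct.pack("<i", len(refs))
    for name, length in refs:
        encoded = name.encode("ascii") + b"\x00"
        body += struct.pack("<i", len(encoded)) + encoded + struct.pack("<I", length)
    return gzip.compress(body)  # plain gzip, not true BGZF blocks
```

All integers in BAM are little-endian, hence the `<` prefix in every `struct` format.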
Maps short reads using a redesigned data structure. SOAP3 can determine at runtime whether a pattern would introduce too many branches. It supports alignments with up to four mismatches. This third version of SOAP uses the GPU's multiprocessors to improve its speed. It is not heuristic-based and reports all valid alignments for an input file.
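The output contract of a non-heuristic mismatch aligner, every ungapped placement with at most four mismatches, can be stated as an exhaustive scan. This naive sketch defines what "report all alignments" means; it is not SOAP3's BWT-based GPU search, and it considers the forward strand only.

```python
def hamming_hits(read: str, ref: str, max_mismatches: int = 4) -> list[tuple[int, int]]:
    """All (position, mismatches) where read aligns ungapped within the limit."""
    hits = []
    m = len(read)
    for pos in range(len(ref) - m + 1):
        mism = 0
        for a, b in zip(read, ref[pos:pos + m]):
            if a != b:
                mism += 1
                if mism > max_mismatches:
                    break  # this placement can no longer qualify
        else:
            hits.append((pos, mism))  # loop finished without exceeding the limit
    return hits
```

An index-based aligner must return exactly the same set of hits; it just avoids examining every reference position.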
A tool for constructing the FM-index of a collection of DNA sequences. ropeBWT incrementally inserts one or more sequences into an existing pseudo-BWT, position by position, starting from the end of the sequences. The algorithm can largely be considered a mixture of BCR and the dynamic FM-index. Nonetheless, ropeBWT2 is unique in that it can implicitly sort the input into reverse lexicographical order (RLO) or reverse-complement lexicographical order (RCLO) while building the index.
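The two input orders can be made concrete with explicit sort keys. This sketch assumes RLO means sorting sequences by their reversed strings and RCLO means sorting by their reverse complements; ropeBWT2 obtains these orders implicitly while inserting sequences rather than through a separate sort like this one.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def rlo_sorted(seqs: list[str]) -> list[str]:
    """Reverse lexicographical order: compare sequences from their last base."""
    return sorted(seqs, key=lambda s: s[::-1])


def rclo_sorted(seqs: list[str]) -> list[str]:
    """Reverse-complement lexicographical order (assumed definition)."""
    return sorted(seqs, key=reverse_complement)
```

For `["CA", "AC", "GT"]`, the RLO keys are `AC`, `CA`, `TG`, while the RCLO keys are `TG`, `GT`, `AC`, so the two orders differ.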
Accelerates the locate operation of FM-indexes for genomic data. FMtree is a locating algorithm that builds a conceptual multiway tree, which it uses to compute non-sampled positions block by block. It can be applied to any FM-index implementation without modification. The algorithm is cache-friendly and avoids many unnecessary operations.
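The operation FMtree accelerates is the standard locate step: only every `step`-th text position is kept from the suffix array, and an unsampled row is resolved by following LF mappings until a sampled row is reached, counting the steps taken. This sketch shows that baseline with a hypothetical sampling rate; FMtree's multiway tree and block-wise computation are not reproduced.

```python
from collections import Counter


def build(text: str, step: int = 3):
    """FM-index with a suffix array sampled at text positions divisible by step."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):
        C[c], total = total, total + counts[c]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}  # occ[c][i] = c's in bwt[:i]
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    sampled = {row: pos for row, pos in enumerate(sa) if pos % step == 0}
    return sa, bwt, C, occ, sampled


def locate(row: int, bwt: str, C, occ, sampled) -> int:
    """Text position of a suffix-array row, walking LF until a sampled row is hit."""
    steps = 0
    while row not in sampled:
        c = bwt[row]
        row = C[c] + occ[c][row]  # LF mapping: row of the suffix one position earlier
        steps += 1
    return sampled[row] + steps


sa, bwt, C, occ, sampled = build("GATTACA$", step=3)
```

Each walk takes fewer than `step` LF steps, which is the space/time trade-off that locate accelerators target: every step is a dependent, cache-unfriendly lookup.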
A program that chops a BAM index (BAI) file into small pieces. It outputs a list of BAI files, each indexing a specified genomic interval. The output files are much smaller but remain compatible with existing software tools. We show how preprocessing BAI files with chopBAI can reduce I/O by more than 95% during the analysis of 10 kbp genomic regions, eventually enabling the joint analysis of more than 10,000 individuals. As sequencing becomes ever more common, chopBAI will be equally useful for analyzing large sequencing cohorts of other species, wherever the BAI indexing scheme allows fast access to small subsets of reads.
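A region-restricted index can stay small because the BAI binning scheme (defined in the SAM/BAM specification) assigns alignments to a fixed hierarchy of bins, and only a handful of bins can overlap any given interval. The sketch below computes those bins with the specification's `reg2bins` logic and keeps only their entries. The `bin_index` dict is a hypothetical stand-in for one reference's bin-to-chunk table; chopBAI's actual output format, including how it treats the linear index, is not shown.

```python
def reg2bins(beg: int, end: int) -> list[int]:
    """Bins that may contain alignments overlapping [beg, end) (SAM spec scheme)."""
    end -= 1  # the scheme works on the closed interval [beg, end - 1]
    bins = [0]  # bin 0 spans the whole 512 Mbp range
    for offset, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


def chop_bins(bin_index: dict[int, list], beg: int, end: int) -> dict[int, list]:
    """Keep only the bins (and their chunk lists) that can overlap [beg, end)."""
    keep = set(reg2bins(beg, end))
    return {b: chunks for b, chunks in bin_index.items() if b in keep}
```

For a 10 kbp region at the start of a chromosome, `reg2bins(0, 10000)` returns just six bins, one per level, so everything else in the index can be dropped.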
Implements an indexing data structure for the compacted de Bruijn graph (dBG) and the colored compacted dBG. pufferfish uses a minimal perfect hash function (MPHF) to provide k-mer lookup. Its data structure comes in two variants: a dense variant for fast queries and a sparse variant that trades space for speed in a fine-grained way.
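An MPHF maps each of n known keys to a distinct slot in [0, n) without storing the keys, so a k-mer table needs only n slots. This sketch uses a BBHash-style construction, which we assume is representative: keys that land alone in a level's bit array are placed, and colliding keys retry at the next level. The hash function, `gamma`, table layout and helper names are hypothetical, and k-mers map to positions in a single string rather than to the unitig positions pufferfish stores. Dense and sparse variants are not modeled.

```python
import hashlib


def _hash(key: str, level: int, size: int) -> int:
    """Deterministic per-level hash of key into [0, size)."""
    digest = hashlib.blake2b(f"{level}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % size


class MPHF:
    """BBHash-style minimal perfect hash: distinct keys -> distinct ints in [0, n)."""

    def __init__(self, keys, gamma: float = 2.0, max_levels: int = 64):
        remaining = list(dict.fromkeys(keys))  # drop duplicate keys
        self.levels: list[list[bool]] = []
        for level in range(max_levels):
            if not remaining:
                break
            size = max(1, int(gamma * len(remaining)))
            hits = [0] * size
            for key in remaining:
                hits[_hash(key, level, size)] += 1
            bits = [h == 1 for h in hits]  # slot claimed by exactly one key
            self.levels.append(bits)
            # Keys that collided retry at the next level with a fresh hash.
            remaining = [k for k in remaining if not bits[_hash(k, level, size)]]
        if remaining:
            raise RuntimeError("construction did not converge; try a larger gamma")
        self.offsets, total = [], 0
        for bits in self.levels:
            self.offsets.append(total)
            total += sum(bits)
        self.n = total

    def __call__(self, key: str) -> int:
        for level, bits in enumerate(self.levels):
            pos = _hash(key, level, len(bits))
            if bits[pos]:
                # rank of pos among set bits, offset by bits set on earlier levels
                return self.offsets[level] + sum(bits[:pos])
        raise KeyError(key)  # only possible for keys outside the build set


def build_kmer_table(seq: str, k: int):
    """Map every k-mer of seq to its positions via the MPHF (hypothetical layout)."""
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    mphf = MPHF(positions)
    table = [None] * mphf.n
    for kmer, pos in positions.items():
        table[mphf(kmer)] = (kmer, pos)  # store the key so lookups can verify it
    return mphf, table


def lookup(mphf: MPHF, table, kmer: str):
    """Positions of kmer, or None if it is not in the indexed sequence."""
    try:
        slot = table[mphf(kmer)]
    except KeyError:
        return None
    return slot[1] if slot[0] == kmer else None  # an MPHF maps non-keys arbitrarily


mphf, table = build_kmer_table("ACGTACGGA", 3)
```

The verification in `lookup` is essential: an MPHF is only defined on its build set, so a foreign k-mer still hashes to some slot and must be rejected by comparing against the stored key.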
Allows users to map long reads with high error rates. lordFAST is designed to align reads from PacBio sequencing technology and lets the user adjust alignment parameters to suit the reads and the application. It supports both clipped and split read alignments, so reads from regions with long structural variations (SVs) can be aligned.
A k-ordered FM-index designed to be highly amenable to hardware acceleration, for exact string matching, overlap graph construction for de novo assembly and more. sBWT is a fast Burrows–Wheeler transform (BWT)-based indexer and aligner specialized in parallelized indexing and searching of next-generation sequencing data. In our tests, the implementation achieves significant speedups in indexing and searching compared with other BWT-based tools and can be applied to a variety of domains.
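A k-ordered BWT sorts the rotations of the text by their first k characters only, breaking ties by text position, which is cheaper than a full sort. This sketch shows the transform and checks that ordinary backward search still counts every pattern of length at most k exactly. It is a naive illustration of the k-ordered idea under those assumptions, not sBWT's parallel, hardware-oriented construction.

```python
from collections import Counter


def k_ordered_bwt(text: str, k: int) -> str:
    """Sort rotations by their first k characters only (ties by text position)."""
    n = len(text)
    doubled = text + text  # rotations wrap around the '$' sentinel
    rows = sorted(range(n), key=lambda i: (doubled[i:i + k], i))
    return "".join(text[i - 1] for i in rows)


def count(bwt: str, pattern: str) -> int:
    """Standard backward search; exact when len(pattern) <= k."""
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):
        C[c], total = total, total + counts[c]
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + bwt[:lo].count(c)  # rank of c before lo (naive O(n) rank)
        hi = C[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


text = "ACGTTACGATACGTAC$"
k = 3
b = k_ordered_bwt(text, k)
```

The search stays exact up to length k because rows sharing a prefix of length at most k remain contiguous, and the LF step only ever compares prefixes shorter than k.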