Focuses on variant discovery and genotyping. GATK is a toolkit developed at the Broad Institute that comprises several tools and can support projects of any size. The application bundles an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in various formats such as SAM, BAM, CRAM or VCF. The website includes extensive documentation to guide users.
Allows users to interact with high-throughput sequencing data. SAMtools permits the manipulation of alignments in the SAM/BAM/CRAM formats: reading, writing, editing, indexing, viewing and converting between them. It caps the mapping quality of reads with excessive mismatches and applies base alignment quality to fix alignment errors. This tool can sort and merge alignments, remove polymerase chain reaction (PCR) duplicates or generate per-position information.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It supports the creation and release of software in an open source spirit and integrates sequence analysis tools into a seamless whole. It is free of charge and open source.
A software suite for the comparison, manipulation and annotation of genomic features in the browser extensible data (BED) and general feature format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are highly efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets.
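As a rough illustration of the feature comparison described above, the following Python sketch reports the overlaps between two small sets of BED-style intervals. It is not BEDTools code: the function name `intersect` and the sample intervals are hypothetical, and a naive pairwise scan stands in for the far more efficient algorithms a real tool would use.

```python
from typing import List, Tuple

# (chrom, start, end), 0-based and half-open as in the BED format
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b."""
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # half-open intervals overlap only if the span is non-empty
                overlaps.append((chrom_a, start, end))
    return overlaps


# Hypothetical example data: aligned reads versus annotated genes
reads = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
genes = [("chr1", 150, 550), ("chr2", 80, 120)]
result = intersect(reads, genes)
print(result)
```

Note that the `chr2` read ends exactly where the `chr2` gene starts, so under half-open coordinates the two do not overlap.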
Identifies and corrects errors in sequencing reads by using k-mer coverage. Quake differentiates k-mers trusted to be in the genome from k-mers that are untrustworthy artifacts of sequencing errors. The software exploits read quality values and determines types of errors by generating nucleotide-to-nucleotide error rates. It searches for a set of corrections that makes all k-mers trusted, and it can be deployed on large datasets containing billions of reads.
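The trusted/untrusted distinction can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not Quake's implementation: the function names and the toy reads are hypothetical, and the fixed `cutoff` is an assumption made here for simplicity.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads (its coverage)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_kmers(counts: Counter, cutoff: int) -> Tuple[Set[str], Set[str]]:
    """Split k-mers into trusted (coverage >= cutoff) and untrusted sets."""
    trusted = {kmer for kmer, n in counts.items() if n >= cutoff}
    untrusted = set(counts) - trusted
    return trusted, untrusted


# Hypothetical reads: three identical copies and one copy with a T->A error
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
counts = count_kmers(reads, k=4)
trusted, untrusted = split_kmers(counts, cutoff=2)
print(sorted(trusted), sorted(untrusted))
```

Every k-mer that spans the erroneous base appears only once and falls into the untrusted set, which is what flags the read as a candidate for correction.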
Simplifies variant annotation and filtering. Bystro is able to handle sequencing experiments on the scale of thousands of whole-genome samples and tens of millions of variants online in a web browser. It integrates a search engine for filtering variants and samples from these experiments, and it enables real-time (sub-second), nuanced variant filtering, both across all samples and per sample, using simple phrases and interactive, web-based filters. It helps users find alleles of interest in any sequencing experiment.
Permits users to parse, analyze and manipulate VCF files. VCFtools is a software package composed of two modules: the first is a general API that allows various operations to be performed on VCF files, including format validation, merging, comparing, intersecting, making complements and basic overall statistics; the second module analyzes single-nucleotide polymorphism (SNP) data in VCF format, helping researchers estimate allele frequencies, levels of linkage disequilibrium and various quality control (QC) metrics.
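To make the allele frequency estimate concrete, the Python sketch below computes the non-reference allele frequency per site from the genotype (GT) fields of a tiny in-memory VCF. It does not use the VCFtools API: the function name, the sample data and the site keys are hypothetical, and the code assumes tab-separated records with a GT entry in the FORMAT column.

```python
from typing import Dict

# Hypothetical three-sample VCF; "./." marks a missing genotype
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3
chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0
chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t./.\t0|1
"""


def alt_allele_frequencies(vcf_text: str) -> Dict[str, float]:
    """Return the non-reference allele frequency for each site, keyed by chrom:pos."""
    freqs = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.split("\t")
        chrom, pos = fields[0], fields[1]
        gt_index = fields[8].split(":").index("GT")
        alt = total = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            # Treat phased (|) and unphased (/) genotypes alike
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing calls do not count toward the total
                total += 1
                if allele != "0":
                    alt += 1
        if total:
            freqs[f"{chrom}:{pos}"] = alt / total
    return freqs


freqs = alt_allele_frequencies(VCF_TEXT)
print(freqs)
```

Missing genotypes are excluded from the denominator, so the second site is estimated from four called alleles instead of six.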
Annotates and filters variant files. VarAFT allows the comparison of several individuals and the collection of relevant information about the variations. It includes a coverage analysis module to easily visualize regions that are poorly covered through tables and dynamic charts. With VarAFT, users can annotate variant (VCF) files, combine multiple samples from various individuals and prioritize lists of variants with multi-filtering parameters. Additionally, users can perform a coverage analysis and quality check from any BAM file.
A statistical framework for calling SNPs, discovering somatic mutations, inferring population genetic parameters and performing association tests directly based on sequencing data. BCFtools can manipulate variant calls in the variant call format (VCF) and its binary counterpart BCF. It can also discover somatic and germline mutations with appropriate input data, efficiently estimate site allele frequency, allele frequency spectrum and linkage disequilibrium, and test Hardy–Weinberg equilibrium and association.
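For readers unfamiliar with the Hardy–Weinberg test mentioned above, here is a minimal Python sketch of the textbook chi-square version for a biallelic site. It is not necessarily the test BCFtools applies, and the function name and genotype counts are hypothetical.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations for a biallelic site."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: the first site matches expectations exactly,
# the second has a strong deficit of heterozygotes.
in_equilibrium = hwe_chi_square(25, 50, 25)
het_deficit = hwe_chi_square(40, 20, 40)
print(in_equilibrium, het_deficit)
```

With one degree of freedom, a statistic above roughly 3.84 rejects equilibrium at the 5% level, so the second site would be flagged.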
Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files, and a Java API for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported. It also works with next-generation sequencing (NGS) data.
A software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first publicly available C++ API for BAM file support and a command-line toolkit. The BamTools C++ API/library has been successfully integrated into a variety of applications. It provides the BAM file support for several utilities in the BEDTools suite.
A suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. With modules that operate from FASTQ pre-processing through BAM post-processing and RPKM calculations, NGSUtils complements existing tools and provides unique functionality that helps at each step of an NGS data analysis pipeline. NGSUtils covers different aspects of NGS data analysis, including pre-processing, post-processing, filtering, format conversion and final result calculations. NGSUtils provides a stable and modular platform for data management and analysis.
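The RPKM (reads per kilobase of transcript per million mapped reads) calculation mentioned above follows a standard formula, sketched here in Python. This is not NGSUtils code; the function name and the example numbers are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 10^9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp gene in a 10-million-read library
value = rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=10_000_000)
print(value)
```

Dividing by both gene length and library size is what makes values comparable across genes of different lengths and samples of different sequencing depth.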
Analyzes or annotates VCF files and organizes tools that perform diverse analyses using VCF files. VCF-kit adds essential utilities to process and analyze VCF files, including primer generation for variant validation, dendrogram production, genotype imputation from sequence data in linkage studies, and additional tools. It can be used to produce a phylogenetic tree from a VCF. The tool centralizes a collection of tools and scripts that work with the variant call format.
Offers an assortment of tools suited for sequence analysis. Japsa is an open source package that gathers more than 20 tools, including a Java library and an API. The application provides a wide range of functionalities that allow users to split multi-sequence files, to perform real-time identification of antibiotic resistance genes with Oxford Nanopore sequencing and to normalize the branch lengths of a phylogeny.
Reduces the need for alignment verification in DNA read mapping. GateKeeper is a hardware acceleration system for alignment filtering designed to utilize the large amounts of parallelism offered by field-programmable gate array (FPGA) architectures. It can filter, on average, 4 trillion mappings within 40 minutes using a single FPGA chip while preserving all correct ones. This method can improve the performance of existing and future read mappers.
A C++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. It implements filters based on alignment data, sequence motifs, regional coverage and base quality. VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis. It is easy to compile and run, and is extensively documented with a number of use cases and examples.
Permits quality control of next-generation sequencing (NGS) tumor-normal experiments. NGS-Bits is separated into four steps: (1) gather information from raw reads, (2) map reads, (3) extract variant lists, and (4) combine results from the preceding steps and add quality control (QC) metrics for tumor-normal experiments. This tool includes all stages of single-sample NGS data analysis and adds special QC metrics for DNA sequencing of tumor-normal pairs.
Provides the ability to filter variants based upon variant annotation. cyvcf is a high-performance library that provides researchers with an intuitive Python interface for manipulating VCF files. It lets users interrogate the details of each sample’s genotype information and rapidly compute both variant-level and sample-level statistics. Thanks to its careful design, it also offers full programmatic flexibility with minimal performance penalties.
Allows users to reformat and filter bioinformatics files. JVARKIT aims to simplify the grammar used to filter bioinformatics files, making it possible to write a loop or a custom function. JVARKIT is a set of more than 100 Java-based tools for bioinformatics.
Provides a library written in the Nim programming language, which offers a simple, scripting-like syntax. Nim is a garbage-collected language that compiles to C and has a syntax similar to Python. The hts-nim library can be useful for parsing genomics data files.