Focuses on variant discovery and genotyping. GATK is a toolkit developed at the Broad Institute, composed of several tools and able to support projects of any size. The application brings together an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in formats such as SAM, BAM, CRAM or VCF. The website provides extensive documentation to guide users.
Allows users to interact with high-throughput sequencing data. SAMtools permits the manipulation of alignments in the SAM/BAM/CRAM formats, including reading, writing, editing, indexing, viewing and converting between them. It can cap the mapping quality of reads with excessive mismatches and apply base alignment quality (BAQ) to correct for alignment errors. This tool can sort and merge alignments, remove polymerase chain reaction (PCR) duplicates and generate per-position information.
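The idea behind PCR duplicate removal can be illustrated with a small sketch. Assuming simplified single-end reads, where reads sharing the same chromosome, 5' position and strand are treated as copies of one original fragment, only the copy with the highest summed base quality is kept. The `Read` record and `remove_duplicates` function are hypothetical illustrations, not the SAMtools API or its exact algorithm (which also handles paired-end reads, among other details).

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical, minimal single-end alignment record."""
    name: str
    chrom: str
    pos: int            # strand-aware 5' alignment position
    strand: str         # "+" or "-"
    base_qual_sum: int  # sum of base qualities, used to pick the best copy


def remove_duplicates(reads: list[Read]) -> list[Read]:
    """Keep one read per (chrom, pos, strand), preferring the highest quality."""
    best: dict[tuple[str, int, str], Read] = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        kept = best.get(key)
        if kept is None or read.base_qual_sum > kept.base_qual_sum:
            best[key] = read
    # Return the survivors in coordinate order
    return sorted(best.values(), key=lambda r: (r.chrom, r.pos, r.strand))
```

For example, two reads starting at `chr1:100` on the plus strand collapse to the one with the higher quality sum, while a read at the same position on the minus strand is kept.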
Performs peak finding and downstream analysis of next-generation sequencing data. HOMER provides several tools and methods for analyzing ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and other types of functional genomics sequencing data sets. The software supports UCSC visualization, peak annotation, quantification of transcripts and repeats, and analysis of differential features, enrichment and expression.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community and encourages the creation and release of software in an open-source spirit. It integrates a range of sequence analysis tools into a seamless whole and is free of charge and open source.
A software suite for the comparison, manipulation and annotation of genomic features in the browser extensible data (BED) and general feature format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are highly efficient and allow users to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The individual tools can be combined with one another as well as with standard UNIX commands, facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions about large genomic datasets.
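The core operation behind comparing genomic features is interval intersection. The sketch below, assuming features are `(chrom, start, end)` tuples in BED's zero-based, half-open coordinates, reports the overlapping portion of each pair of features, in the spirit of what an intersection tool reports. The `intersect` function is a hypothetical, naive pairwise illustration, not BEDTools' implementation, which is far more efficient on large files.

```python
def intersect(a, b):
    """Return the overlapping parts of features in a and b.

    Features are (chrom, start, end) tuples in BED half-open coordinates,
    so [start, end) and touching features do not overlap.
    """
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap
                out.append((chrom_a, start, end))
    return out
```

With half-open coordinates, `("chr1", 0, 10)` and `("chr1", 10, 20)` are adjacent but share no base, so they produce no overlap.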
Performs gene- and isoform-level quantification from RNA-Seq data. RSEM is a software package that quantifies gene and isoform abundances from single-end (SE) or paired-end (PE) RNA-Seq data. The software enables visualization of its output through probabilistically weighted read alignments and read depth plots. It does not require a reference genome and thus can be useful for quantification with de novo transcriptome assemblies.
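Reads that align equally well to several isoforms are what make the probabilistic weighting necessary. RSEM estimates abundances with an expectation-maximization (EM) algorithm; the toy sketch below shows only the basic EM idea, assuming each read is given as the set of transcripts it is compatible with and that all transcripts have equal effective length. It ignores the fragment-length and quality models of the real software, and `em_abundance` is a hypothetical name.

```python
def em_abundance(read_sets, transcripts, n_iter=200):
    """Toy EM estimate of transcript fractions from ambiguous read assignments.

    read_sets: list of sets, each holding the transcripts one read is
    compatible with (every set must be non-empty).
    Assumes equal effective transcript lengths.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(read_sets)
    for _ in range(n_iter):
        # E-step: split each read among its transcripts in proportion to theta
        counts = {t: 0.0 for t in transcripts}
        for compat in read_sets:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: new fractions are the expected counts over all reads
        theta = {t: counts[t] / n_reads for t in transcripts}
    return theta
```

With three reads unique to A, one unique to B and four shared, the estimate converges to 0.75 for A and 0.25 for B: the shared reads are split in the same proportion as the unique evidence.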
A Galaxy-based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatics background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straightforward, yet highly customizable manner.
A flexible toolkit for exploring datasets generated by nanopore sequencing devices such as the MinION, for quality control and downstream analysis. Poretools operates directly on the native FAST5 file format (an application of the HDF5 standard) produced by Oxford Nanopore Technologies (ONT) and provides a wealth of format conversion utilities as well as data exploration and visualization tools.
Addresses the problem of mapping various types of identifiers to each other. Onto-Translate provides a non-redundant and complete mapping from any type of identification system to any other type. This software relies on the custom-designed Onto-Tools database, which integrates 20 publicly available biological databases such as KEGG and GenBank. It can convert individual gene identifiers from one format into another.
Builds mapping assemblies from short reads generated by the next-generation sequencing machines. Maq is particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Maq first aligns reads to reference sequences and then calls the consensus. At the mapping stage, maq performs ungapped alignment. For single-end reads, maq is able to find all hits with up to 2 or 3 mismatches, depending on a command-line option; for paired-end reads, it always finds all paired hits with one of the two reads containing up to 1 mismatch. At the assembling stage, maq calls the consensus based on a statistical model.
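The single-end mismatch criterion can be made concrete with a brute-force scan. This sketch slides a read along a reference without gaps and keeps every placement with at most `max_mismatches` mismatches. Maq itself relies on indexing to make this fast; the `ungapped_hits` function is a hypothetical illustration of the criterion only, and it considers the forward strand alone.

```python
def ungapped_hits(read, reference, max_mismatches=2):
    """Return (offset, mismatches) for every ungapped placement of read
    on reference with at most max_mismatches mismatches (forward strand)."""
    hits = []
    n = len(read)
    for offset in range(len(reference) - n + 1):
        mismatches = 0
        for read_base, ref_base in zip(read, reference[offset:offset + n]):
            if read_base != ref_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this placement can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return hits
```

Raising `max_mismatches` from 2 to 3 mirrors the command-line choice described above: more placements qualify, at the cost of more ambiguous hits.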
Allows users to manipulate, organize, summarize and visualize MinION nanopore sequencing data. poRe enables users to organize MinION FAST5 files into run folders, extract FASTQ, gather statistics on each run and plot a number of key graphs, such as read-length histograms and yield over time. Two graphical user interfaces (GUIs) for MinION data processing, organization and extraction are available through the package.
Improves the design and use of polymerase chain reaction (PCR)-based methylation assays. methPrimer was developed to store and retrieve validated methylation assays and is intended to serve as a search portal for them. It also aims to establish a degree of standardization and uniformity in the use of PCR-based methylation assays. Each primer set is given a unique identifier so that it can be accessed directly or cited in a publication.
Permits users to parse, analyze and manipulate VCF files. VCFtools is a software package composed of two modules: the first is a general API that allows various operations to be performed on VCF files, including format validation, merging, comparing, intersecting, making complements and computing basic overall statistics; the second module analyzes single-nucleotide polymorphism (SNP) data in VCF format, helping researchers estimate allele frequencies, levels of linkage disequilibrium and various quality control (QC) metrics.
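Estimating allele frequencies from VCF genotypes comes down to counting alleles in the `GT` field. The sketch below parses one tab-separated VCF data line, locates `GT` through the `FORMAT` column, treats phased (`|`) and unphased (`/`) genotypes alike, skips missing calls (`.`), and returns the frequency of the reference allele followed by each alternate allele. `allele_frequencies` is a hypothetical helper, not part of VCFtools.

```python
def allele_frequencies(vcf_line):
    """Return [REF freq, ALT1 freq, ...] from the GT calls of one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    alt_alleles = fields[4].split(",")
    gt_index = fields[8].split(":").index("GT")  # FORMAT column
    counts = [0] * (1 + len(alt_alleles))        # index 0 is REF
    for sample in fields[9:]:
        values = sample.split(":")
        gt = values[gt_index] if gt_index < len(values) else "."
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":                    # skip missing calls
                counts[int(allele)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else []
```

For a site where four samples carry `0/1`, `1|1`, `./.` and `0/0`, the missing call is ignored and the three reference and three alternate alleles give frequencies of 0.5 each.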
Examines epigenomic and transcriptomic next-generation sequencing (NGS) data. Octopus-toolkit can process data from antibody- or enzyme-mediated experiments as well as from gene expression quantification studies. It can accelerate the mining of public epigenomic and transcriptomic NGS data for basic biomedical research. This tool provides a private and a public mode: the first processes the user's own data, and the second analyzes public NGS data by retrieving raw files from the GEO database.
Enables reading of sequencing files from the SRA database and writing files in the same format. The NCBI SRA Toolkit is provided in the form of the SRA SDK and can be compiled with GCC. It allows users to programmatically access data housed within SRA and convert it from the SRA format into formats such as ABI SOLiD native, FASTA, FASTQ, SFF, SAM and Illumina native. The toolkit is available for all common platforms.
A statistical framework for calling SNPs, discovering somatic mutations, inferring population genetic parameters and performing association tests directly from sequencing data. BCFtools can manipulate variant calls in the variant call format (VCF) and its binary counterpart BCF. It can also discover somatic and germline mutations given appropriate input data, efficiently estimate site allele frequency, the allele frequency spectrum and linkage disequilibrium, and test for Hardy–Weinberg equilibrium and association.
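Testing for Hardy–Weinberg equilibrium at a biallelic site compares observed genotype counts with those expected from the allele frequency. The sketch below uses the simple one-degree-of-freedom chi-square approximation, whose upper-tail probability equals `erfc(sqrt(chi2 / 2))`. This is a textbook illustration, not necessarily the test BCFtools implements (exact tests are often preferred for small samples), and `hwe_chisq` is a hypothetical name.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic site.

    Returns (chi2, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

Counts of 25/50/25 match expectation exactly (chi2 = 0, p = 1), whereas 50/0/50, a complete lack of heterozygotes, gives chi2 = 100 and a vanishingly small p-value.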
Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files, and a Java API for creating new programs that read and write SAM files. Both the SAM text format and its binary counterpart (BAM) are supported. It is intended for next-generation sequencing (NGS) data.
Advances the automation and visualization of RNA-seq data analysis results. QuickRNASeq is a pipeline that significantly reduces analysts' hands-on time, substantially decreasing the time and effort needed for the primary analysis of RNA-seq data before further downstream analysis and interpretation. It provides a dynamic data sharing and interactive visualization environment and enables non-expert end users to interact easily with the RNA-seq analysis results.
A software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first publicly available C++ API for BAM file support and a command-line toolkit. The BamTools C++ API/library has been successfully integrated into a variety of applications. It provides the BAM file support for several utilities in the BEDTools suite.
Aims to facilitate data exchange and file conversions between population genetics programs. PGDSpider can read 27 different file formats and export data into 29 file formats. Thanks to its command-line version, it can be integrated into complex data analysis pipelines. The tool can store preferred conversion settings so that conversions of similar input files can be repeated.
Aims to ease high-throughput sequencing (HTS) data analysis through distributed computation. Eoulsan is a framework able to perform its tasks on distributed computers. The application supports batch analyses and fully automates processing, including the management of external file locations and of the distributed file system. It can be run in three modes: standalone, local cluster, or cloud computing on Amazon Elastic MapReduce.
Provides comprehensive alignment-based analysis of Nanopore reads through a simple, easy-to-use interface. NanoOK generates detailed tabular and graphical output plus an in-depth multi-page PDF report including error profile, quality and yield data. NanoOK supports multiple references, enabling detailed analysis of metagenomic or multiplexed samples. Four popular Nanopore aligners are supported, and it is easily extensible to include others.
A conversion tool to read and write SAM, BAM and CRAM formats using a unified Application Programming Interface (API). It also makes efficient use of threads when converting between file formats, automatically balancing the encoder and decoder workloads. Scramble is not a drop-in replacement for the Samtools API; however, a port of the CRAM components of Scramble has been made to the HTSlib library and is available within Samtools.
Analyzes and annotates VCF files and organizes tools that perform diverse analyses on them. VCF-kit adds essential utilities to process and analyze VCF files, including primer generation for variant validation, dendrogram production, genotype imputation from sequence data in linkage studies, and additional tools. It can also produce a phylogenetic tree from a VCF file. The tool centralizes a collection of tools and scripts that operate on the variant call format.
A suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. With modules that operate from FASTQ pre-processing through BAM post-processing and RPKM calculations, NGSUtils complements existing tools and provides unique functionality that helps at each step of an NGS data analysis pipeline. NGSUtils covers different aspects of NGS data analysis, including pre-processing, post-processing, filtering, format conversion and final result calculations, and provides a stable and modular platform for data management and analysis.
Converts BioPAX level 2 and level 3 files into SBML files, including the Qualitative Models extension. BioPAX2SBML converts pathways from BioCarta, Reactome and the National Cancer Institute from BioPAX to SBML, including the qual extension. Compared with existing conversion approaches of similar scope, BioPAX2SBML produces comprehensive and correct SBML models for all pathways in the NCI-Nature Pathway Interaction Database (PID).
Assists users with molecular data. Fconv is a tool that manipulates, converts and analyzes data from different file formats. It allows users to supply modified definition files as input, thus enabling tailor-made atom type sets. The application is structured as follows: (i) parse input files; (ii) perceive atom types; (iii) assign physicochemical properties and/or manipulate structures and/or apply geometric calculations; and (iv) produce output data/files.
Permits quality control of next-generation sequencing (NGS) tumor-normal experiments. NGS-Bits is organized into four steps: (1) gather information from raw reads, (2) map reads, (3) extract variant lists, and (4) combine the results of the previous steps and add quality control (QC) metrics for tumor-normal experiments. This tool covers all stages of single-sample NGS data analysis and adds special QC metrics for DNA sequencing of tumor-normal pairs.
Provides a web-enabled isomorphic mapping between Variant Call Format (VCF) and Resource Description Framework (RDF). VCF2RDF is a VCF parser that acts as an isomorphic mapping function to (evolvable) linked data entirely within third-generation web technologies.
Allows users to reformat and filter bioinformatics files. JVARKIT aims to simplify the syntax used to filter bioinformatics files by making it possible to write a loop or a custom function. JVARKIT is a set of more than 100 Java-based tools for bioinformatics.
A collection of command-line tools for preprocessing short-read FASTA/FASTQ files. Next-generation sequencing machines usually produce FASTA or FASTQ files containing multiple short-read sequences (possibly with quality information). The main processing of such FASTA/FASTQ files is mapping (aka aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs are Blat, SHRiMP, LastZ, MAQ and many others.
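Preprocessing starts with reading the records and their quality information. The sketch below parses FASTQ, assuming the common four-line record layout (header, sequence, `+` separator, quality string) and Phred+33 quality encoding, and yields each read's name, sequence and per-base quality scores. `read_fastq` is a hypothetical helper that accepts any iterable of lines, such as an open file or `text.splitlines()`.

```python
def read_fastq(lines):
    """Yield (name, sequence, qualities) from four-line, Phred+33 FASTQ."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        # Phred+33: quality score = ASCII code - 33
        yield header[1:], seq, [ord(c) - 33 for c in qual]
```

In Phred+33, `I` encodes quality 40 and `!` encodes quality 0.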
Reads files created with the graphical network editor Escher and converts them to files in community standard formats. EscherConverter is written in Java and is available as a standalone executable that includes both a graphical user interface with graph drawing capabilities and a command-line interface. It converts between SBML, SBGN-ML and JSON files.
Serves as a hub for data input, format conversion and data export to other applications. ChIP-Convert imports external data formats and converts them into compressed SGA. The software also provides more specific conversion schemes, such as proper conversion of the BED-like narrowPeak format used by ENCODE. It can be used to export data from the mass genome annotation (MGA) repository into other formats such as BED or FPS.
Allows users to analyze, filter, annotate or transform biological sequence data. FAST can perform automated sampling, permutation and bootstrapping of sequences and sites, and compute population genetic statistics. It can help empower non-biologist programmers to develop and communicate bioinformatics workflows for scientific investigation and publication.
Analyzes raw sequencing data from several next-generation sequencing (NGS) platforms. MutAid is a pipeline that performs six steps: quality control and filtering; mapping reads to a reference genome; variant detection; effect prediction; cross-referencing; and, lastly, producing a summary of all the information generated. It can be used to interpret mutational variants from data generated by targeted gene-panel sequencing or whole-genome sequencing.
A software tool providing a graphical user interface (GUI) to facilitate the analysis of data from the recently developed restriction-site associated DNA (RAD) sequencing in rice. AMF integrates sophisticated tools with self-developed algorithms so that users can complete the analysis through simple operations. It consists of independent modules: FilterAndMapping, BamConvert, NPInDel, DetectionAndAnnotation, SomaticDetection and VariantLocation. Using this software, large volumes of polymorphism data have been discovered and analyzed, which will be valuable for further applications such as genetic mapping, genetic map construction, evolutionary studies and marker-assisted selection.
Allows users to filter, convert and combine multiple data files produced by high-throughput technologies. HTDP aims to support global, real-time processing of large data sets through a GUI. The software provides unlimited filtering and data reduction capabilities, including itemized filtering conditions read from external files. It can be used for conversion between standard formats commonly used for high-throughput data.
Speeds up pre-processing for next-generation sequencing (NGS) data. sam2bam converts the data format from SAM to BAM. It consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators. This tool provides plug-in functions that can be used to analyze, filter, and convert input data.
Facilitates the design, optimization, and tracking of barcoded oligonucleotides. XSTK is useful for projects that require highly multiplexed polymerase chain reaction (PCR) and DNA sequencing. It builds a list of all possible DNA sequences of a specified length and then progressively culls sequences that may interfere with primary PCR amplification and/or sequencing steps.
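The enumerate-then-cull strategy can be sketched in a few lines. The culling criteria below are assumptions chosen for illustration, not the criteria XSTK actually applies: a GC content window, a cap on homopolymer run length, and a greedy minimum pairwise Hamming distance so that a sequencing error is unlikely to turn one barcode into another. `design_barcodes` and `hamming` are hypothetical helpers.

```python
from itertools import groupby, product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(length, gc_min=0.4, gc_max=0.6,
                    max_homopolymer=2, min_distance=3):
    """Enumerate all DNA sequences of a given length and greedily keep those
    passing illustrative GC, homopolymer and distance filters."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        gc = (seq.count("G") + seq.count("C")) / length
        if not gc_min <= gc <= gc_max:
            continue  # extreme GC content can hamper amplification
        longest_run = max(len(list(run)) for _, run in groupby(seq))
        if longest_run > max_homopolymer:
            continue  # long homopolymers are error-prone in sequencing
        if all(hamming(seq, other) >= min_distance for other in chosen):
            chosen.append(seq)
    return chosen
```

The greedy pass is simple but order-dependent; it does not guarantee the largest possible barcode set, and enumerating all 4^length candidates only stays practical for short lengths.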
Aims to search and retrieve The Cancer Genome Atlas (TCGA) data. TCGA2BED converts these data into the structured BED format for their seamless use and integration. Additionally, it supports conversion into the CSV, GTF, JSON and XML standard formats. Furthermore, TCGA2BED extends TCGA data with information extracted from other genomic databases (i.e., NCBI Entrez Gene, HGNC, UCSC and miRBase). TCGA2BED also provides an automatically updated data repository with publicly available Copy Number Variation (CNV), DNA-methylation, DNA-seq, miRNA-seq and RNA-seq (V1, V2) experimental data of TCGA converted into the BED format, together with their associated clinical and biospecimen metadata in attribute-value text format.
A simple GUI software tool for visualizing published ChIP-seq raw data. SraTailor automatically converts an SRA file into a BigWig-formatted file. Ease of use is one of its most notable features: entering the accession number of an SRA file and clicking the mouse are the only steps required to obtain BigWig-formatted files and to graphically visualize the extents of reads at given loci. SraTailor can also make peak calls, generate files in other formats, process users' own data and accept various command-line-like options. This software therefore makes ChIP-seq data fully exploitable by a wide range of biologists.
A scalable bioinformatics tool for exploring and analyzing nanopore sequencing data that can run both on individual computers and in the Hadoop distributed computing framework. The Hadoop environment allows virtually unlimited scaling in data size and provides better runtimes for datasets containing a large number of reads. HPG Pore allows efficient management of huge amounts of data and thus constitutes a practical solution for near-future data analysis needs as well as a promising model for the development of new tools to deal with future genomic big data.
Supplies a set of features dedicated to quality control (QC) of next-generation sequencing (NGS) data. EasyQC is an open-source application compatible with various NGS platforms. It includes modules for FASTQ conversion, homopolymer and primer trimming, graph generation, trimming based on quality, length or ambiguity, and paired-end merging.
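Trimming based on quality, length and ambiguity can be combined into a single read filter. The sketch below trims low-quality bases from the 3' end, then discards the read if it becomes too short or contains too large a fraction of ambiguous `N` bases. The thresholds and the `trim_read` helper are hypothetical defaults for illustration, not EasyQC's settings or interface.

```python
def trim_read(seq, quals, min_qual=20, min_length=30, max_n_fraction=0.1):
    """Trim low-quality 3' bases; return (seq, quals) or None if the read fails.

    quals holds per-base Phred scores aligned with seq.
    """
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1  # drop trailing bases below the quality threshold
    seq, quals = seq[:end], quals[:end]
    if not seq or len(seq) < min_length:
        return None  # length filter
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return None  # ambiguity filter
    return seq, quals
```

Trimming only from the 3' end reflects the usual decline of base quality along Illumina-style reads; a sliding-window rule is a common alternative.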
Supports conversion between different next-generation sequencing (NGS) file formats. NGS-FC is cross-platform software that summarizes information from 14 NGS databases. It can be used as a converter tool or as a framework for adding new conversion classes and databases. It supports external scripts, so format conversion scripts can be integrated.
Reads and writes nucleic acid and protein sequences in various formats. ReadSeq is a conversion program for bioinformatics that can read and reformat 18 different formats. The software includes a graphical user interface (GUI), a command-line interface (CLI) and a Common Gateway Interface (CGI) for use from a web server.
Facilitates the translation of biomedical research questions into language amenable to computational analysis. GROK supports various deep sequencing (DS)-related operations such as preprocessing, filtering, file conversion and sample comparison. It supports major genomic file formats and can store custom genomic regions in efficient data structures such as red-black trees and SQL databases. The tool can help answer biomedical research questions and establish experimentally testable predictions.
Provides utility functions implementing commonly used genomic operations. bedr is a BED-operations framework that offers a formal R interface to BEDTools and BEDOPS. In addition to sort operations, it supports the identification of overlapping regions, which can be collapsed to avoid downstream analytical problems. This method is compatible with the ubiquitous BED tools paradigm and integrates with R-based workflows.
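Collapsing overlapping regions is a classic sort-and-sweep operation. The sketch below assumes `(chrom, start, end)` tuples in BED half-open coordinates; it sorts them and extends the current region whenever the next one starts at or before its end. Treating touching (book-ended) regions as mergeable is a design choice made here for illustration. bedr itself is an R package that delegates to BEDTools/BEDOPS, and `merge_intervals` is a hypothetical Python helper, not its API.

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching (chrom, start, end) regions."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous region: extend it
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Sorting first makes a single linear pass sufficient, so the cost is dominated by the O(n log n) sort.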
An open-source tool written in Clojure, a functional programming language that runs on the Java Virtual Machine. Cljam can process and analyze SAM/BAM files in parallel and at high speed, with execution times almost the same as those of SAMtools. Compared with SAMtools and Picard, which are similar tools, cljam's Clojure code has fewer lines and equivalent performance.