Offers a way to manage pipelines. Toil tolerates arbitrary worker and leader failure, with strong checkpointing that allows workflows to resume. It can be used to run scientific workflows at large scale in cloud or high-performance computing (HPC) environments. This tool was used to compute gene- and isoform-level expression values for 19 952 samples from four studies.
An open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.
Allows automated analysis and annotation of complex -omics data. O-miner is a solution for the analysis and exploitation of data. The software is composed of two analytical domains, genomics and transcriptomics, and a third analytical layer for the analysis of data from methylation arrays. Several established prostate cancer (PCa) biomarkers were identified using O-miner. O-miner provides researchers with the tools required to conduct powerful analyses of publicly available sequencing data.
Permits alignment of reads and prediction of single nucleotide polymorphisms (SNPs) and indels. SHORE is a mapping and analysis pipeline for short DNA sequences obtained from Illumina Genome Analyzer and HiSeq 2000, Life Technologies SOLiD, 454 Genome Sequencer FLX and PacBio RS platforms. The software can be adapted to handle longer reads, as well as paired-end read data.
Searches for single nucleotide polymorphisms (SNPs) with cloud computing. Crossbow is a Hadoop-based software tool that combines the speed of the short read aligner Bowtie with the accuracy of the SNP caller SOAPsnp to perform alignment and SNP detection for multiple whole-human datasets per day. The software achieves at least 98.9% accuracy on simulated datasets of individual chromosomes, and better than 99.8% concordance with the Illumina 1 M BeadChip assay of a sequenced individual.
A framework that provides a collection of rigorously validated tools for the manipulation and analysis of genome biology data sets. PyCogent is a fully integrated and thoroughly tested framework for controlling third-party applications; devising workflows; querying databases; conducting novel probabilistic analyses of biological sequence evolution; and generating publication-quality graphics. It is distinguished by many unique built-in capabilities (such as true codon alignment) and the frequent addition of entirely new methods for the analysis of genomic data.
Permits analysis of high throughput sequencing (HTS) data. NGSEP is an integrated framework whose main functionality is the variants detector, allowing researchers to make integrated discovery of single nucleotide variants (SNVs), small and large indels and regions with copy number variation (CNVs). The software also provides modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics.
A pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants.
A highly scalable, ultra-fast and fully automated analysis pipeline for the discovery of genetic variation. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources.
Identifies bacterial species from a sample. sRNAnalyzer is a customizable small RNA analysis pipeline for next generation sequencing (NGS) data. It was applied to a dataset of plasma samples from CRC patients to detect exogenous RNAs. It also allows users to change the mapping order and mismatch allowance by editing a text-based configuration file instead of reprogramming.
Allows users to analyze, compare, and visualize next generation sequencing (NGS) data. CLC Genomics Workbench offers a complete and customizable solution for genomics, transcriptomics, epigenomics, and metagenomics. The software enables users to generate custom workflows, which can combine quality control steps, adapter trimming, read mapping, variant detection, and multiple filtering and annotation steps into a pipeline.
Aims to reduce the effort put into basic data processing for next-generation sequencing (NGS). QuickNGS enables data analysis for major applications of NGS in a batch-like operation mode. This pipeline relies on the organization of available metadata in a MySQL database, which is used to control the overall workflow composed of specific software applications for different kinds of analysis.
Enables users to design pipelines that manage large sets of next-generation sequencing (NGS) software tools and utilities. TOGGLE generates pipelines for large-scale second- and third-generation sequencing analyses, including multi-threading support. It is designed for single nucleotide polymorphism (SNP) discovery in large sets of genomic data and is ready to use in different environments, from a single machine to high performance computing (HPC) clusters.
Analyses mapped reads from diverse high-throughput sequencing (HTS) experiments: ChIP-Seq, with either punctuated or broad signals, CLIP-Seq and RNA-Seq. Pyicos is a command line utility for the conversion and manipulation of genomic coordinate files. It facilitates HTS analysis through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Pyicos is part of the Pyicoteo suite of tools.
Automatically handles data transfer, demultiplexing, conversion and quality control once a HiSeq run has finished. Aozan is an automated post-sequencing data processing pipeline. It is a non-interactive program that consists of a suite of linked steps, from the detection of a new run launch to the creation of the quality control report once the run has finished. Once the software is configured, biologists can manage the entire sequencing process from library preparation to FASTQ generation and QC report.
Automates the primary analysis of massively parallel sequencing data. NARWHAL is a sequence analysis pipeline that allows for the automated processing and analysis of different and multiplexed datasets obtained from Illumina sequencers and easy alignment of individual samples to the reference genome of choice using predefined alignment profiles. The software provides a coherent data analysis workflow with minimal hands-on time, a reduced chance of human error and faster analysis.
A computational framework that automates multi-omics data analysis pipelines on high performance compute clusters and in the cloud. It supports best practice published pipelines for RNA-seq, miRNA-seq, Exome-seq, Whole Genome sequencing, ChIP-seq analyses and automatic processing of data from The Cancer Genome Atlas (TCGA). Omics Pipe provides researchers with a tool for reproducible, open source and extensible next generation sequencing analysis. The goal of Omics Pipe is to democratize NGS analysis by dramatically increasing the accessibility and reproducibility of best practice computational pipelines, which will enable researchers to generate biologically meaningful and interpretable results.
Open-source software designed to make it easy to do powerful sequence and comparative genomics analyses, even with extremely large multi-genome data sets. Pygr is a bioinformatics graph tool library that includes code for interacting with sequence databases and for search methods such as BLAST, megablast and repeat-masking. It offers high performance graph representation and query of interval-based data.
Allows users to numerically measure the performance of different bioinformatics pipelines, which focus on the recovery and sensitivity in single nucleotide polymorphism (SNP) calling from resequencing data. Pipeliner consists of two separate executables: (1) a command-line tool used during the analysis to convert file formats and summarize the performance of the pipeline used; and (2) a graphical user interface (GUI) used for setting up the analysis parameters and plotting results.
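The kind of performance summary described for Pipeliner can be illustrated with a minimal sketch. It is not Pipeliner's own code: the function name `snp_calling_metrics`, the use of (chromosome, position) tuples, and the choice of sensitivity and precision as the reported measures are all assumptions made for illustration.

```python
def snp_calling_metrics(true_snps, called_snps):
    """Compare SNP calls from a pipeline against a known truth set.

    Both arguments are iterables of hashable SNP identifiers, here
    (chromosome, position) tuples. This layout is an assumption.
    """
    true_snps, called_snps = set(true_snps), set(called_snps)
    tp = len(true_snps & called_snps)   # true SNPs that were called
    fp = len(called_snps - true_snps)   # calls absent from the truth set
    fn = len(true_snps - called_snps)   # true SNPs that were missed
    return {
        "true_positives": tp,
        "false_positives": fp,
        "false_negatives": fn,
        # Fraction of true SNPs recovered by the pipeline.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Fraction of calls that are real SNPs.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical example data: four simulated SNPs, three calls.
truth = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)]
calls = [("chr1", 100), ("chr2", 50), ("chr2", 60)]
metrics = snp_calling_metrics(truth, calls)
print(metrics)
```

Running the same comparison on the output of several pipelines against one simulated truth set gives numbers that can be compared directly.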
Investigates and handles high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools can be used to conduct alignments to a reference genome, scan computations or shuffling within a reference set of regions. It serves a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. This tool is able to reduce the memory requirements of studies on large datasets.
Tests whether index switching is occurring in a given dataset. Index_investigator is a script that provides a way to visualize index switching in sequenced genomic datasets. Applied to samples, it showed that index switching is minimal and not enriched on the HiSeq X. It requires samples sequenced on multiple different lanes, since it relies on samples sequenced on the same lane being more similar to each other than to samples from other lanes.
Builds reports with dynamic user interface components to represent extensive and complex analysis results. Nozzle allows researchers to construct report elements such as figures, tables, paragraphs of text, bibliographic or web references, lists, sections and subsections. It gathers functionalities to define the visibility of report elements with three privacy levels: private, group and public. This tool permits the exclusion of pertinent sections in final reports.
Provides analysis workflow and quality metric management for DNA-seq experiments. draw-sneakpeek is a Java pipeline for Next-Generation Sequencing (NGS) data analysis. It was used to process whole-genome sequencing (WGS), whole-exome sequencing (WES) and targeted sequencing experiments on traditional high-performance computing clusters as well as on Amazon Elastic Compute Cloud (EC2). This method is also available as Amazon Machine Images.
Permits quality control of Next-Generation Sequencing (NGS) tumor-normal experiments. NGS-Bits is separated into four steps: (1) gather information from raw reads, (2) map reads, (3) extract variant lists, and (4) combine results from the preceding steps and add quality control (QC) metrics for tumor-normal experiments. This tool includes all stages of single-sample NGS data analysis and adds special QC metrics for DNA sequencing of tumor-normal pairs.
Allows users to develop tailored workflows for the analysis of whole-exome (WES), whole-genome (WGS), and transcriptome (RNA-seq) sequencing data. NGS-pipe is an automated framework for the design of pipelines for the analysis of large-scale sequencing data, such as cancer genomics data. It provides building blocks to execute state-of-the-art tools, as well as appropriate error handling. It also overcomes the common lack of automated procedures to ensure reproducibility.
Allows automated installation of bioinformatics tools in computational environments. Dugong is a comprehensive Docker-based virtual desktop that integrates more than 3500 bioinformatics software packages using an XFCE4 graphical user interface (GUI). The desktop incorporates software available from three major open-source bioinformatics repositories: BioConda, LinuxBrew and BioLinux Repository. It also incorporates Jupyter Notebook and can be integrated into Dockstore pipelines by Common Workflow Language (CWL) and used to visualize graphical results from different types of analysis.
Allows users to simultaneously analyze all samples or to apply population priors to a single sample. RTG Population can use information from large sample sizes in the following ways: joint population, n+1 analysis, and pedigree aware. It works by maximizing the utility of available prior information to improve accuracy, and by using the pedigree to automatically highlight de novo variants. This tool is part of RTG Core.
A pipeline for sequencing-based direct gene cloning. GIPS considers a range of experimental and data analysis choices in a sequencing-based forward genetics study to design an optimal procedure for gene identification. GIPS estimates four measurements to help optimize an analysis procedure as follows: (i) the chance of reporting the true phenotype-associated gene, (ii) the expected number of random genes that may be reported, (iii) the significance of each candidate gene’s association with the phenotype, and (iv) the significance of violating the Mendelian assumption if no gene is reported or if all candidate genes have failed validation.
Serves as a scalable composition tool for implementing asynchronous Boolean logic within a multi-cellular consortium. CALIN consists of two main steps: it first decomposes the Boolean function of interest into a disjunction of conjunctive clauses via the Quine-McCluskey algorithm. The software then converts each clause into a given computational device for which particular connections are created between integrases and inputs.
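The first step, decomposing a Boolean function into a disjunction of conjunctive clauses, can be sketched as follows. This is an illustrative sketch and not CALIN's code: it implements only the merging phase of the Quine-McCluskey algorithm, so it returns all prime implicants and does not go on to select a minimal cover. The function names and the bit-string encoding of clauses are assumptions.

```python
from itertools import combinations


def merge(a, b):
    """Merge two implicants that differ in exactly one fixed bit, else None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms, n_inputs):
    """Return the prime implicants of a function given by its true rows.

    Each implicant is a string over '0', '1' and '-' (input not used).
    """
    terms = {format(m, f"0{n_inputs}b") for m in minterms}
    primes = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            m = merge(a, b)
            if m is not None:
                merged.add(m)
                used.update((a, b))
        primes |= terms - used  # implicants that could not be merged further
        terms = merged
    return primes


def clause_to_text(implicant, names):
    """Render one implicant as a conjunctive clause over named inputs."""
    literals = [name if bit == "1" else f"NOT {name}"
                for bit, name in zip(implicant, names) if bit != "-"]
    return " AND ".join(literals) or "TRUE"


# Hypothetical 3-input function that is true for input rows 1, 3, 6 and 7.
clauses = prime_implicants({1, 3, 6, 7}, 3)
for clause in sorted(clauses):
    print(clause_to_text(clause, ["A", "B", "C"]))
```

In the scheme described above, each printed clause would then be mapped to one computational device. A full Quine-McCluskey implementation would additionally drop redundant prime implicants before that mapping.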
Detects, summarizes, visualizes, and interprets the various types of somatic and germline events encountered in variant analysis. GMS makes open, high-throughput genome analysis available to groups currently tasked to analyze the deluge of data from high-throughput sequencing (HTS) experiments. It also includes an integration, annotation, and interpretation pipeline, ‘MedSeq’, which attempts to converge all single-subject data into a form suitable for identification of clinically actionable events.
Contains features for studying whole genome sequencing (WGS) and whole exome sequencing (WES) data. DNAp is a program that detects mutations in disease samples and can be applied to human and mouse samples. To run, this tool requires users to provide several pieces of information, such as the word “tumor” or the word “normal”.
Consists of a collection of analysis approaches and visualization software for microarray data. Chipster is useful for several types of high-throughput data such as microarrays, proteomics and next generation sequencing (NGS). It can be used to normalize most of the commonly used chip types and permits the use of remapped information. This tool is useful for quality checks such as RNA degradation, relative log expression (RLE), normalized unscaled standard error (NUSE) or quality control probe expression.
Enables genotyping and variant annotation of resequencing data produced by next generation sequencing (NGS) technologies. CoVaCS is an automated system that provides tools for variant calling and annotation along with a pipeline for the analysis of whole genome shotgun (WGS), whole exome sequencing (WES) and targeted resequencing (TGS) data. The software allows non-specialists to perform all steps from quality trimming to variant annotation.
A bioinformatic pipeline created for next generation sequencing (NGS) analysis as well as for Sanger sequences. It is capable of cleaning reads, preparing a de novo assembly, mapping reads against a reference, looking for SNPs and SSRs, and performing some functional annotation, such as ORFs, GO terms and sequence descriptions.
Allows analysis of next-generation sequencing datasets. Kraken is a suite of tools that encompasses adapter contaminant detection and removal, and cleaning of low-complexity and low-quality sequence tracts and other contaminants arising from sample preparation. The software can handle multiplexed data, recognize common read geometries, produce reports, plots and summary statistics, and process several FASTQ files in parallel.
Allows analyses of high-throughput small RNA (sRNA) sequence data in model and non-model plants, from raw data to identified and annotated conserved and novel sequences. miRPursuit is a pipeline performing a series of sRNA analyses. The software minimizes the need for manual repetitive tasks by allowing several libraries to be run in parallel, so that differences in sRNA read accumulation among sRNA libraries can be compared. It can directly analyze raw sRNA sequencing data from any sequencer.
Assists users in characterizing microbial genomic reference materials. PEPR is a pipeline that provides a reproducible workflow for characterizing prokaryotic genomic materials. This method can be used to qualify reference materials as well as in-house quality control materials for which replicate sequencing datasets from multiple platforms are available.
Analyzes raw sequencing data from several next generation sequencing (NGS) platforms. MutAid is a pipeline performing a series of steps: (i) quality control and filtering; (ii) mapping reads to the reference genome; (iii) variant detection, effect prediction and cross-referencing; and lastly (iv) production of a summary of all information generated. It can be used to interpret mutational variants from data generated by targeted gene-panel sequencing or whole genome sequencing.
A parallel, automated and rapid quality control pipeline for analyzing big next generation sequencing data from the Illumina platform. NGS-QCbox scales up to process hundreds or thousands of samples. NGS-QCbox integrates Raspberry with other open-source tools for alignment (Bowtie2), SNP calling (SAMtools) and other utilities (bedtools) to analyze raw NGS data more efficiently and in a high-throughput manner. It reports read and base statistics along with genome coverage and variants in a user-friendly format. The pipeline presents a simple menu-driven interface and can be used in either quick or complete mode. In addition, the pipeline in quick mode is faster than other similar existing QC pipelines and tools.
Analyses next generation sequencing (NGS) data. ANGSD can handle a number of different input types from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes. The tool is especially useful for low and medium depth data. The tool allows the user to choose between combinations of existing methods and can perform analysis that is not implemented elsewhere.
Provides two multi-objective extensions of constraint-based models (CBMs) for modeling communities: multi-objective flux balance analysis (MO-FBA) and multi-objective flux variability analysis (MO-FVA). Both MO-FBA-FVA pipeline methods were applied to a hot spring mat model ecosystem. As a result, multiple trade-offs between nutrients and growth rates, as well as thermodynamically favorable relative abundances at community level, were emphasized. This approach can be used for integrating genomic information in microbial ecosystems. These models will provide insights about behaviors (including diversity) that take place at the ecosystem scale.
Provides a bioinformatics system for the integrated analysis of microRNA and messenger RNA (mRNA) data. BioVLAB is a cloud workbench that requires users to perform only two tasks: (1) create a virtual machine and then (2) download and execute a pre-composed workflow on their own data. The software can be useful for small research labs that do not have access to high performance computing infrastructures or the resources to hire full-time bioinformatics and computer system specialists.
Generates an annotated personal genome suitable for further analysis and clinical interpretation. Mercury is an extensible workflow which comprises a library of multiple sequence analysis components including variant calling and annotation as well as a set of validation tools, with the aim of encompassing the complete field of next generation sequencing (NGS) data analysis. Additionally, this software provides a cloud-based implementation to make information sharable between multiple users.
Provides automated analysis pipelines for microbial genomics via a portable virtual machine (VM). CloVR aims to perform sophisticated analyses by using local resources and cloud computing platforms if additional resources are required for increased throughput. This software aims to reduce bottlenecks in sequence analysis by combining VMs and cloud computing and to simplify deployments of complex bioinformatics workflows by providing a single executable.
Allows users to analyze DNA-Seq data from next-generation sequencing (NGS) equipment such as Illumina, Roche/454, Proton, and Ion Torrent. GensearchNGS helps researchers create different projects to group the analysis of various patients based on their type or their relation. It focuses on patients and their associated metainformation and allows several raw sequencing datasets to be associated with each project.
Automates the processing and analysis of several commonly used Next Generation Sequencing (NGS) datasets including: ChIP-seq, RNA-seq, Global Run On sequencing (GRO-seq), micrococcal nuclease footprint sequencing (MNase-seq), DNase hypersensitivity sequencing (DNase-seq), and transposase-accessible chromatin using sequencing (ATAC-seq) datasets. CIPHER provides an analysis mode that accomplishes complex bioinformatics tasks such as enhancer prediction. It supplies functions to integrate various NGS datasets.
Analyzes low-input carrier sequencing data and detects target reads. CarrierSeq is a sequence analysis workflow that identifies target reads from a low-input sequencing run that exploits a genomic carrier. This software relies on a Poisson distribution of reads per channel to pinpoint channels producing spurious reads. Users can define their own threshold based on their libraries' quality control metrics to determine a quality score filter.
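The idea of using a Poisson model of reads per channel to spot channels that produce spurious reads can be sketched as below. This is a simplified illustration under stated assumptions, not CarrierSeq's implementation: the rate is estimated as the plain mean of the observed counts, a Bonferroni-corrected cutoff is used, and the names `poisson_sf` and `flag_overactive_channels` are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term             # accumulate P(X = i)
        term *= lam / (i + 1)   # step to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def flag_overactive_channels(reads_per_channel, alpha=0.05):
    """Flag channels with more reads than a Poisson model would explain.

    reads_per_channel maps channel id -> read count. The rate is the mean
    count per channel and the cutoff is alpha divided by the number of
    channels (Bonferroni correction). Both choices are assumptions.
    """
    n = len(reads_per_channel)
    lam = sum(reads_per_channel.values()) / n
    cutoff = alpha / n
    return sorted(ch for ch, k in reads_per_channel.items()
                  if poisson_sf(k, lam) < cutoff)


# Hypothetical counts: channel 6 yields far more reads than the others.
counts = {1: 2, 2: 1, 3: 0, 4: 3, 5: 1, 6: 40, 7: 2, 8: 1}
flagged = flag_overactive_channels(counts)
print(flagged)
```

Because an extreme channel inflates the mean, a more careful version might estimate the rate robustly or re-estimate it after removing flagged channels.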
Performs next generation sequencing (NGS) secondary analysis. BALSA is an extension of the GPU-based aligner SOAP3-dp that can analyze whole genome sequencing (WGS) data. It performs the whole secondary analysis, including quality control, alignment, base score recalibration, de-duplication and realignment, in memory on the fly.
Allows users to process target-enriched Illumina sequences from raw reads to alignments. SECAPR guides users from raw sequencing results to cleaned and filtered multiple sequence alignments (MSAs) for phylogenetic and phylogeographic analyses. This tool provides functions that let researchers choose appropriate settings for their specific datasets.