Searches for genetic variants in next-generation DNA sequencing data. DeepVariant uses a deep neural network, trained on one hundred million examples, to assign genotype likelihoods from the experimental data. This tool can be useful for determining gene function and activity, characterizing a patient’s medical condition, and predicting the risk of developing a disease.
Offers a way to manage pipelines. Toil tolerates arbitrary worker and leader failures, with strong check-pointing that allows workflows to resume. It can be used to run scientific workflows at large scale in cloud or high-performance computing (HPC) environments. This tool was used to compute gene- and isoform-level expression values for 19 952 samples from four studies.
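The idea of check-pointing with resumption can be illustrated with a minimal sketch. This is not Toil's API: the function `run_with_checkpoints`, the JSON checkpoint file and the step names `align` and `quantify` are all hypothetical, chosen only to show how a workflow can skip completed steps after a failure.

```python
import json
import os
import tempfile


def run_with_checkpoints(steps, checkpoint_path):
    """Run named steps in order, skipping any already recorded as done."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run
        func()
        done.append(name)
        executed.append(name)
        # Write to a temporary file, then rename, so that a crash
        # in the middle of a write cannot corrupt the checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(done, fh)
        os.replace(tmp_path, checkpoint_path)
    return executed


# Hypothetical two-step workflow whose second step fails once.
workdir = tempfile.mkdtemp()
ckpt = os.path.join(workdir, "checkpoint.json")
state = {"fail_once": True}


def align():
    pass  # placeholder for real work


def quantify():
    if state["fail_once"]:
        state["fail_once"] = False
        raise RuntimeError("simulated worker failure")


steps = [("align", align), ("quantify", quantify)]

try:
    run_with_checkpoints(steps, ckpt)
except RuntimeError:
    pass  # the first attempt dies after "align" has been checkpointed

# The second attempt resumes from the checkpoint and reruns only "quantify".
resumed = run_with_checkpoints(steps, ckpt)
```

A production workflow engine would also have to track job dependencies and intermediate files; the sketch only records which steps have finished.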
An open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.
Allows automated analysis and annotation of complex -omics data. O-miner is a solution for the analysis and exploitation of data. The software is composed of two analytical domains, genomics and transcriptomics, and a third analytical layer for the analysis of data from methylation arrays. Several established prostate cancer (PCa) biomarkers were identified using O-miner. O-miner provides researchers with the tools required to conduct powerful analyses of publicly available sequencing data.
Permits alignment of reads and prediction of single nucleotide polymorphisms (SNPs) and indels. SHORE is a mapping and analysis pipeline for short DNA sequences obtained from Illumina Genome Analyzer and HiSeq 2000, Life Technologies SOLiD, 454 Genome Sequencer FLX and PacBio RS platforms. The software can be adapted to handle longer reads, as well as paired-end read data.
Searches for single nucleotide polymorphisms (SNPs) with cloud computing. Crossbow is a Hadoop-based software tool that combines the speed of the short read aligner Bowtie with the accuracy of the SNP caller SOAPsnp to perform alignment and SNP detection for multiple whole-human datasets per day. The software achieves at least 98.9% accuracy on simulated datasets of individual chromosomes, and better than 99.8% concordance with the Illumina 1 M BeadChip assay of a sequenced individual.
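As an illustration of what a concordance figure measures, the following minimal sketch computes the fraction of sites genotyped by both methods at which the genotypes agree. It assumes that definition of concordance and uses hypothetical data; the function name `genotype_concordance` and the dictionary layout are not taken from Crossbow.

```python
def genotype_concordance(calls, array):
    """Fraction of sites present in both inputs where the genotypes agree."""
    shared = calls.keys() & array.keys()
    if not shared:
        raise ValueError("no shared sites to compare")
    # Alleles are unordered, so "AG" and "GA" count as the same genotype.
    matches = sum(
        1 for site in shared if sorted(calls[site]) == sorted(array[site])
    )
    return matches / len(shared)


# Hypothetical genotypes keyed by (chromosome, position).
sequencing_calls = {
    ("chr1", 100): "AG",
    ("chr1", 200): "CC",
    ("chr1", 300): "TT",
    ("chr2", 50): "GA",
}
array_calls = {
    ("chr1", 100): "GA",
    ("chr1", 200): "CC",
    ("chr1", 300): "CT",
    ("chr2", 75): "AA",
}

concordance = genotype_concordance(sequencing_calls, array_calls)
print(f"concordance over shared sites: {concordance:.3f}")
```

Sites covered by only one of the two methods are ignored, so the value describes agreement and not sensitivity.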
A framework that provides a collection of rigorously validated tools for the manipulation and analysis of genome biology data sets. PyCogent is a fully integrated and thoroughly tested framework for controlling third-party applications; devising workflows; querying databases; conducting novel probabilistic analyses of biological sequence evolution; and generating publication-quality graphics. It is distinguished by many unique built-in capabilities (such as true codon alignment) and the frequent addition of entirely new methods for the analysis of genomic data.
Permits analysis of high-throughput sequencing (HTS) data. NGSEP is an integrated framework whose main functionality is a variant detector that allows researchers to discover single nucleotide variants (SNVs), small and large indels, and regions with copy number variation (CNVs) in an integrated manner. The software also provides modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics.
A pipeline for efficiently detecting and genotyping high-quality variants from large-scale sequencing data. GotCloud automates sequence alignment, sample-level quality control, variant calling, filtering of likely artifacts using machine-learning techniques, and genotype refinement using haplotype information. The pipeline can process thousands of samples in parallel and requires fewer computational resources than current alternatives. Experiments with whole-genome and exome-targeted sequence data generated by the 1000 Genomes Project show that the pipeline provides effective filtering against false positive variants and high power to detect true variants.
A highly scalable, ultra-fast and fully automated analysis pipeline for the discovery of genetic variation. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources.
Performs variant discovery on the Amazon Web Services (AWS) cloud or on local high-performance computing clusters. GenomeVIP is a genomics analysis pipeline that supports germline and somatic variant calling. It provides a collection of analysis tools and computational frameworks for streamlined discovery and interpretation of genetic variants. The server and runtime environments can be customized, updated, or extended.
Identifies bacterial species in a sample. sRNAnalyzer is a customizable small RNA analysis pipeline for next generation sequencing (NGS) data. It has been applied to a dataset of plasma samples from colorectal cancer (CRC) patients to detect exogenous RNAs. It also allows users to change the mapping order and mismatch allowance by editing a text-based configuration file instead of reprogramming.
Allows users to analyze, compare, and visualize next generation sequencing (NGS) data. CLC Genomics Workbench offers a complete and customizable solution for genomics, transcriptomics, epigenomics, and metagenomics. The software enables users to build custom workflows, which can combine quality control steps, adapter trimming, read mapping, variant detection, and multiple filtering and annotation steps into a pipeline.
Aims to reduce the efforts put into basic data processing for next-generation sequencing (NGS). QuickNGS enables data analysis for major applications of NGS in a batch-like operation mode. This pipeline relies on the organization of available metadata in a MySQL database which is used to control the overall workflow composed of specific software applications for different kinds of analysis.
Enables users to design pipelines that manage large sets of next-generation sequencing (NGS) software tools and utilities. TOGGLE generates pipelines for large-scale second- and third-generation sequencing analyses, including multi-threading support. It is designed for single nucleotide polymorphism (SNP) discovery in large sets of genomic data and is ready to use in different environments, from a single machine to high-performance computing (HPC) clusters.
A variant calling pipeline for Illumina whole-genome germline data. FermiKit de novo assembles short reads and then maps the assembly against a reference genome to call SNPs, short insertions/deletions (INDELs) and structural variations (SVs). It takes about one day to assemble 30-fold human whole-genome data on a modern 16-core server with 85 GB RAM at the peak, and calls variants in half an hour with an accuracy comparable to current practice. The FermiKit assembly is a reduced representation of the raw data that retains most of the original information.
Investigates and handles high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools can be used to conduct alignments to a reference genome, perform scan computations or shuffle within a reference set of regions. It serves a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. This tool reduces the memory requirements of studies on large datasets.
Analyses mapped reads from diverse high-throughput sequencing (HTS) experiments: ChIP-Seq (with either punctuated or broad signals), CLIP-Seq and RNA-Seq. Pyicos is a command-line utility for the conversion and manipulation of genomic coordinate files. It facilitates HTS analysis through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Pyicos is part of the Pyicoteo suite of tools.
An open-access software package for processing and analysing sequence reads from time-resolved data, calling important single- and multi-locus variants over time, identifying alleles potentially affected by selection, calculating linkage disequilibrium statistics, performing haplotype reconstruction, and exploiting time-resolved information to estimate the extent of uncertainty in reported genomic data.