
GATK / Genome Analysis ToolKit

Focuses on variant discovery and genotyping. GATK is a toolkit, developed at the Broad Institute, that comprises several tools and can support projects of any size. It bundles an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in formats such as SAM, BAM, CRAM or VCF. The website includes extensive documentation to guide users.

SAMtools

Allows users to interact with high-throughput sequencing data. SAMtools permits the manipulation of alignments in the SAM/BAM/CRAM formats: reading, writing, editing, indexing, viewing and converting SAM/BAM/CRAM format. It limits the mapping quality of reads with excessive mismatches and applies base alignment quality to fix alignment errors. This tool can sort and merge alignments, remove polymerase chain reaction (PCR) duplicates or generate per-position information.

GATK-Queue / Genome Analysis Toolkit-Queue

A command-line scripting framework for defining multi-stage genomic analysis pipelines, combined with an execution manager that runs those pipelines from end to end. Processing genome data often involves several steps to produce outputs. For example, our BAM to VCF calling pipeline includes, among other things: local realignment around indels; emitting raw SNP calls; emitting indels; masking the SNPs at indels; annotating SNPs using chip data; labeling suspicious calls based on filters; and creating a summary report with statistics. Running these tools one by one in series may take weeks, or would require custom scripting to make use of parallel resources. With a Queue script, users can semantically define the multiple steps of the pipeline and then hand off the logistics of running the pipeline to completion. Queue runs independent jobs in parallel, handles transient errors, and uses various techniques, such as running multiple copies of the same program on different portions of the genome, to produce outputs faster.
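The idea of running the same program on different portions of the genome and then combining the results can be illustrated with a minimal scatter-gather sketch. This is not Queue's own scripting interface; the function names (`scatter`, `process_interval`, `gather`, `run_pipeline`) and the per-interval job are hypothetical placeholders chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(contig_lengths, chunk_size):
    """Split each contig into half-open intervals (contig, start, end)."""
    intervals = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            intervals.append((contig, start, min(start + chunk_size, length)))
    return intervals


def process_interval(interval):
    """Hypothetical per-region job; here it only reports the interval size."""
    contig, start, end = interval
    return contig, end - start


def gather(results):
    """Merge per-interval results into one total per contig."""
    totals = {}
    for contig, count in results:
        totals[contig] = totals.get(contig, 0) + count
    return totals


def run_pipeline(contig_lengths, chunk_size, workers=4):
    """Scatter the genome, run independent jobs in parallel, gather the output."""
    intervals = scatter(contig_lengths, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_interval, intervals))
    return gather(results)


if __name__ == "__main__":
    print(run_pipeline({"chr1": 2500, "chr2": 1000}, chunk_size=1000))
```

In a real pipeline the placeholder job would be replaced by an actual analysis step, and the gather step by a merge of the per-region output files.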

VarScan

A platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. The newest version, VarScan 2, is written in Java, so it runs on most operating systems. It can be used to detect different types of variation: 1) germline variants (SNPs and indels) in individual samples or pools of samples, 2) multi-sample variants (shared or private) in multi-sample datasets (with mpileup), 3) somatic mutations, LOH events, and germline variants in tumor-normal pairs and 4) somatic copy number alterations (CNAs) in tumor-normal exome data.

Nanopolish

Provides a nanopore consensus algorithm using a signal-level hidden Markov model (HMM). The main subprograms of Nanopolish are: (i) nanopolish extract, which extracts reads in FASTA or FASTQ format from a directory of FAST5 files; (ii) nanopolish eventalign, which aligns signal-level events to k-mers of a reference genome; (iii) nanopolish variants, which detects single nucleotide polymorphisms (SNPs) and indels with respect to a reference genome; and (iv) nanopolish variants --consensus, which calculates an improved consensus sequence for a draft genome assembly. Furthermore, Nanopolish contains an experimental option that will use event durations to improve the consensus accuracy around homopolymers.

Scalpel

A software package for detecting INDEL (INsertion and DELetion) mutations in a genome that has been sequenced with next-generation sequencing technology (e.g., Illumina). Scalpel is designed to perform localized micro-assembly of specific regions of interest with the goal of detecting mutations with high accuracy and increased power. Scalpel supports three modes of operation: single, de novo, and somatic. In the single mode, Scalpel detects indels in a single dataset (e.g., one individual exome). In the de novo mode, Scalpel detects de novo indels in a quad family (father, mother, affected child, unaffected sibling). In the somatic mode, Scalpel detects somatic indels from the sequencing data of matched tumor and normal samples.

Octopus-toolkit

New
Examines epigenomic and transcriptomic next generation sequencing (NGS) data. Octopus-toolkit can be used for antibody- or enzyme-mediated experiments and studies for the quantification of gene expression. It can accelerate the data mining of public epigenomic and transcriptomic NGS data for basic biomedical research. This tool provides a private and a public mode: one to process the user’s own data, and the other to analyze public NGS data by retrieving raw files from the GEO database.

Dindel

A Bayesian method to call indels from short-read sequence data in individuals and populations by realigning reads to candidate haplotypes that represent alternative sequence to the reference. The candidate haplotypes are formed by combining candidate indels and SNVs identified by the read mapper, while allowing for known sequence variants or candidates from other methods to be included. In our probabilistic realignment model we account for base-calling errors, mapping errors, and also, importantly, for increased sequencing error indel rates in long homopolymer runs.

Strelka

Provides analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. Strelka is a variant calling method building upon the innovative Strelka somatic variant caller to improve upon aspects of variant calling for both germline and somatic analysis. The germline caller employs an efficient tiered haplotype model to improve accuracy and provide read-backed phasing, adaptively selecting between assembly and a faster alignment-based haplotyping approach at each variant locus. The germline caller also analyzes input sequencing data using a mixture-model indel error estimation method to improve robustness to indel noise.

MATE-CLEVER / Mendelian-inheritance-AtTEntive CLique-Enumerating Variant finder

An approach that accurately discovers and genotypes indels longer than 30 bp from contemporary NGS reads with a special focus on family data. For enhanced quality of indel calls in family trios or quartets, MATE-CLEVER integrates statistics that reflect the laws of Mendelian inheritance. MATE-CLEVER's performance rates for indels longer than 30 bp are on a par with those of the GATK for indels shorter than 30 bp, achieving up to 90% precision overall, with >80% of calls correctly typed. In predicting de novo indels longer than 30 bp in family contexts, MATE-CLEVER even raises the standards of the GATK.
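The basic rule behind Mendelian inheritance in a trio can be sketched as a simple consistency check: a child's diploid genotype must be formed from one allele of each parent. This is only an illustration of that rule, not MATE-CLEVER's statistical model, which the description above does not specify; the function name and the genotype encoding (tuples of allele indices, 0 for reference and 1 for the indel) are assumptions.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child genotype can be formed by taking one
    allele from the father and one from the mother.

    Genotypes are unordered pairs of alleles, e.g. (0, 1).
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((f, m))) == child_sorted
        for f, m in product(father, mother)
    )


if __name__ == "__main__":
    # Heterozygous child of two homozygous-reference parents:
    # inconsistent, so either a de novo event or a genotyping error.
    print(is_mendelian_consistent((0, 0), (0, 0), (0, 1)))
```

A call that fails this check in a family is either a candidate de novo variant or a genotyping error, which is why family-aware callers weigh such evidence statistically instead of applying a hard filter.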

HySA / Hybrid Structural variant Assembly

Integrates sequencing reads from next-generation sequencing (NGS) and single-molecule sequencing (SMS) technologies to accurately assemble and detect structural variations (SVs) in the human genome. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole-genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance assembly of structurally altered regions in the human genome.

AMF / AgroMarker Finder

A graphical user interface (GUI) tool that facilitates the analysis of recently developed restriction-site associated DNA (RAD) sequencing data in rice. AMF integrates sophisticated tools with self-developed algorithms that help users complete data analysis with simple operations. It consists of independent modules: FilterAndMapping, BamConvert, NPInDel, DetectionAndAnnotation, SomaticDetection and VariantLocation. Based on this software, large volumes of polymorphism data have been discovered and analyzed, which will be meaningful for further applications such as genetic mapping, genetic map construction, evolutionary studies and marker-assisted selection.

SomaticSeq

An accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses.

INDELseek

Detects complex indels from next-generation sequencing (NGS) reads. INDELseek was demonstrated as an accurate and versatile complex indel caller, which is compatible with somatic and germline genomics studies, NGS data of random fragments and polymerase chain reaction (PCR) amplicons, and all three classes of complex indels (MNV, net insertion and net deletion). Since INDELseek was implemented as a single Perl script that directly reads SAM/BAM alignments and returns complex indel calls in VCF format, it can be readily incorporated into common bioinformatics workflows without any compilation and installation. INDELseek complements other common variant callers in academic and diagnostic NGS-based genomics studies.

BreakSeek

A breakpoint-based algorithm, which can unbiasedly and efficiently detect both homozygous and heterozygous INDELs, ranging from several base pairs to over thousands of base pairs, with accurate breakpoint and heterozygosity rate estimations. Comprehensive evaluations on both simulated and real datasets revealed that BreakSeek outperformed other existing methods on both sensitivity and specificity in detecting both small and large INDELs, and uncovered a significant amount of novel INDELs that were missed before.

Manta

Calls structural variants (SVs) and indels from mapped paired-end sequencing reads. Manta is optimized for analysis of individuals and tumor/normal sample pairs, calling SVs, medium-sized indels and large insertions within a single workflow. The method is designed for rapid analysis on standard computer hardware: NA12878 at 50x genomic coverage is analyzed in less than 20 minutes on a 20-core server, and most WGS tumor-normal analyses can be completed within 2 hours. Manta combines paired and split-read evidence during SV discovery and scoring to improve accuracy, but does not require split reads or successful breakpoint assemblies to report a variant in cases where there is strong evidence otherwise. It provides scoring models for germline variants in individual diploid samples and somatic variants in matched tumor-normal sample pairs.

piCALL

Allows population indel detection and genotyping. piCALL detects and allows genotyping of small indels from population-scale sequence data. The software is compatible with data from different sequencing platforms but requires all samples to be sequenced using the same sequencing platform. The method requires sequence data from a sufficient number of samples to accurately estimate the population genotypes. Its performance was assessed using population sequencing data generated by the 1000 Genomes project (exon sequencing).

PyroHMMvar

Calls single nucleotide polymorphisms (SNPs) and short indels for both Ion Torrent and 454 resequencing data. PyroHMMvar is a method that has two distinct features: (i) an HMM that models homopolymer errors and can distinguish real signals from sequencing errors, thus improving the alignment of reads against the reference, and (ii) a graph data structure that merges multiple aligned reads at a given locus into a weighted alignment graph. PyroHMMvar is also available as part of the toolkit PyroTools.

SV-M / Structural Variant Machine

Detects indel candidates using a discriminative classifier based on features of split read alignment profiles and trained on true and false indel candidates that were validated by Sanger sequencing. SV-M is able to discover and distinguish true from false indel candidates in order to reduce the false positive rate. The key benefit of using a discriminative model is to learn to distinguish between true and false candidates based on a Sanger validated ground truth, thereby reducing the false positive rate among predicted indels.

GVC / Genomic Variant Caller

Obsolete
Detects various genomic variants including single nucleotide variant (SNV), single insertion/deletion (sINDEL) and structural variation (SV) from personal and normal-cancer paired whole-genome/exome sequencing data. GVC is an all-round genomic variant caller. This resource (i) aligns paired-end FASTQ files to the human GRCh37 reference genome with BWA-MEM and (ii) extracts feature information from the Binary Alignment Map (BAM) file to assemble a feature vector space.

16GT

Provides a variant caller for Illumina whole genome sequencing (WGS) and whole exome sequencing (WES) germline data. 16GT uses a new 16-genotype probabilistic model to unify single nucleotide polymorphism (SNP) and indel calling in a single variant calling algorithm. In benchmark comparisons with five other widely used variant callers on a modern 36-core server, 16GT ran faster and demonstrated improved sensitivity in calling SNPs, and it provided comparable sensitivity and accuracy in calling indels as compared to the GATK HaplotypeCaller.

SvABA / Structural variation and indel Analysis By Assembly

Detects structural variants (SVs) from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. SvABA’s performance was evaluated on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs, and substantially improved detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (< 1,000 bp) templated-sequence insertions copied from distant genomic regions. Applied to 344 cancer genomes from 11 cancer types, SvABA found that templated-sequence insertions occur in approximately 4% of all somatic rearrangements. Finally, SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized SVs.

UPS-indel / Universal Positioning System of indels

Creates a universal positioning system for insertions/deletions (indels) and allows comparison of indel calling results produced by different tools. In UPS-indel, every indel variant is represented by a range of positions within which all equivalent indels can occur. This representation is added to the VCF file, resulting in a UVCF file that contains not only the original indel calling results but also the complete representation of all equivalent indels.
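The notion of a range of positions within which all equivalent indels can occur can be sketched by shifting an indel left and right for as long as the resulting haplotype stays the same. This is a minimal illustration of that equivalence idea, not UPS-indel's actual algorithm or the UVCF format; the function names and the 0-based coordinate convention are assumptions.

```python
def deletion_range(ref, pos, length):
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases yields the same sequence as deleting ref[pos:pos+length]."""
    left = pos
    # Shifting left by one is equivalent if the base entering the deleted
    # window equals the base leaving it.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = pos
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


def insertion_range(ref, pos, seq):
    """Return (leftmost, rightmost) 0-based positions at which inserting a
    rotation of `seq` before that index yields the same sequence as
    inserting `seq` before index `pos`."""
    left, s = pos, seq
    while left > 0 and ref[left - 1] == s[-1]:
        s = s[-1] + s[:-1]  # rotate the inserted sequence right
        left -= 1
    right, s = pos, seq
    while right < len(ref) and ref[right] == s[0]:
        s = s[1:] + s[0]  # rotate the inserted sequence left
        right += 1
    return left, right


if __name__ == "__main__":
    # Deleting one "CA" unit from a CA repeat can be placed at several positions.
    print(deletion_range("GCACACAT", 3, 2))
    print(insertion_range("GCACAT", 3, "CA"))
```

Two callers that report the same deletion at different positions inside a repeat would receive the same range under this scheme, which is what makes their results directly comparable.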

FermiKit

A variant calling pipeline for Illumina whole-genome germline data. It de novo assembles short reads and then maps the assembly against a reference genome to call SNPs, short insertions/deletions (INDELs) and structural variations (SVs). FermiKit takes about one day to assemble 30-fold human whole-genome data on a modern 16-core server with 85GB RAM at the peak, and calls variants in half an hour to an accuracy comparable to the current practice. FermiKit assembly is a reduced representation of raw data while retaining most of the original information.

ScanIndel

Detects indels with multiple heuristics including gapped alignment, split reads and de novo assembly. On simulated data, ScanIndel demonstrated superior sensitivity and specificity relative to several state-of-the-art indel callers across various coverage levels and indel sizes. ScanIndel yields higher predictive accuracy with lower computational cost compared with existing tools for both targeted resequencing data from tumor specimens and high-coverage whole-genome sequencing (WGS) data from the human NIST standard NA12878. ScanIndel will improve indel analysis in both clinical and research settings. ScanIndel is implemented in Python, and is freely available for academic use.

SV-AUTOPILOT / Structural Variation AUTOmated PIpeLine Optimization Tool

Obsolete
Standardizes the Structural Variation (SV) detection pipeline. SV-AUTOPILOT is a pipeline that can be used on existing computing infrastructure in the form of a Virtual Machine (VM) Image. It provides a “meta-tool” platform for using multiple SV-tools, to standardize benchmarking of tools, and to provide an easy, out-of-the-box SV detection program. In addition, the user can choose which of several alignment algorithms is used in their analysis.

GINDEL

An approach for calling genotypes of both insertions and deletions from sequence reads. GINDEL uses a machine learning approach which combines multiple features extracted from next-generation sequencing data. It performs well for insertion genotyping on both simulated and real data. GINDEL not only calls genotypes of insertions and deletions (both short and long) for high- and low-coverage population sequence data, but is also more accurate and efficient than other approaches.

BBCAnalyzer

Allows the visualization of the relative or absolute number of bases, deletions and insertions at defined positions in sequence alignment data, available as BAM files, in comparison to the reference bases. BBCAnalyzer consists of different steps that are highly dependent on each other's output, which is why it is necessary to first run the function analyzeBases, which performs the whole process of analyzing the data and visualizing the results. The runtime of the tool depends on the number of samples and the number of positions that are analyzed.