
GATK / Genome Analysis ToolKit

Focuses on variant discovery and genotyping. GATK is a toolkit developed at the Broad Institute that comprises several tools and can support projects of any size. It bundles an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in formats such as SAM, BAM, CRAM or VCF. The website provides extensive documentation to guide users.


Allows users to interact with high-throughput sequencing data. SAMtools permits the manipulation of alignments in the SAM/BAM/CRAM formats: reading, writing, editing, indexing, viewing and converting between them. It caps the mapping quality of reads with excessive mismatches and applies base alignment quality to fix alignment errors. This tool can sort and merge alignments, remove polymerase chain reaction (PCR) duplicates or generate per-position information.


A software suite for the comparison, manipulation and annotation of genomic features in browser extensible data (BED) and general feature format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions about large genomic datasets.
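
The core operation behind comparing features against annotation tracks is interval overlap. The following is a minimal sketch of that idea, assuming BED-style 0-based, half-open coordinates; the names `overlaps` and `intersect` are hypothetical, and the naive pairwise loop is for illustration only and is not how BEDTools itself is implemented.

```python
from typing import List, Tuple

# (chromosome, start, end) with 0-based, half-open coordinates, as in BED.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(features: List[Interval], annotations: List[Interval]) -> List[Interval]:
    """Report the overlapping segment for every overlapping pair.

    Naive O(len(features) * len(annotations)) comparison, illustration only.
    """
    hits = []
    for a in features:
        for b in annotations:
            if overlaps(a, b):
                hits.append((a[0], max(a[1], b[1]), min(a[2], b[2])))
    return hits


if __name__ == "__main__":
    reads = [("chr1", 100, 200)]
    genes = [("chr1", 150, 250), ("chr2", 150, 250)]
    print(intersect(reads, genes))
```

Half-open coordinates mean that two intervals that merely touch, such as `(100, 200)` and `(200, 300)`, are not reported as overlapping.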


Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files, and a Java API for creating new programs that read and write SAM files. Both the SAM text format and the SAM binary (BAM) format are supported. It also works with next-generation sequencing (NGS) data.


A statistical framework for calling SNPs, discovering somatic mutations, inferring population genetic parameters and performing association tests directly from sequencing data. BCFtools can manipulate variant calls in the variant call format (VCF) and its binary counterpart BCF. It can also discover somatic and germline mutations with appropriate input data, efficiently estimate site allele frequency, allele frequency spectrum and linkage disequilibrium, and test Hardy–Weinberg equilibrium and association.
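
To make the Hardy–Weinberg test mentioned above concrete, here is a minimal sketch of the textbook chi-square statistic for a biallelic site, computed from observed genotype counts. The function name `hwe_chi_square` is hypothetical, and this is not BCFtools' own method, which works from sequencing data rather than from hard genotype counts.

```python
def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Chi-square statistic for deviation from Hardy-Weinberg proportions.

    Textbook calculation from genotype counts at one biallelic site.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Reference allele frequency estimated from the genotype counts.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p

    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Classes with zero expectation (monomorphic sites) are skipped.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts exactly at equilibrium
    print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit
```

The statistic is compared against a chi-square distribution with one degree of freedom; larger values indicate stronger departure from equilibrium.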


Filters spurious variants caused by mouse reads in patient-derived xenografts (PDXs) and caused by paralogous sequences in primary tumors. Mapexr is an R package that implements MAPEX (the Mouse And Paralog EXterminator), a BLASTN-based algorithm for filtering variants. This algorithm is designed to fit into a standard tumor variant calling pipeline and flag variants which may arise from mis-alignment of mouse reads or from paralogous sequences. The software can be a useful component for many tumor variant-calling pipelines.


A suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. With modules that operate from FASTQ pre-processing through BAM post-processing and RPKM calculations, NGSUtils complements existing tools and provides unique functionality that helps at each step of an NGS data analysis pipeline. NGSUtils covers different aspects of NGS data analysis, including pre-processing, post-processing, filtering, format conversion and final result calculations. NGSUtils provides a stable and modular platform for data management and analysis.
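
As an illustration of the RPKM calculation mentioned above, the following sketch applies the standard definition (reads per kilobase of feature per million mapped reads). The function name `rpkm` and its parameters are hypothetical and do not reflect the NGSUtils command-line interface.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of feature per Million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 2 kb gene in a library of 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))
```

Normalizing by both feature length and library size makes values comparable across genes of different lengths and across samples sequenced to different depths.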


A pipeline to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. MutAid performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease-causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms: Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers (BWA, TMAP, Bowtie, Bowtie2 and GSNAP) and four variant-calling pipelines (GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2) are supported. MutAid can be used to analyze, elucidate and interpret mutational variants from data generated by targeted re-sequencing, gene-panel sequencing, exome, and whole genome sequencing.


A C++ read filtering and profiling tool for use with BAM, CRAM and SAM sequencing files. VariantBam provides a flexible framework for extracting sequencing reads or read-pairs that satisfy combinations of rules, defined by any number of genomic intervals or variant sites. It implements filters based on alignment data, sequence motifs, regional coverage and base quality. VariantBam enables efficient storage of sequencing data while preserving the most relevant information for downstream analysis. It is easy to compile and run, and is extensively documented with a number of use cases and examples.


Simplifies variant annotation and filtering. Bystro is able to handle sequencing experiments on the scale of thousands of whole-genome samples and tens of millions of variants online in a web browser. It integrates a search engine for filtering variants and samples from these experiments, and it enables real-time (sub-second), nuanced variant filtering, both across all samples and per sample, using simple phrases and interactive, web-based filters. It helps users find alleles of interest in any sequencing experiment.

SWEEP / Sliding Window Extraction of Explicit Polymorphisms

Filters out false positives from a set of single nucleotide polymorphism (SNP) calls. SWEEP takes the ubiquitous false-positive SNP calls and turns them from a weakness into a strength by using their information to pull out the true SNPs that are polymorphic between genotypes of interest. The user only needs to supply sorted and indexed BAM files and the reference genome used to map sequence reads. SWEEP is also applicable to other allopolyploid crops.


Implements a flexible command-line toolkit providing specific support to the management, filtering, comparison and annotation of genomic position (GP) files produced by next generation sequencing (NGS) experiments. PileLine consists of a set of command-line utilities that are easy to integrate in custom workflows or user-friendly frameworks like Galaxy. The tools comprising PileLine are focussed on two different but complementary activities: (i) processing and annotation, implementing simple but reusable operations over input GP files and (ii) analysis, giving support to more advanced and specific requirements. PileLine contains 10 command-line utilities that have been designed to be memory efficient by performing on-disk operations over sorted GP files.

VarAFT / Variant Annotation and Filter Tool

Annotates and filters variant files. VarAFT allows the comparison of several individuals and the collection of relevant information about the variations. It includes a coverage analysis module to easily visualize poorly covered regions through tables and dynamic charts. With VarAFT, users can annotate variant (VCF) files, combine multiple samples from various individuals, and prioritize lists of variants using multiple filtering parameters. Additionally, users can perform a coverage analysis and quality check on any BAM file.


Detects and filters misaligned reads in SAM format. Such filtering can reduce false positives in alignment and in the subsequent variant analysis. Cross-validation between two simulated datasets processed with SAMSVM yielded accuracies that ranged from 0.89 to 0.97 with F-scores ranging from 0.77 to 0.94 in 14 groups characterized by different mutation rates from 0.001 to 0.1, indicating that the model built using SAMSVM was accurate in misalignment detection. Application of SAMSVM to actual sequencing data resulted in filtration of misaligned reads and correction of variant calling.


A quality control script that makes FASTA format files compatible with a variety of downstream bioinformatics tools. Fasta-O-Matic automates handling of common but minor format issues that otherwise may halt pipelines. The need for automation must be balanced by the need for manual confirmation that any formatting error is actually minor rather than indicative of a corrupt data file. To that end, Fasta-O-Matic reports any issues detected to the user with optionally color-coded and quiet or verbose logs. Fasta-O-Matic can be used as a general pre-processing tool in bioinformatics workflows (e.g. to automatically wrap FASTA files so that they can be read by BioPerl). It was also developed as a sanity check for bioinformatic core facilities that tend to repeat common analysis steps on FASTA files received from disparate sources. Fasta-O-Matic can be set with format requirements specific to downstream tools as a first step in a larger analysis workflow.
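
The line-wrapping fix mentioned above can be sketched as follows. This is a minimal illustration of re-wrapping sequence lines to a fixed width, not Fasta-O-Matic's own code; the function name `wrap_fasta` and the default width of 60 characters are assumptions.

```python
from typing import Iterable, List


def wrap_fasta(lines: Iterable[str], width: int = 60) -> List[str]:
    """Re-wrap the sequence lines of a FASTA file to a fixed width.

    Header lines (starting with '>') are kept as they are; the sequence
    lines of each record are joined and split again into `width`-sized chunks.
    """
    if width <= 0:
        raise ValueError("width must be positive")

    out: List[str] = []
    pending: List[str] = []  # sequence fragments of the current record

    def flush() -> None:
        sequence = "".join(pending)
        out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
        pending.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are dropped
        if line.startswith(">"):
            flush()
            out.append(line)
        else:
            pending.append(line)
    flush()
    return out


if __name__ == "__main__":
    for line in wrap_fasta([">seq1", "ACGTACGT", "AC"], width=4):
        print(line)
```

A real quality-control step would also report what was changed, so the user can confirm the issue was cosmetic rather than a sign of a corrupt file.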


A collection of command-line tools for preprocessing short-read FASTA/FASTQ files. Next-generation sequencing machines usually produce FASTA or FASTQ files containing multiple short-read sequences (possibly with quality information). The main processing of such FASTA/FASTQ files is mapping (aka aligning) the sequences to reference genomes or other databases using specialized programs. Examples of such mapping programs are Blat, SHRiMP, LastZ, MAQ and many others.


Removes non-alphabetic (e.g. gap) characters from sequences. degapseq reads one or more sequences and writes them out again but stripped of any non-alphabetic characters. Its main purpose is to remove gap characters from aligned sequences, but it will also remove such things as the symbol for translation STOP in a protein sequence. There are many different formats for storing molecular sequences in files. Some formats are specifically for aligned sequences, where gaps are inserted into the sequences for purposes of alignment.
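
The behaviour described above amounts to keeping only alphabetic characters. A minimal sketch of that rule follows; the function name `degap` is hypothetical and this is not the degapseq source.

```python
def degap(sequence: str) -> str:
    """Return the sequence with every non-alphabetic character removed.

    Gap symbols such as '-' and '.', and the '*' translation-stop symbol,
    are all dropped because none of them is a letter.
    """
    return "".join(ch for ch in sequence if ch.isalpha())


if __name__ == "__main__":
    print(degap("AC-GT..A"))  # aligned nucleotide sequence with gaps
    print(degap("MKV-L*"))    # protein sequence with a gap and a STOP symbol
```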


A quick and extremely permissive method to read and write VCF files. vcflib provides a variety of functions for VCF manipulation: comparison, format conversion, filtering and subsetting, annotation, samples, ordering, variant representation, genotype manipulation, interpretation and classification of variants. Piping provides a convenient method to interface with other libraries (vcf-tools, BedTools, GATK, htslib, bcftools, freebayes) which interface via VCF files, allowing the composition of an immense variety of processing functions.


Converts sequence files between different formats such as fastq and fasta. Reformat is designed for generic streaming read-processing tasks that have low memory or computational demands, such as format conversion, subsampling, and various filtering operations. This package needs only a trivial amount of memory for processing short reads, regardless of how many there are. Some of its functionality (like quality-trimming, length-filtering, histogram generation) is shared with BBDuk.
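
The low memory use of streaming format conversion can be illustrated with a generator that holds one record at a time. This is a minimal sketch assuming simple four-line FASTQ records with no blank lines inside a record; the function name `fastq_to_fasta` is hypothetical and this is not how Reformat is implemented.

```python
from itertools import islice
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Lazily convert four-line FASTQ records into FASTA lines.

    Only one record is held in memory at a time, so memory use does not
    grow with the number of reads.
    """
    stripped = (line.rstrip("\r\n") for line in lines if line.strip())
    while True:
        record = list(islice(stripped, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, separator, _quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        yield ">" + header[1:]
        yield sequence


if __name__ == "__main__":
    fastq = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    for line in fastq_to_fasta(fastq):
        print(line)
```

Because the function accepts any iterable of lines, an open file handle can be passed directly and the output written line by line.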