Focuses on variant discovery and genotyping. GATK is a toolkit developed at the Broad Institute; it is composed of several tools and can support projects of any size. It gathers an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in formats such as SAM, BAM, CRAM and VCF. The website provides extensive documentation to guide users.
Allows users to interact with high-throughput sequencing data. SAMtools manipulates alignments in the SAM/BAM/CRAM formats: it can read, write, edit, index, view and convert between them. It caps the mapping quality of reads with excessive mismatches and computes base alignment quality (BAQ) to reduce the impact of alignment errors. The tool can also sort and merge alignments, remove polymerase chain reaction (PCR) duplicates and generate per-position information.
A software suite for the comparison, manipulation and annotation of genomic features in the browser extensible data (BED) and general feature format (GFF) formats. BEDTools also supports comparing sequence alignments in BAM format to both BED and GFF features. The tools are highly efficient and allow users to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The individual tools can be combined with one another and with standard UNIX commands, which facilitates routine genomics tasks as well as pipelines that can quickly answer intricate questions about large genomic datasets.
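Most BEDTools comparisons rest on one primitive: deciding whether two intervals overlap. The following is a minimal pure-Python sketch of that test, not BEDTools' own (C++) implementation, assuming BED's 0-based, half-open coordinates; it uses a naive pairwise scan for clarity, whereas a real tool would sort or index the intervals. The names `intersect` and `overlap_region` are illustrative only.

```python
from typing import List, Tuple

# A BED-style interval: (chromosome, start, end), 0-based and half-open [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def overlap_region(a: Interval, b: Interval) -> Interval:
    """The shared sub-interval of two overlapping intervals."""
    return (a[0], max(a[1], b[1]), min(a[2], b[2]))


def intersect(features_a: List[Interval], features_b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Report every overlapping (a, b) pair with a naive O(n*m) scan."""
    return [(a, b) for a in features_a for b in features_b if overlaps(a, b)]


genes = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
peaks = [("chr1", 150, 160), ("chr1", 200, 210), ("chr2", 40, 45)]
for a, b in intersect(genes, peaks):
    print(a, b, overlap_region(a, b))
```

Note that `("chr1", 100, 200)` and `("chr1", 200, 210)` are adjacent but do not overlap, because the end coordinate is exclusive.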
Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files and a Java API for creating new programs that read and write SAM files. Both the SAM text format and its binary counterpart (BAM) are supported. It is intended for use with next-generation sequencing (NGS) data.
The extension of mapped sequence tags is a common step in the analysis of single-end next-generation sequencing (NGS) data from protein localization and chromatin studies. The optimal extension can vary depending on experimental and technical conditions, and improper extension of sequence tags can obscure or mislead the interpretation of NGS results. ArchTEx identifies the optimal extension from the maximum correlation between forward and reverse tags, then extracts and visualizes sites of interest using the predicted extension.
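The correlation idea can be illustrated with a strand cross-correlation sketch. It assumes per-base counts of forward-strand and reverse-strand tag 5' ends, shifts the reverse profile by each candidate distance, and keeps the shift with the highest Pearson correlation; treating that shift as the extension is an assumption here, and ArchTEx's exact scoring may differ. `best_extension` is a hypothetical name.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def best_extension(forward: List[int], reverse: List[int], max_shift: int) -> int:
    """Return the shift d in [0, max_shift] maximizing corr(forward[i], reverse[i + d])."""
    best_d, best_r = 0, -2.0
    for d in range(max_shift + 1):
        # Pair forward position i with reverse position i + d.
        f = forward[: len(forward) - d]
        r = reverse[d:]
        if len(f) < 2:
            break
        rho = pearson(f, r)
        if rho > best_r:
            best_d, best_r = d, rho
    return best_d


# Synthetic profile: each reverse-strand peak sits 25 bp downstream of its forward peak.
fwd = [0] * 100
rev = [0] * 100
for pos, count in [(10, 5), (40, 3), (70, 4)]:
    fwd[pos] = count
    rev[pos + 25] = count
print(best_extension(fwd, rev, max_shift=40))  # 25
```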
Permits quality control of next-generation sequencing (NGS) tumor-normal experiments. NGS-Bits is divided into four steps: (1) gathering information from raw reads, (2) mapping reads, (3) extracting variant lists, and (4) combining the results of the preceding steps to add quality control (QC) metrics for tumor-normal experiments. The tool covers all stages of single-sample NGS data analysis and adds dedicated QC metrics for DNA sequencing of tumor-normal pairs.
Allows users to reformat and filter bioinformatics files. JVARKIT aims to simplify the syntax used to filter bioinformatics files, making it possible to write a loop or a custom function. JVARKIT is a set of more than 100 Java-based tools for bioinformatics.
Facilitates the design, optimization, and tracking of barcoded oligonucleotides. XSTK is useful for projects that require highly multiplexed polymerase chain reaction (PCR) and DNA sequencing. It builds a list of all possible DNA sequences of a specified length and then progressively culls sequences that may interfere with primary PCR amplification and/or sequencing steps.
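The enumerate-then-cull strategy can be sketched in a few lines. The culling criteria below (GC content between 40% and 60%, no runs of three or more identical bases, and a minimum pairwise Hamming distance of 3 so that barcodes remain distinguishable after sequencing errors) are illustrative assumptions, not XSTK's documented filters, and `design_barcodes` is a hypothetical name.

```python
from itertools import product
from typing import List


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_homopolymer(seq: str, run: int) -> bool:
    """True if any base repeats `run` or more times consecutively."""
    count = 1
    for prev, cur in zip(seq, seq[1:]):
        count = count + 1 if cur == prev else 1
        if count >= run:
            return True
    return False


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(length: int, min_gc: float = 0.4, max_gc: float = 0.6,
                    forbidden_run: int = 3, min_distance: int = 3) -> List[str]:
    # 1. Build every possible DNA sequence of the given length (4**length candidates).
    candidates = ("".join(p) for p in product("ACGT", repeat=length))
    # 2. Cull candidates with extreme GC content or long homopolymers (assumed criteria).
    kept = [s for s in candidates
            if min_gc <= gc_fraction(s) <= max_gc and not has_homopolymer(s, forbidden_run)]
    # 3. Greedily keep barcodes that differ from all chosen ones at >= min_distance positions.
    chosen: List[str] = []
    for s in kept:
        if all(hamming(s, c) >= min_distance for c in chosen):
            chosen.append(s)
    return chosen


barcodes = design_barcodes(6)
print(len(barcodes), barcodes[:5])
```

The greedy pass is order-dependent and does not guarantee the largest possible barcode set; it only guarantees that every selected pair meets the distance threshold.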
Resamples biomolecular sequence data. SERES combines non-parametric and semi-parametric approaches to help users assess support. It can be applied to both aligned and unaligned sequences. The application combines a bootstrap method with a Heads-or-Tails (HoT) technique, which allows users to run either empirical or simulation studies with SERES local tree annotation.
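For readers unfamiliar with bootstrap resampling of sequence data, the sketch below shows the classic non-parametric column bootstrap: alignment columns are drawn with replacement to build pseudo-replicate alignments, each of which would then be passed to a tree-building program (not shown). This is the standard textbook scheme, not SERES's own resampling or its HoT component, and the function names are hypothetical.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Draw alignment columns with replacement to build one pseudo-replicate."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same column indices are applied to every sequence, so columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def replicates(alignment: Dict[str, str], n: int, seed: int = 0) -> List[Dict[str, str]]:
    """Generate n bootstrap replicates reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


aln = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "AGGTTC"}
reps = replicates(aln, 100, seed=42)
print(reps[0])
```

Support for a clade is then typically reported as the fraction of replicate trees that contain it.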
Provides several programs allowing users to perform both common and uncommon tasks with FASTQ files. fastq-tools is a toolkit that provides tools for (1) finding reads matching a regular expression, (2) counting k-mer occurrences, (3) performing local alignment against every FASTQ sequence, (4) sampling reads with or without replacement, (5) sorting FASTQ files and (6) filtering reads with identical sequences.
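As an illustration of one of these tasks, k-mer counting over a FASTQ file can be sketched as below. It assumes the common four-line record layout with unwrapped sequences; it is not fastq-tools' implementation, and `read_fastq` and `count_kmers` are hypothetical names.

```python
import io
from collections import Counter
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        plus = handle.readline()
        qual = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        yield header[1:].rstrip(), seq, qual


def count_kmers(handle: TextIO, k: int) -> Counter:
    """Count every overlapping k-mer across all reads."""
    counts: Counter = Counter()
    for _, seq, _ in read_fastq(handle):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


fq = io.StringIO("@r1\nACGTAC\n+\nIIIIII\n@r2\nACGA\n+\nIIII\n")
print(count_kmers(fq, 3).most_common(1))  # [('ACG', 2)]
```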
Enables users to process sequences in FASTA and FASTQ formats. Seqtk parses both FASTA and FASTQ files, which can optionally be gzip-compressed. It can also convert Illumina FASTQ files to FASTA and mask low-quality bases. In addition, it can extract sequences by name or within specified regions.
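Quality masking works by decoding each quality character into a Phred score and hiding bases below a threshold. A minimal sketch, assuming Phred+33 encoding (as used by Illumina 1.8+), a threshold of 20 and replacement with `N`; these choices and the function names are assumptions for illustration, not Seqtk's defaults.

```python
def mask_low_quality(seq: str, qual: str, min_q: int = 20,
                     offset: int = 33, mask: str = "N") -> str:
    """Replace bases whose Phred score (ord(char) - offset) is below min_q with `mask`."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return "".join(base if ord(q) - offset >= min_q else mask
                   for base, q in zip(seq, qual))


def to_fasta(name: str, seq: str) -> str:
    """Format one record as FASTA, dropping the quality string."""
    return f">{name}\n{seq}\n"


# 'I' encodes Q40, '#' encodes Q2 and '5' encodes Q20 under Phred+33.
print(to_fasta("r1", mask_low_quality("ACGTAC", "II#II5")), end="")
```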
Allows users to interact with files associated with next-generation sequencing (NGS). qMule is composed of three modules: Aligner Compare, which compares two BAM files aligned from the same FASTQ and separates out reads whose alignments differ between them; BamMismatchCounts, which tallies the number of mismatches in each read that mapped full-length; and MafFilter, which filters QCMG-annotated MAF files.
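The per-read mismatch tally can be pictured with a simplified sketch. It assumes ungapped, full-length alignments given as (read name, read sequence, 0-based reference position) tuples against an in-memory reference string; real BAM input would need CIGAR parsing, and this is not qMule's code. `mismatch_histogram` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple


def mismatches(read: str, ref_segment: str) -> int:
    """Count mismatching positions in an ungapped, full-length alignment."""
    if len(read) != len(ref_segment):
        raise ValueError("read is not aligned full-length")
    return sum(r != g for r, g in zip(read.upper(), ref_segment.upper()))


def mismatch_histogram(alignments: Iterable[Tuple[str, str, int]], reference: str) -> Counter:
    """Tally how many reads carry 0, 1, 2, ... mismatches."""
    hist: Counter = Counter()
    for _name, read, pos in alignments:
        hist[mismatches(read, reference[pos:pos + len(read)])] += 1
    return hist


reference = "ACGTACGTAC"
reads = [("r1", "ACGT", 0), ("r2", "ACGA", 4), ("r3", "TTGT", 2)]
print(sorted(mismatch_histogram(reads, reference).items()))  # [(0, 1), (1, 1), (3, 1)]
```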
A quick and extremely permissive method to read and write VCF files. vcflib provides a variety of functions for VCF manipulation: comparison, format conversion, filtering and subsetting, annotation, samples, ordering, variant representation, genotype manipulation, and interpretation and classification of variants. Piping provides a convenient way to interface with other libraries that exchange data as VCF files (vcf-tools, BedTools, GATK, htslib, bcftools, freebayes), allowing an immense variety of processing functions to be composed.
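The piping style described here relies on filters that read VCF text line by line and write VCF text back out. A minimal Python sketch of such a stream filter, keeping header lines and records whose QUAL column (the sixth field) meets a threshold; it is not one of vcflib's tools, and `filter_vcf` is a hypothetical name.

```python
from typing import Iterable, Iterator


def filter_vcf(lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Pass header lines through; keep records whose QUAL (column 6) is >= min_qual."""
    for line in lines:
        if line.startswith("#"):
            yield line  # meta-information and column header lines
            continue
        if not line.strip():
            continue  # skip blank lines
        qual = line.rstrip("\n").split("\t")[5]
        if qual != "." and float(qual) >= min_qual:  # "." means QUAL is missing
            yield line


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t10\tPASS\t.\n",
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\n",
]
print("".join(filter_vcf(vcf, 30)), end="")
```

Because it is a generator over lines, the same function can sit in a shell pipeline via `sys.stdout.writelines(filter_vcf(sys.stdin, 30))`, streaming records without loading the whole file.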
Allows users to make full use of the processing power available on their machines. Parabam is a Python package that lets users inspect large BAM files quickly, making the most of their computational resources without writing much code. It can be invoked programmatically via interface classes or from the command line, and it also supports full integration through object-oriented programming (OOP) inheritance.
Helps with a problem in genome-wide association (GWA) studies. Vcfsubsample subsamples the data in order to "lock" the minor allele frequency (MAF) in the data set, so that all single nucleotide polymorphisms (SNPs) have the same MAF.
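One way to picture MAF locking is per-site subsampling: from each SNP's allele copies, pick a fixed number of copies containing exactly `round(target_maf * n)` minor alleles, so every retained site ends up with the same MAF. This scheme is a hypothetical illustration, not necessarily how Vcfsubsample works; alleles are assumed to be coded 0 (reference) and 1 (alternate), `target_maf` is assumed to be at most 0.5, and `subsample_site` is an invented name.

```python
import random
from typing import List, Optional


def subsample_site(alleles: List[int], n: int, target_maf: float,
                   rng: random.Random) -> Optional[List[int]]:
    """Return sorted indices of n allele copies with exactly round(target_maf * n) minor alleles.

    Returns None when the site has too few copies of either allele to reach the target.
    """
    n_minor = round(target_maf * n)
    alt = [i for i, a in enumerate(alleles) if a == 1]
    ref = [i for i, a in enumerate(alleles) if a == 0]
    # The minor allele is whichever allele is rarer at this site.
    minor, major = (alt, ref) if len(alt) <= len(ref) else (ref, alt)
    if len(minor) < n_minor or len(major) < n - n_minor:
        return None
    return sorted(rng.sample(minor, n_minor) + rng.sample(major, n - n_minor))


rng = random.Random(1)
site = [1] * 6 + [0] * 14          # 20 allele copies, MAF 0.3
picked = subsample_site(site, 10, 0.2, rng)
print(picked, sum(site[i] for i in picked) / len(picked))  # resulting MAF is 0.2
```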
Converts sequence files between formats such as FASTQ and FASTA. Reformat is designed for generic streaming read-processing tasks with low memory or computational demands, such as format conversion, subsampling and various filtering operations. It needs only a trivial amount of memory to process short reads, regardless of how many there are. Some of its functionality (such as quality trimming, length filtering and histogram generation) is shared with BBDuk.
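The constant-memory property comes from streaming: each record is read, transformed and emitted before the next one is touched. A minimal Python sketch of streaming FASTQ-to-FASTA conversion with a length filter and random subsampling, assuming four-line records with unwrapped sequences; it is not Reformat's implementation, and the function name and parameters are hypothetical.

```python
import random
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str], min_length: int = 0,
                   sample_rate: float = 1.0, seed: int = 0) -> Iterator[str]:
    """Stream FASTQ lines and yield FASTA lines, one record at a time.

    Only the current record is held in memory, so memory use does not grow
    with the number of reads. An incomplete trailing record is silently dropped.
    """
    rng = random.Random(seed)
    it = iter(lines)
    # zip over the same iterator four times groups lines into 4-line records.
    for header, seq, plus, _qual in zip(it, it, it, it):
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record")
        seq = seq.rstrip("\n")
        if len(seq) < min_length:
            continue  # length filter
        if rng.random() >= sample_rate:
            continue  # random subsampling: keep roughly sample_rate of reads
        yield ">" + header[1:]  # header keeps its trailing newline
        yield seq + "\n"


fq = ["@r1\n", "ACGTAC\n", "+\n", "IIIIII\n", "@r2\n", "AC\n", "+\n", "II\n"]
print("".join(fastq_to_fasta(fq, min_length=4)), end="")
```

Used as `sys.stdout.writelines(fastq_to_fasta(sys.stdin, min_length=50))`, the same generator converts arbitrarily large inputs in a pipeline.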