Focuses on variant discovery and genotyping. GATK is a toolkit developed at the Broad Institute, composed of several tools and able to support projects of any size. It bundles an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in formats such as SAM, BAM, CRAM and VCF. The website provides extensive documentation to guide users.
Assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-seq samples. Cufflinks assembles individual transcripts from RNA-seq reads that have been aligned to the genome. Because a sample may contain reads from multiple splice variants of a given gene, the software infers the splicing structure of each gene. Transcript abundances can also be quantified against a reference annotation instead of assembling the reads.
Allows users to interact with high-throughput sequencing data. SAMtools manipulates alignments in the SAM, BAM and CRAM formats: reading, writing, editing, indexing, viewing and converting between them. It can cap the mapping quality of reads with excessive mismatches and apply base alignment quality (BAQ) to compensate for alignment errors. This tool can also sort and merge alignments, remove polymerase chain reaction (PCR) duplicates and generate per-position information.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It supports the creation and release of software in an open source spirit, and integrates a range of packages and tools for sequence analysis into a seamless whole. It is free of charge and open source.
A software suite for the comparison, manipulation and annotation of genomic features in the Browser Extensible Data (BED) and General Feature Format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are highly efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, which facilitates routine genomics tasks as well as pipelines that can quickly answer intricate questions about large genomic datasets.
A robust, high-performance tool and library for working with SAM, BAM and CRAM sequence alignment files, the most common file formats for aligned next-generation sequencing (NGS) data. Sambamba is a faster alternative to SAMtools that exploits multi-core processing to reduce processing time substantially. Sambamba is being adopted at sequencing centers, not only because of its speed, but also because of additional functionality, including coverage analysis and powerful filtering capability.
A flexible and easy-to-use interface that programmers of many levels of experience can use to access information in the widely used SAM/BAM format. bio-samtools 2 provides new classes for describing genomic regions and genetic variants, allows the easy addition of newly developed SAMtools features and can produce publication-quality visualizations of data with minimal effort by the coder.
Handles multiple sequences and alignments in batch mode. FasParser provides a platform for several common tasks, such as: (i) building alignments in batch; (ii) concatenating, merging, extracting and filtering sequences; (iii) converting alignment formats; (iv) designing polymerase chain reaction (PCR) primers, and more. Additionally, the application supplies an editor dedicated to visualizing and editing the analyzed sequences.
Combines structural variation (SV) calls from different algorithms to reduce false positives and maximize discovery. FusorSV is based on a data-mining approach that characterizes the performance of a group of SV callers against a given truth set. It can choose all possible combinations of callers that satisfy a performance threshold. The goal of this tool is to be extensible, evolving with the field.
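The combination search described for FusorSV can be illustrated with a minimal sketch. It is not FusorSV's implementation: it assumes that each caller's output is a plain set of SV identifiers, that calls are merged by simple set union, and that performance is measured as an F1 score. The names `f1_score`, `passing_combinations` and the toy callers are hypothetical.

```python
from itertools import combinations


def f1_score(calls, truth):
    """F1 of a call set against a truth set (both plain Python sets)."""
    true_pos = len(calls & truth)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(calls)
    recall = true_pos / len(truth)
    return 2 * precision * recall / (precision + recall)


def passing_combinations(caller_calls, truth, threshold):
    """Return every caller combination whose merged calls reach the threshold.

    Assumption: calls are merged by set union; a real tool would use a
    more careful merging model.
    """
    names = sorted(caller_calls)
    passing = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            merged = set().union(*(caller_calls[name] for name in combo))
            score = f1_score(merged, truth)
            if score >= threshold:
                passing.append((combo, score))
    return passing


# Toy data: "sv*" are true events, "fp*" are false positives.
truth = {"sv1", "sv2", "sv3", "sv4"}
caller_calls = {
    "caller_a": {"sv1", "sv2", "fp1"},
    "caller_b": {"sv3", "sv4", "fp2", "fp3"},
    "caller_c": {"sv1", "sv2", "sv3"},
}
result = passing_combinations(caller_calls, truth, threshold=0.78)
for combo, score in result:
    print(combo, round(score, 3))
```

The exhaustive enumeration grows exponentially with the number of callers, so this sketch is only practical for a small group of them.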
Permits users to parse, analyze and manipulate VCF files. VCFtools is a software package composed of two modules: the first is a general API that allows various operations to be performed on VCF files, including format validation, merging, comparing, intersecting, making complements and basic overall statistics; the second module analyzes single-nucleotide polymorphism (SNP) data in VCF format, helping researchers estimate allele frequencies, levels of linkage disequilibrium and various quality control (QC) metrics.
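As an illustration of the kind of statistic mentioned above, the following sketch estimates the alternate allele frequency at one site from VCF-style genotype strings. It does not use VCFtools; the function name `alt_allele_frequency` is hypothetical, and the sketch assumes `GT` values such as `0/1`, `1|1` or `./.`, counting every non-reference allele as alternate.

```python
import re


def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are non-reference.

    genotypes: iterable of GT strings such as '0/1', '1|1' or './.'.
    Missing alleles ('.') are skipped. Returns None if nothing was called.
    """
    alt = total = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    return alt / total if total else None


print(alt_allele_frequency(["0/0", "0/1", "1|1", "./."]))
```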
Annotates and filters variant files. VarAFT allows the comparison of several individuals and the collection of relevant information about the variations. It includes a coverage analysis module to easily visualize poorly covered regions through tables and dynamic charts. With VarAFT, users can annotate variant (VCF) files, combine multiple samples from various individuals, and prioritize lists of variants using multiple filtering parameters. Additionally, users can perform a coverage analysis and quality check on any BAM file.
Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files, and a Java API for creating new programs that read and write SAM files. Both the SAM text format and the binary (BAM) format are supported. It also works with next-generation sequencing (NGS) data.
A software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first publicly available C++ API for BAM file support and a command-line toolkit. The BamTools C++ API/library has been successfully integrated into a variety of applications. It provides the BAM file support for several utilities in the BEDTools suite.
Permits quality control of next-generation sequencing (NGS) tumor-normal experiments. NGS-Bits is organized into four steps: (1) gather information from raw reads, (2) map reads, (3) extract variant lists, and (4) combine the results of the preceding steps and add quality control (QC) metrics for tumor-normal experiments. This tool covers all stages of single-sample NGS data analysis and adds special QC metrics for DNA sequencing of tumor-normal pairs.
Allows users to reformat and filter bioinformatics files. JVARKIT aims to simplify the grammar used to filter bioinformatics files, making it possible to write a loop or a custom function. JVARKIT is a set of more than 100 Java-based tools for bioinformatics.
Allows users to analyze, filter, annotate or transform biological sequence data. FAST can perform automated sampling, permutation and bootstrapping of sequences and sites, and compute population genetic statistics. It can help empower non-biologist programmers to develop and communicate bioinformatics workflows for scientific investigations and publishing.
Allows users to filter, convert and combine multiple data files produced by high-throughput technologies. HTDP aims to aid global, real-time processing of large data sets through a graphical user interface (GUI). The software provides unlimited filtering and data reduction capabilities, and can also use itemized filtering conditions from external files. It can be used for conversion between standard formats that are commonly used for high-throughput data.
Facilitates the design, optimization, and tracking of barcoded oligonucleotides. XSTK is useful for projects that require highly multiplexed polymerase chain reaction (PCR) and DNA sequencing. It builds a list of all possible DNA sequences of a specified length and then progressively culls sequences that may interfere with primary PCR amplification and/or sequencing steps.
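The enumerate-then-cull strategy can be sketched as follows. This is not XSTK's code: the two culling criteria used here (homopolymer runs and GC content) and their thresholds are assumptions chosen for illustration, and the function names are hypothetical.

```python
from itertools import product


def has_homopolymer(seq, max_run):
    """True if seq contains a run of one base longer than max_run."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def candidate_barcodes(length, max_run=2, gc_min=0.4, gc_max=0.6):
    """Build every DNA sequence of the given length, then cull step by step."""
    candidates = ["".join(p) for p in product("ACGT", repeat=length)]
    # Cull 1 (assumed criterion): long homopolymer runs.
    candidates = [s for s in candidates if not has_homopolymer(s, max_run)]
    # Cull 2 (assumed criterion): extreme GC content.
    candidates = [s for s in candidates if gc_min <= gc_fraction(s) <= gc_max]
    return candidates


barcodes = candidate_barcodes(4)
print(len(barcodes), barcodes[:5])
```

Because the number of candidates is 4 to the power of the length, full enumeration is only feasible for short barcodes.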
Merges long overlapping sequence fragments. Fragment Merger Tool is a genome-agnostic, web-based, assembly software developed using hepatitis B virus (HBV) sequence data. It allows automated assembly of two to twelve long overlapping sequence fragments and enables assembly of sequence data from insertion or deletion mutants and recombinants, as a reference sequence is not used for assembly. The software can be used by researchers without specialist computer skills.
Provides utility functions implementing commonly used genomic operations. bedr is a BED-operations framework that offers a formal R interface to BEDTools and BEDOPS. In addition to sort operations, it supports the identification of overlapping regions, which can be collapsed to avoid downstream analytical challenges. This package is compatible with the ubiquitous BED tools paradigm and integrates with R-based workflows.
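The sort-and-collapse operation on overlapping regions can be sketched in a few lines. This is a language-neutral illustration written in Python, not bedr's R interface; `merge_regions` is a hypothetical name, and intervals are assumed to be half-open BED-style `(chrom, start, end)` tuples.

```python
def merge_regions(regions):
    """Sort regions and collapse those that overlap or touch.

    regions: iterable of (chrom, start, end) tuples, half-open as in BED.
    Chromosomes are sorted lexicographically (so 'chr10' precedes 'chr2').
    """
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of adding a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


regions = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr2", 50, 60),
    ("chr1", 400, 500),
]
print(merge_regions(regions))
```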
Simplifies downstream utilization of high-throughput sequencing (HTS) data. TBtools is designed to work with next-generation sequencing (NGS) data for wet-lab biology. This software is suited to wet-lab biologists who are inexperienced in programming or command-line environments and want to save time on daily sequence analysis work.
Improves results for variable-length amplicons from high-throughput amplicon sequencing (HTAS). AMPtk is a bioinformatic pipeline developed specifically to address the quality issues identified by using spike-in mock communities. It analyzes variable-length amplicon studies such as those based on the fungal ITS1 or ITS2 molecular barcodes. This method provides the scientific community with a necessary tool to study fungal community diversity.
Offers a platform for single molecule real-time (SMRT) sequencing error correction and assembly for libraries of pooled amplicons. C3S-LAA is a pipeline dedicated to processing tiled amplicon resequencing data from multiplexed libraries using a clustering approach. The application is open source software developed with the aim of extending a PacBio module for long amplicon analysis.
Assists users in processing and similarity searching of next-generation sequencing (NGS) data. nsearch is a program that offers components for handling biological sequences (accessing residues, creating the complement, reversing a sequence), as well as paired-end read merging, quality filtering and sequence similarity searching. Moreover, this tool supports two types of alphabets: nucleic acid (DNA and RNA) and protein (amino acids).
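The basic sequence operations listed above (complement and reversal) can be illustrated with a short sketch. This is not nsearch's API; the function names are hypothetical, and only unambiguous DNA bases are handled.

```python
# Translation table for unambiguous DNA bases, upper and lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Complement each base without changing the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement the sequence and read it in the opposite direction."""
    return complement(seq)[::-1]


print(complement("AACG"), reverse_complement("ATGC"))
```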
Assists in processing FASTA files containing DNA and protein sequences. SEDA is an application that allows users to (i) filter sequences based on different criteria (including text patterns), (ii) translate nucleic acid sequences into amino acid sequences, (iii) execute BLAST analyses, (iv) remove duplicated sequences, and (v) sort, merge, split or reformat files.
Provides access to high-throughput sequencing (HTS) data formats. Htsjdk does not support the latest Variant Call Format specifications, for example VCFv4.3 and BCFv2.2. It is useful for manipulating HTS data.
Allows computation of the overlap between two sets of genomic features. Overlap is a program that, given two files of genomic features, indicates whether each feature of the first set is overlapped by a feature of the second set. The software offers four modes and can report both boolean and quantitative overlap. It can also report inclusion rather than general overlap.
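The three kinds of report described above (boolean overlap, quantitative overlap and inclusion) can be sketched as follows. This is not the Overlap program itself: features are assumed to be half-open `(start, end)` intervals on a single chromosome, the function names are hypothetical, and the scan is a simple quadratic comparison rather than an optimized algorithm.

```python
def overlap_length(a, b):
    """Number of positions shared by two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def report_overlaps(first, second):
    """For each feature of `first`, describe how `second` overlaps it."""
    rows = []
    for feat in first:
        best = max((overlap_length(feat, other) for other in second), default=0)
        included = any(o[0] <= feat[0] and feat[1] <= o[1] for o in second)
        rows.append({
            "feature": feat,
            "overlapped": best > 0,   # boolean overlap
            "max_overlap_bp": best,   # quantitative overlap
            "included": included,     # fully contained in one feature
        })
    return rows


first = [(100, 200), (300, 400), (500, 600)]
second = [(150, 250), (290, 410)]
rows = report_overlaps(first, second)
for row in rows:
    print(row)
```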
A multi-threaded sort/merge tool for BAM files. NovoSort reduces run times through multi-threading and by combining sort and merge in one step. It uses a stable sort/merge algorithm that does not change the order of alignments with the same sort key, and it can optionally create a BAM index file. The sort/merge runs in two phases: the first phase sorts as many reads as possible in memory and then writes segments of sorted records to temporary disk files; the second phase merges the sorted fragments to produce the final sorted file.
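The two-phase scheme can be illustrated with a small, single-threaded external merge sort. This sketch is not NovoSort's implementation and does not read BAM files: records are assumed to be `(integer_key, payload)` pairs with tab-free payloads, `max_in_memory` stands in for the memory limit, and all names are hypothetical.

```python
import heapq
import tempfile


def _write_run(records):
    """Write one sorted run to a temporary file and rewind it."""
    run = tempfile.TemporaryFile(mode="w+")
    for key, payload in records:
        run.write(f"{key}\t{payload}\n")
    run.seek(0)
    return run


def _read_run(run):
    """Stream (key, payload) records back from a run file."""
    for line in run:
        key, payload = line.rstrip("\n").split("\t", 1)
        yield int(key), payload


def external_sort(records, max_in_memory):
    """Stable two-phase sort of (key, payload) records by key."""
    runs, chunk = [], []
    # Phase 1: sort chunks that fit in memory and spill them to disk.
    for record in records:
        chunk.append(record)
        if len(chunk) == max_in_memory:
            runs.append(_write_run(sorted(chunk, key=lambda r: r[0])))
            chunk = []
    if chunk:
        runs.append(_write_run(sorted(chunk, key=lambda r: r[0])))
    # Phase 2: merge the sorted runs; ties keep the order of the runs.
    merged = list(heapq.merge(*(_read_run(r) for r in runs), key=lambda r: r[0]))
    for run in runs:
        run.close()
    return merged


records = [(5, "a"), (1, "b"), (5, "c"), (3, "d"), (1, "e"), (5, "f")]
print(external_sort(records, max_in_memory=2))
```

Both `sorted` and `heapq.merge` preserve the relative order of records with equal keys, which is what makes the overall result stable in this sketch.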
Concatenates nucleotide, amino acid and structure sequence fragments of the same taxa into one supermatrix file, in a format that can be used for phylogenetic purposes. FASconCAT extracts taxon-specific gene or structure sequences from the given input files and links them into one string. Missing taxon sequences in single files are replaced by 'N', 'X' or dots, depending on the data level of the file (nucleotide, amino acid or "dot-bracket" structures).
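The concatenation with gap filling can be sketched as follows. This is not FASconCAT's code: each input alignment is assumed to be a non-empty dictionary of equal-length sequences paired with the fill character for its data level, and `concatenate` is a hypothetical name.

```python
def concatenate(alignments):
    """Link per-file alignments into one supermatrix.

    alignments: list of (fill_char, {taxon: aligned_sequence}) pairs, where
    fill_char is the placeholder for taxa missing from that file
    (for example 'N' for nucleotides, 'X' for amino acids).
    """
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for fill_char, aln in alignments:
        # Assumes all sequences within one alignment have equal length.
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, fill_char * length)
    return supermatrix


alignments = [
    ("N", {"taxon_a": "ACGT", "taxon_b": "AC-T"}),
    ("X", {"taxon_a": "MK", "taxon_c": "ML"}),
]
supermatrix = concatenate(alignments)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```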
Allows users to work with PacBio BAM files and their associated indices. pbbam is mainly composed of a core C++ library that makes it possible to create, query and edit files conforming to the PacBio BAM file specification. The software can also be configured to support additional languages and command-line utilities, and it can be integrated into CMake-based projects.
Offers several utilities for working with gene and transcript annotations in General Feature Format (GFF), in both its Gene Transfer Format (GTF2) and GFF3 versions. GFF utilities provides a tool, gffread, for validating, filtering and converting GFF files. gffcompare is another program, which evaluates and compares the accuracy of transcript assemblers and of intron/exon coordinates.
A computational method to rapidly and robustly merge overlapping paired-end reads. CASPER uses quality scores and k-mer frequency for merging. When the difference between the quality scores of mismatching bases is significant, CASPER relies on the quality scores for correction. If not, CASPER instead examines k-mer-based contexts around the mismatch and makes a statistical decision.
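The two-stage decision at a mismatch can be sketched as follows. This is a simplified illustration, not CASPER's algorithm: the overlap is assumed to be already aligned, the quality threshold `min_qual_diff` is an arbitrary assumption, and the statistical k-mer test is replaced by a plain comparison of k-mer counts. All names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def merge_overlap(seq1, qual1, seq2, qual2, kmer_counts, k, min_qual_diff=10):
    """Merge two equal-length, same-orientation overlap segments."""
    merged = []
    for i, (b1, q1, b2, q2) in enumerate(zip(seq1, qual1, seq2, qual2)):
        if b1 == b2:
            merged.append(b1)
        elif abs(q1 - q2) >= min_qual_diff:
            # Clear quality difference: trust the higher-quality base.
            merged.append(b1 if q1 > q2 else b2)
        else:
            # Ambiguous qualities: compare the k-mers ending at this position.
            context = "".join(merged[max(0, i - (k - 1)):i])
            c1 = kmer_counts[context + b1]
            c2 = kmer_counts[context + b2]
            if c1 != c2:
                merged.append(b1 if c1 > c2 else b2)
            else:
                merged.append(b1 if q1 >= q2 else b2)
    return "".join(merged)


kmer_counts = count_kmers(["ACGTT", "CGTTA", "GTTAC"], k=3)
merged_read = merge_overlap(
    "ACGTA", [30, 30, 30, 30, 30],
    "ACCTT", [30, 30, 10, 30, 28],
    kmer_counts, k=3,
)
print(merged_read)
```

In this toy input, the mismatch at the third base is settled by quality (30 versus 10), while the mismatch at the last base has similar qualities and is settled by the k-mer counts.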