A software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) files. BEDTools also supports comparing sequence alignments in BAM format against both BED and GFF features. The tools are designed for efficiency and allow users to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. BEDTools utilities can be combined with one another and with standard UNIX commands, supporting both routine genomics tasks and pipelines that can quickly answer complex questions about large genomic datasets.
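The core comparison BEDTools performs on BED features is an interval overlap test. The Python sketch below illustrates that idea only; it is not BEDTools' implementation, and the function names parse_bed and intersect are hypothetical. It assumes BED's 0-based, half-open coordinates and uses a naive pairwise scan rather than the sorted or indexed strategies a production tool would use.

```python
# Illustrative BED interval intersection (not BEDTools' own code).
from typing import List, Tuple

# (chrom, start, end) using BED's 0-based, half-open coordinates
Interval = Tuple[str, int, int]


def parse_bed(lines: List[str]) -> List[Interval]:
    """Read the first three BED columns, skipping blank, comment and header lines."""
    out: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        out.append((chrom, int(start), int(end)))
    return out


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every overlapping pair from a and b.

    Naive O(len(a) * len(b)) scan; intended only to show the overlap rule.
    """
    hits: List[Interval] = []
    for ca, sa, ea in a:
        for cb, sb, eb in b:
            # Half-open intervals overlap iff each starts before the other ends.
            if ca == cb and sa < eb and sb < ea:
                hits.append((ca, max(sa, sb), min(ea, eb)))
    return hits
```

Because the intervals are half-open, a feature ending at 100 and another starting at 100 are adjacent rather than overlapping, which matches how BED coordinates are defined.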


Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files, and a Java API for creating new programs that read and write SAM files. Both the SAM text format and its binary counterpart (BAM) are supported. It also works with next-generation sequencing (NGS) data.
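Picard itself is written in Java; the Python sketch below does not use Picard's API. It only illustrates the SAM text layout that such utilities read and write: each alignment line has 11 mandatory tab-separated fields, and FLAG is a bit field. The helper parse_sam_line and the subset of flag names chosen are hypothetical, and optional tags are left unparsed.

```python
# Illustration of the SAM text format (not Picard's API).
from typing import Dict

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# A subset of the bitwise FLAG values defined by the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x4: "unmapped",
    0x10: "reverse_strand",
    0x400: "duplicate",
}


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one SAM alignment line into named fields and decoded flags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("SAM alignment lines need at least 11 tab-separated fields")
    rec: Dict[str, object] = dict(zip(SAM_FIELDS, cols[:11]))
    flag = int(cols[1])
    rec["FLAG"] = flag
    rec["POS"] = int(cols[3])  # 1-based leftmost mapping position
    rec["flags"] = {name for bit, name in FLAG_BITS.items() if flag & bit}
    rec["tags"] = cols[11:]  # optional TAG:TYPE:VALUE fields, kept as raw strings
    return rec
```

For example, a FLAG of 1040 (0x400 + 0x10) decodes to a duplicate read on the reverse strand. BAM stores the same records in compressed binary form, which is why a dedicated library is normally used instead of hand parsing.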


A statistical framework for calling SNPs, discovering somatic mutations, inferring population genetic parameters and performing association tests directly from sequencing data. BCFtools can manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart, BCF. Given appropriate input data, it can also discover somatic and germline mutations, efficiently estimate site allele frequencies, the allele frequency spectrum and linkage disequilibrium, and test for Hardy–Weinberg equilibrium and association.
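To make the allele frequency and Hardy–Weinberg steps concrete, the sketch below computes them from biallelic diploid genotype counts using the textbook Pearson chi-square test with one degree of freedom. This is an illustration of the statistics named above, not BCFtools' implementation, and the function names are hypothetical.

```python
# Textbook allele-frequency and Hardy-Weinberg chi-square calculation
# (illustrative only; not how BCFtools implements its tests).
import math


def alt_allele_freq(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Alternate-allele frequency among the 2N alleles of N diploid samples."""
    n = n_ref_hom + n_het + n_alt_hom
    return (n_het + 2 * n_alt_hom) / (2 * n)


def hwe_chi2(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Pearson chi-square statistic for departure from Hardy-Weinberg proportions."""
    n = n_ref_hom + n_het + n_alt_hom
    q = alt_allele_freq(n_ref_hom, n_het, n_alt_hom)
    p = 1.0 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    # Genotype classes with zero expected count are skipped to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_pvalue(chi2: float) -> float:
    """Upper-tail p-value of a chi-square with 1 d.f., i.e. erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))
```

With counts (25, 50, 25) the observed genotypes match expectation exactly and the statistic is 0; with (50, 0, 50) the complete absence of heterozygotes gives a statistic of 100. For small sample sizes an exact test is generally preferred over this chi-square approximation.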


Manages FASTA and FASTQ files. FASTdoop has been evaluated in a wide range of experiments. It supports FASTA files containing either one or more short sequences or a single very large sequence of arbitrary length, and it can parse FASTQ files containing short sequences. The tool can efficiently handle FASTA files that are not adequately supported by previously available libraries. The routines in FASTdoop represent an advance over the state of the art in terms of both versatility and efficiency.
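As a point of reference for what parsing FASTQ involves, the sketch below reads standard 4-line FASTQ records (header, sequence, '+' separator, qualities) in a streaming fashion. It is plain Python, not FASTdoop's own input format or API, and read_fastq is a hypothetical name. It assumes each record occupies exactly four lines, which is what lets it handle quality strings that happen to start with '@'.

```python
# Streaming 4-line FASTQ reader (illustrative; not FASTdoop's implementation).
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header_without_at, sequence, qualities) for each FASTQ record."""
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # tolerate blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record: {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield header[1:], seq, qual
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")
```

Relying on the fixed record length, rather than scanning for '@', avoids misreading a quality line as a new header; splitting such files into independent chunks for distributed processing is harder precisely because a chunk boundary can fall anywhere inside a record.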


Parses, edits and writes Graphical Fragment Assembly (GFA) files, complying with the proposed standard. RGFA supports simple graph manipulation, limited to operations that make no assumptions about the graph content and do not define any custom fields. Graphs produced by the tool can be converted into an RGL (Ruby Graph Library) graph object. It also provides a way to build manipulation pipelines, which can then be applied in a uniform way to several graphs or to their connected components without manual intervention.
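RGFA is a Ruby library; the Python sketch below does not reproduce its API. It only illustrates the underlying data: GFA 1 segment lines (S, name, sequence) and link lines (L, from, orientation, to, orientation, overlap), plus a union-find grouping of segments into connected components, the unit to which the text says pipelines can be applied. The function names are hypothetical, other record types are ignored, and links are assumed to reference declared segments.

```python
# Minimal GFA 1 segment/link parsing and connected components
# (illustrative; not RGFA's API).
from typing import Dict, List, Tuple

Link = Tuple[str, ...]  # (from, from_orient, to, to_orient, overlap)


def parse_gfa1(lines: List[str]) -> Tuple[Dict[str, str], List[Link]]:
    """Collect S (segment) and L (link) records; other record types are skipped."""
    segments: Dict[str, str] = {}
    links: List[Link] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            links.append(tuple(fields[1:6]))
    return segments, links


def connected_components(segments: Dict[str, str], links: List[Link]) -> List[List[str]]:
    """Group segment names into components, ignoring link orientation."""
    parent = {name: name for name in segments}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for frm, _, to, _, _ in links:
        parent[find(frm)] = find(to)

    groups: Dict[str, List[str]] = {}
    for name in segments:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Treating links as undirected is enough for finding components; operations that depend on traversal direction would also need the orientation fields.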


A quality control script that makes FASTA format files compatible with a variety of downstream bioinformatics tools. Fasta-O-Matic automates the handling of common but minor format issues that might otherwise halt pipelines. The need for automation must be balanced against the need for manual confirmation that a formatting error is actually minor rather than a sign of a corrupt data file. To that end, Fasta-O-Matic reports any detected issues to the user in logs that can optionally be color-coded and set to quiet or verbose. Fasta-O-Matic can be used as a general pre-processing tool in bioinformatics workflows (e.g. to automatically wrap FASTA files so that they can be read by BioPerl). It was also developed as a sanity check for bioinformatics core facilities, which tend to repeat common analysis steps on FASTA files received from disparate sources. Fasta-O-Matic can be configured with format requirements specific to downstream tools as the first step in a larger analysis workflow.
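The balance described above, fixing minor issues automatically while reporting them and stopping on signs of corruption, can be sketched as follows. This is not Fasta-O-Matic's code; normalize_fasta is a hypothetical name, and the default width of 60 characters is an assumption for illustration rather than a documented requirement of any particular downstream tool.

```python
# Sketch of FASTA normalization with issue reporting
# (illustrative; not Fasta-O-Matic's implementation).
from typing import List, Tuple


def normalize_fasta(text: str, width: int = 60) -> Tuple[str, List[str]]:
    """Fix line endings and rewrap sequence lines; return (fixed_text, issues)."""
    issues: List[str] = []
    if "\r" in text:
        issues.append("converted CR/CRLF line endings to LF")
        text = text.replace("\r\n", "\n").replace("\r", "\n")

    out: List[str] = []
    seq_parts: List[str] = []

    def flush() -> None:
        # Emit the pending sequence in fixed-width chunks.
        seq = "".join(seq_parts)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        seq_parts.clear()

    for n, line in enumerate(text.split("\n"), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are dropped silently
        if line.startswith(">"):
            flush()
            out.append(line)
        else:
            if not out:
                # Not a minor issue: sequence with no header suggests a corrupt file.
                raise ValueError("sequence data found before the first '>' header")
            if len(line) > width:
                issues.append(f"line {n}: {len(line)} characters, rewrapped to {width}")
            seq_parts.append(line)
    flush()
    return "\n".join(out) + "\n", issues
```

Line-ending and line-length problems are repaired and logged, while a file that does not start with a header raises an error instead, leaving the judgment about possible corruption to the user.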


Concatenates nucleotide, amino acid and structure sequence fragments of the same taxa into a single supermatrix file in a format suitable for phylogenetic analysis. FASconCAT extracts the gene or structure sequences associated with each taxon from the given input files and joins them into one string. When a taxon's sequence is missing from an individual input file, it is replaced by 'N', 'X' or dots, depending on the data type (nucleotide, amino acid or "dot-bracket" structure).
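The supermatrix construction can be sketched as below: for each input alignment, every taxon contributes its aligned sequence, and taxa absent from that alignment are padded with the fill character for the data type. This is a hypothetical Python illustration, not FASconCAT's code. It assumes each input alignment is already aligned to equal length, and the names concatenate and FILL are invented for the example.

```python
# Supermatrix concatenation with missing-taxon padding
# (illustrative; not FASconCAT's implementation).
from typing import Dict, List, Tuple

# Fill character per data type, following the 'N' / 'X' / dots convention above.
FILL = {"nucleotide": "N", "amino_acid": "X", "structure": "."}


def concatenate(alignments: List[Tuple[str, str, Dict[str, str]]]) -> Dict[str, str]:
    """Join per-gene alignments into one string per taxon.

    alignments: list of (gene_name, data_type, {taxon: aligned_sequence}).
    """
    taxa = sorted({t for _, _, seqs in alignments for t in seqs})
    matrix: Dict[str, List[str]] = {t: [] for t in taxa}
    for gene, dtype, seqs in alignments:
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are empty or not aligned to equal length")
        length = lengths.pop()
        for t in taxa:
            # Missing taxa get a run of fill characters matching the alignment length.
            matrix[t].append(seqs.get(t, FILL[dtype] * length))
    return {t: "".join(parts) for t, parts in matrix.items()}
```

Padding to the full alignment length keeps every row of the supermatrix the same width, so each gene occupies the same column range for all taxa.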


Searches the textual descriptions of sequences. textsearch looks for words (specified as a regular expression) in the description text of one or more input sequences. It writes an output file with optional contents, such as the name, description and accession number of any sequence whose description line matches the search term. The search can optionally be made case-sensitive, and the results can optionally be written as an HTML table. textsearch is convenient for small input files but will be slow for larger files and databases; for those, SRS or Entrez should be used instead.
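A minimal sketch of the same idea in Python, assuming FASTA input where the description follows the first space of the header line. It is not the textsearch program itself: the function names are hypothetical, accession numbers are not extracted, and case-insensitive matching is used unless case_sensitive is set, mirroring the optional behavior described above.

```python
# Regex search over FASTA description lines with optional HTML output
# (illustrative; not the textsearch program).
import html
import re
from typing import Iterable, List, Tuple


def text_search(fasta_lines: Iterable[str], pattern: str,
                case_sensitive: bool = False) -> List[Tuple[str, str]]:
    """Return (name, description) for every header whose description matches."""
    regex = re.compile(pattern, 0 if case_sensitive else re.IGNORECASE)
    hits: List[Tuple[str, str]] = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # only header (description) lines are searched
        name, _, description = line[1:].rstrip("\n").partition(" ")
        if regex.search(description):
            hits.append((name, description))
    return hits


def to_html_table(hits: List[Tuple[str, str]]) -> str:
    """Render hits as a simple HTML table, escaping special characters."""
    rows = "".join(
        f"<tr><td>{html.escape(n)}</td><td>{html.escape(d)}</td></tr>" for n, d in hits
    )
    return f"<table><tr><th>Name</th><th>Description</th></tr>{rows}</table>"
```

Every line of the input is read on each search, so run time grows linearly with input size; this is consistent with the advice to use indexed services such as SRS or Entrez for large databases.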