Data file formats | High-throughput sequencing

One of the core issues of Bioinformatics is dealing with a profusion of (often poorly defined or ambiguous) file formats. Some ad hoc simple human readable formats have over time attained the status of de facto standards. Source text: (Cock et al.,…
BIOM
Format

BIOM Biological Observation Matrix

Supports sparse and dense matrix representations. BIOM is a format developed to be a general-use format for representing biological sample by observation contingency tables. This project is designed…
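
The difference between a dense and a sparse representation of a sample-by-observation table can be sketched in plain Python. This is an illustration only, not the BIOM library API: the function names `dense_to_sparse` and `sparse_to_dense` are hypothetical, and the `[row, column, value]` triple layout is assumed here as a typical sparse encoding.

```python
def dense_to_sparse(matrix):
    """Return [row, col, value] triples for the non-zero cells of a dense table."""
    return [
        [i, j, value]
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value != 0
    ]


def sparse_to_dense(triples, n_rows, n_cols):
    """Rebuild the dense table; cells without a triple are filled with 0."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j, value in triples:
        dense[i][j] = value
    return dense


# Toy table (made up): 2 observations (rows) x 3 samples (columns)
dense = [[0, 5, 0], [2, 0, 0]]
sparse = dense_to_sparse(dense)
```

The sparse form stores only the two non-zero counts, which is why it tends to suit contingency tables where most cells are zero.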

proBAM
Nontypeable

proBAM

Stores and analyzes peptide spectrum matches (PSMs) within the context of the genome. proBAM is built upon the SAM format and its compressed binary version, BAM, with necessary modifications…

NEXUS
Nontypeable

NEXUS

Represents a file format designed to house systematic data. The goals of the NEXUS format are to allow future expansion, to include diverse kinds of information, to be independent of particular…

BGT
Nontypeable

BGT

Separates sample phenotypes, site annotations and genotypes into individual files. BGT is a file format for storing and querying whole-genome genotypes of tens to hundreds of…

mzML
Format

mzML

Provides a standard output format for mass spectrometry (MS) data that will facilitate data sharing and analysis. mzML focuses on four key objectives: (i) creation of a simple format, (ii)…

SBtab
Nontypeable

SBtab

Allows data exchange in Systems Biology (SB). SBtab is a flexible, table-based format that comes with tools for diverse groups of users. The online tool comprises an automatic syntax validator for…

ISMRMRD
Nontypeable

ISMRMRD ISMRM Raw Data Format

Captures details of the magnetic resonance imaging (MRI) experiment in a way that permits image reconstruction. ISMRMRD is a completely open and community-driven format. It combines a flexible header…

BAN
Desktop

BAN Best Alignment Normalization

Applies all the variations in a variant call format (VCF) file to the reference genome to create a sample genome, and then recalls the variants by aligning this sample genome back with the reference…

MMTF
Format

MMTF MacroMolecular Transmission Format

Transmits and stores biomolecular structures for fast 3D visualization and analysis. MMTF reduces bandwidth needs and allows in-memory management of large structures. It can be parsed, in some…

GVF
Format

GVF Genome Variation Format

An extension of Generic Feature Format version 3 (GFF3). GVF is a simple tab-delimited format for DNA variant files that uses the Sequence Ontology to describe genome variation data.
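
Because GVF inherits the nine tab-delimited columns of GFF3, a single line can be split into named fields with a few lines of Python. The sketch below is a minimal illustration, not a validating parser: `parse_gff3_line` is a hypothetical helper, the example line is made up, and URL-escaped characters and multi-valued attributes are not decoded.

```python
def parse_gff3_line(line):
    """Split one GFF3-style data line into its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_text = fields

    # Column 9 holds semicolon-separated key=value pairs
    attributes = {}
    for pair in attr_text.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value

    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,  # a Sequence Ontology term, e.g. SNV
        "start": int(start),  # 1-based, inclusive
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": phase,
        "attributes": attributes,
    }


# Made-up example line for illustration
toy = "chr16\texample\tSNV\t49291141\t49291141\t.\t+\t.\tID=ID_1;Variant_seq=A;Reference_seq=G"
record = parse_gff3_line(toy)
```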

EMBL
Format

EMBL

Comprises a series of strictly controlled line types that are presented in a tabular manner and consists of four major blocks of data. EMBL format contains: descriptions and identifiers,…

FASTQ
Format

FASTQ

Stores sequences and Phred qualities in a single file. FASTQ format is concise and compact. It has emerged as a common file format for sharing sequencing read data combining both the sequence and an…
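
How a sequence and its Phred qualities sit together in one record can be shown with a small reader. This is a sketch under stated assumptions: each record occupies exactly four lines, and qualities use the Sanger / Illumina 1.8+ offset of 33. The helper name `read_fastq` and the example read are hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed record near {header!r}")
        # Assumption: Phred score = ASCII code of the quality character - 33
        yield header[1:], seq, [ord(c) - 33 for c in qual]


# Made-up example read
example = "@read1\nGATTACA\n+\nIIIIHH#\n"
records = list(read_fastq(StringIO(example)))
```

Older Illumina pipelines used an offset of 64 instead, so the offset should be checked before trusting the decoded scores.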

SMD
Desktop

SMD Single-molecule dataset

Adoption of a common, standard data file format for sharing raw single-molecule data and analysis outcomes is a critical step for the emerging and powerful single-molecule field, which will benefit…

KGML
Format

KGML KEGG Markup Language

Enables automatic drawing of KEGG pathways and provides facilities for computational analysis and modeling of gene/protein networks and chemical networks. The KEGG Markup Language (KGML) is an…

Cooler
Desktop

Cooler

Provides a Python API to work with Hi-C data. Cooler is a support library for a sparse, compressed, binary persistent storage format for Hi-C contact matrices, called cool or COOL. The software…

HDF
Format

HDF

Stores and manages data. HDF5 is a data model, library, and file format which supports an unlimited variety of datatypes. The format is designed for flexible and efficient I/O and for high…

ICS
Format

ICS Image Cytometry Standard

Provides a standard for writing images of any dimensionality and data type to file. ICS is a data storage format proposed as a standard for use in image cytometry. Data from image measurement are…

bigBed format
Format

bigBed format

Stores annotation items that can either be simple, or a linked collection of exons, much as BED files do.

bigWig format
Format

bigWig format

Used to display dense, continuous data as a graph in the Genome Browser.

SAM format
Format

SAM format Sequence Alignment/Map format

A generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms.
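
A SAM alignment line is tab-delimited with eleven mandatory fields, so a read alignment can be inspected without any dedicated library. The following is a minimal sketch, assuming a single well-formed alignment line; `parse_sam_line` is a hypothetical helper, the example line is made up, and header lines (those starting with `@`) and optional tags are left to the caller.

```python
def parse_sam_line(line):
    """Extract a few mandatory fields from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    flag = int(fields[1])
    return {
        "qname": fields[0],
        "flag": flag,
        "rname": fields[2],
        "pos": int(fields[3]),  # 1-based leftmost mapping position
        "mapq": int(fields[4]),
        "cigar": fields[5],
        "seq": fields[9],
        "qual": fields[10],
        "is_unmapped": bool(flag & 0x4),  # flag bit 0x4: segment unmapped
        "is_reverse": bool(flag & 0x10),  # flag bit 0x10: reverse-complemented
    }


# Made-up alignment: an 8-base read mapped to the reverse strand of chr1
toy = "read1\t16\tchr1\t7\t30\t8M\t*\t0\t0\tGATTACAG\t*"
alignment = parse_sam_line(toy)
```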

RNAML
Desktop

RNAML

Permits the description of higher level information about the data including, but not restricted to, base pairs, base triples, and pseudoknots. RNAML is a syntax that allows storage and exchange of…

MINC
Format

MINC Medical Image NetCDF

Provides a file format and toolbox for use in medical imaging. The original MINC file format and tools were based upon the NetCDF data format. The current version was changed to…

BinaryCIF
Nontypeable

BinaryCIF

Aims to store text-based CIF files. BinaryCIF enables both lossless and lossy compression of the original CIF file. It permits users to encode macromolecular data.

GCG
Desktop

GCG

Produces graphics that can be used with a command line option to manipulate and fine tune the final layout. GCG is an application developed to offer a variety of line types that serve as line…

bedGraph format
Format

bedGraph format

Allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data.
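
Each bedGraph data line carries four columns (chromosome, start, end, value), with zero-based, half-open coordinates. As a rough sketch of how such a track might be summarized, the hypothetical helper `weighted_mean` below computes the length-weighted mean score over made-up data; it assumes track and browser lines can be recognized by their leading keyword.

```python
def weighted_mean(lines):
    """Length-weighted mean of the value column over all bedGraph data lines."""
    total = 0.0
    covered = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser definition lines
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        length = int(end) - int(start)  # half-open interval: end is exclusive
        total += length * float(value)
        covered += length
    if covered == 0:
        raise ValueError("no data lines found")
    return total / covered


# Made-up track: 10 bases at score 1.0, then 20 bases at score 4.0
toy = ["track type=bedGraph", "chr1\t0\t10\t1.0", "chr1\t10\t30\t4.0"]
mean_score = weighted_mean(toy)
```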

FASTA format
Format

FASTA format

Used to specify the reference sequence for an imported genome. Each sequence in the FASTA file represents the sequence for a chromosome.
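
The one-sequence-per-chromosome layout described above can be read into a dictionary with a short loop. This is a minimal sketch, assuming every header line has a non-empty name after `>`; the helper name `read_fasta` and the toy sequences are hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Return {sequence_name: sequence} from a FASTA stream."""
    sequences = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name
            name = line[1:].split()[0]
            sequences[name] = []
        elif name is None:
            raise ValueError("sequence data before first '>' header")
        else:
            sequences[name].append(line)
    # Sequence lines may be wrapped, so join the pieces per record
    return {n: "".join(parts) for n, parts in sequences.items()}


# Made-up two-chromosome reference
example = ">chr1 toy chromosome\nACGT\nACGT\n>chr2\nGGCC\n"
genome = read_fasta(StringIO(example))
```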

SRF
Format

SRF Sequence Read Format

A generic format for DNA sequence data. The primary motivation for creating SRF has been to enable a single format capable of storing data generated by any DNA sequencing technology.

SFF
Format

SFF Standard Flowgram Format

Used to store the information on one or many 454 Sequencing reads and their trace data.

BAM format
Format

BAM format

The compressed binary version of the Sequence Alignment/Map (SAM) format, a compact and indexable representation of nucleotide sequence alignments.

BED format
Format

BED format Browser Extensible Data format

Provides a flexible way to define the data lines that are displayed in an annotation track.
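
A BED data line needs only three fields (chromosome, start, end); further fields such as a name are optional. The sketch below reads one line under the assumption that fields are tab-delimited; `parse_bed_line` is a hypothetical helper and the example feature is made up.

```python
def parse_bed_line(line):
    """Parse the three required BED fields plus the optional name field."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    return {
        "chrom": chrom,
        "start": start,  # zero-based
        "end": end,  # exclusive, so the feature spans end - start bases
        "name": fields[3] if len(fields) > 3 else None,
        "length": end - start,
    }


# Made-up feature covering 150 bases
feature = parse_bed_line("chr1\t100\t250\tgeneA")
```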

WIG format
Format

WIG format Wiggle format

An older format for display of dense, continuous data such as GC percent, probability scores, and transcriptome data.

GFF
Format

GFF Generic Feature Format

A standard for describing genome annotation data.

GLF
Format

GLF

A format for storing marginal likelihoods for next-generation sequence data, conditional on a set of possible genotypes.
