Data file formats | High-throughput sequencing

One of the core issues of bioinformatics is dealing with a profusion of (often poorly defined or ambiguous) file formats. Some ad hoc, simple, human-readable formats have over time attained the status of de facto standards. Source text: (Cock et al.,…

BIOM Biological Observation Matrix
Format

Supports sparse and dense matrix representations. BIOM is a general-use format for representing biological sample by observation contingency tables. This project is designed…
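
To illustrate the difference between the two representations, the sketch below expands sparse (row, column, value) triplets into a dense matrix. The triplet layout is an assumption modelled on the sparse encoding of the JSON-based BIOM 1.0 version; the `sparse_to_dense` helper and the sample counts are hypothetical.

```python
def sparse_to_dense(triplets, shape):
    """Expand (row, column, value) triplets into a dense list-of-lists matrix."""
    rows, cols = shape
    dense = [[0] * cols for _ in range(rows)]
    for r, c, v in triplets:
        dense[r][c] = v
    return dense


# Hypothetical table: 3 observations (rows) by 3 samples (columns).
triplets = [(0, 2, 1), (1, 0, 5), (2, 1, 3)]
dense = sparse_to_dense(triplets, (3, 3))
```

The sparse form stores only the non-zero counts, which is why it tends to be preferred for tables in which most observations are absent from most samples.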

proBAM
Nontypeable

Stores and analyzes peptide spectrum matches (PSMs) within the context of the genome. proBAM is built upon the SAM format and its compressed binary version, BAM, with necessary modifications…

NEXUS
Nontypeable

Represents a file format designed to house systematic data. The goals of the NEXUS format are to allow future expansion, to include diverse kinds of information, to be independent of particular…

BGT
Nontypeable

Separates sample phenotypes, site annotations and genotypes into individual files. BGT is a file format for storing and querying whole-genome genotypes of tens to hundreds of…

mzML
Format

Provides a standard output format for mass spectrometry (MS) data that will facilitate data sharing and analysis. mzML is focused on four key objectives: (i) creation of a simple format, (ii)…

SBtab
Nontypeable

Allows data exchange in Systems Biology (SB). SBtab is a flexible, table-based format that comes with tools for diverse groups of users. The online tool comprises an automatic syntax validator for…

ISMRMRD ISMRM Raw Data Format
Nontypeable

Captures details of the magnetic resonance imaging (MRI) experiment in a way that permits image reconstruction. ISMRMRD is a completely open and community-driven format. It combines a flexible header…

BAN Best Alignment Normalization
Desktop

Applies all the variations in a variant call format (VCF) file to the reference genome to create a sample genome, and then recalls the variants by aligning this sample genome back with the reference…

MMTF MacroMolecular Transmission Format
Format

Transmits and stores biomolecular structures for fast 3D visualization and analysis. MMTF reduces bandwidth needs and allows in-memory management of large structures. It can be parsed, in some…

GVF Genome Variation Format
Format

An extension of Generic Feature Format version 3 (GFF3). GVF is a simple tab-delimited format for DNA variant files that uses the Sequence Ontology to describe genome variation data.

EMBL
Format

Comprises a series of strictly controlled line types that are presented in a tabular manner, and consists of four major blocks of data. EMBL format contains: descriptions and identifiers,…

FASTQ
Format

Stores sequences and Phred qualities in a single file. FASTQ format is concise and compact. It has emerged as a common file format for sharing sequencing read data combining both the sequence and an…
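
As an illustration of how sequence and quality travel together, here is a minimal sketch of a FASTQ reader. It assumes the common four-lines-per-record layout and the Phred+33 (Sanger) ASCII offset; other variants of the format use different offsets. The `read_fastq` name and the sample read are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, qualities) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


example = io.StringIO("@read1\nGATTACA\n+\nIIIIHH#\n")
records = list(read_fastq(example))
```

Each record comes back as a tuple whose quality list has one integer per base, so downstream filtering can work on numbers rather than on the raw ASCII string.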

SMD Single-molecule dataset
Desktop

Adoption of a common, standard data file format for sharing raw single-molecule data and analysis outcomes is a critical step for the emerging and powerful single-molecule field, which will benefit…

celeganscdf
Desktop

Contains an environment representing the Celegans.CDF file. celeganscdf is an R package available on Bioconductor.

caninecdf
Desktop

Contains an environment representing the Canine_2.cdf file. caninecdf is an R package available on Bioconductor.

bsubtiliscdf
Desktop

Provides an environment that describes the bsubtilis.CDF file and its dimensions.

ecolicdf
Desktop

Provides a package that contains an environment representing the E_coli_2.cdf file.

drosophilacdf
Desktop

Contains an environment to describe CDF files related to insects (Drosophila).

cottoncdf
Desktop

Contains an environment to describe CDF files related to cotton.

chickencdf
Desktop

Contains an environment to describe CDF files related to chicken.

bovinecdf
Desktop

Supplies an R package for visualizing bovine CDF files.

barley1cdf
Desktop

Provides an R package for visualizing Barley1 CDF files.

ath1121501cdf
Desktop

Provides an R package compiling CDF files representing Arabidopsis ATH1-121501.

agcdf
Desktop

Allows users to visualize files from the Arabidopsis Genome Array in CDF format.

KGML KEGG Markup Language
Format

Enables automatic drawing of KEGG pathways and provides facilities for computational analysis and modeling of gene/protein networks and chemical networks. The KEGG Markup Language (KGML) is an…

Cooler
Desktop

Provides a Python API to work with Hi-C data. Cooler is a support library for a sparse, compressed, binary persistent storage format for Hi-C contact matrices, called cool or COOL. The software…

HDF
Format

Stores and manages data. HDF5 is a data model, library, and file format which supports an unlimited variety of datatypes. The format is designed for flexible and efficient I/O and for high…

ICS Image Cytometry Standard
Format

Provides a standard for writing images of any dimensionality and data type to file. ICS is a data storage format proposed as a standard for use in image cytometry. Data from image measurement are…

bigBed format
Format

Stores annotation items that can either be simple, or a linked collection of exons, much as BED files do.

bigWig format
Format

For dense, continuous data that will be displayed in the Genome Browser as a graph.

SAM format Sequence Alignment/Map format
Format

A generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms.
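
To make the record layout concrete, the sketch below splits one tab-delimited alignment line into the eleven mandatory SAM fields and checks the "segment unmapped" bit (0x4) of the FLAG field. The `parse_sam_line` helper, the `is_unmapped` and `tags` keys, and the sample line are illustrative assumptions; header lines (starting with `@`) are not handled.

```python
SAM_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]


def parse_sam_line(line):
    """Parse one SAM alignment line into a dict of the mandatory fields."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])
    # Bit 0x4 of FLAG marks an unmapped segment.
    record["is_unmapped"] = bool(record["FLAG"] & 0x4)
    # Anything after column 11 is an optional TAG:TYPE:VALUE field.
    record["tags"] = cols[11:]
    return record


sample = "r001\t99\tref\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*"
alignment = parse_sam_line(sample)
```

For real data, an established library is usually a better choice than hand-parsing, particularly for the binary BAM form.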

RNAML
Desktop

Permits the description of higher-level information about the data including, but not restricted to, base pairs, base triples, and pseudoknots. RNAML is a syntax that allows storage and exchange of…

NeXML
Nontypeable

Provides a format for phylogenetic data by using concepts from graph representation. NeXML is an open-source format based on XML which intends to provide a simplified alternative to the NEXUS format for…

phyloXML
Nontypeable

Provides a format for phylogenetic documents. phyloXML is a standardized extensible format based on XML that includes over 20 different elements such as confidence values, gene names, branch lengths…

cyp450cdf
Desktop

Contains an environment to describe CDF files related to the Cytochromes P450 (CYP450).

citruscdf
Desktop

Contains an environment to describe CDF files related to citrus.

MINC Medical Image NetCDF
Format

Provides a medical imaging file format and toolbox. The original MINC file format and tools were based upon the NetCDF data format. The current version was changed to…

BinaryCIF
Nontypeable

Aims to store text-based CIF files. BinaryCIF enables both lossless and lossy compression of the original CIF file. It permits users to encode macromolecular data.

GCG
Desktop

Produces graphics that can be used with a command line option to manipulate and fine tune the final layout. GCG is an application developed to offer a variety of line types that serve as line…

R-pbh5
Desktop

An R package for interacting with data in HDF5 format from Pacific Biosciences.

bedGraph format
Format

Allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data.
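
A bedGraph data line carries four whitespace-separated columns: chromosome, start, end and value, with zero-based, half-open coordinates. The following sketch, with a hypothetical `parse_bedgraph` helper and made-up scores, reads such lines while skipping track, browser and comment lines.

```python
def parse_bedgraph(lines):
    """Yield (chrom, start, end, value) tuples from bedGraph text lines."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and the track/browser declarations.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


text = """track type=bedGraph name="example scores"
chr19 49302000 49302300 -1.0
chr19 49302300 49302600 -0.75
"""
intervals = list(parse_bedgraph(text.splitlines()))
```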

FASTA format
Format

Used to specify the reference sequence for an imported genome. Each sequence in the FASTA file represents the sequence for a chromosome.
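
A minimal sketch of reading such a file into a mapping from chromosome name to sequence is shown below. It assumes each record starts with a non-empty `>` header line whose first word is the sequence name, and that sequence lines may wrap; `read_fasta` and the toy sequences are hypothetical.

```python
def read_fasta(lines):
    """Return {name: sequence} from FASTA text lines."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The first word after '>' is taken as the sequence name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    # Join wrapped sequence lines into one string per record.
    return {n: "".join(chunks) for n, chunks in parts.items()}


text = ">chr1 example\nACGT\nACGT\n>chr2\nGGCC\n"
genome = read_fasta(text.splitlines())
```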

SRF Sequence Read Format
Format

A generic format for DNA sequence data. The primary motivation for creating SRF has been to enable a single format capable of storing data generated by any DNA sequencing technology.

SFF Standard Flowgram Format
Format

Used to store the information on one or many 454 Sequencing reads and their trace data.

BAM format
Format

The compressed binary version of the Sequence Alignment/Map (SAM) format, a compact and indexable representation of nucleotide sequence alignments.

BED format Browser Extensible Data format
Format

Provides a flexible way to define the data lines that are displayed in an annotation track.
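
As a sketch of what one such data line looks like, the example below parses a tab-delimited BED line with the three required fields (chromosome, zero-based start, exclusive end) plus the optional name, score and strand. The `parse_bed_line` helper and the derived `length` key are illustrative assumptions; further optional columns are ignored.

```python
def parse_bed_line(line):
    """Parse one tab-delimited BED data line into a dict."""
    cols = line.rstrip("\n").split("\t")
    feature = {"chrom": cols[0], "start": int(cols[1]), "end": int(cols[2])}
    # Optional columns 4 to 6, kept as strings when present.
    for key, value in zip(("name", "score", "strand"), cols[3:6]):
        feature[key] = value
    # Start is zero-based and end is exclusive, so length is end - start.
    feature["length"] = feature["end"] - feature["start"]
    return feature


feature = parse_bed_line("chr7\t127471196\t127472363\tPos1\t0\t+")
```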

WIG format Wiggle format
Format

An older format for display of dense, continuous data such as GC percent, probability scores, and transcriptome data.

GFF Generic Feature Format
Format

A standard for describing genome annotation data.
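
For a concrete picture of an annotation record, the sketch below parses one line under the assumption that it follows the GFF3 flavour: nine tab-delimited columns, one-based inclusive coordinates, and `key=value` attributes separated by semicolons. Older GFF versions format the last column differently. The `parse_gff3_line` name and the sample gene are hypothetical.

```python
GFF_COLUMNS = [
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
]


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict with parsed attributes."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(GFF_COLUMNS, cols))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated key=value pairs (GFF3 assumption).
    record["attributes"] = dict(
        pair.split("=", 1) for pair in record["attributes"].split(";") if pair
    )
    return record


record = parse_gff3_line(
    "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
)
# Coordinates are one-based and inclusive, so length is end - start + 1.
gene_length = record["end"] - record["start"] + 1
```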

GLF
Format

A format for storing marginal likelihoods for next-generation sequence data, conditional on a set of possible genotypes.
