One of the core issues of Bioinformatics is dealing with a profusion of (often poorly defined or ambiguous) file formats. Some ad hoc simple human readable formats have over time attained the status of de facto standards. The Sanger FASTQ file… (Source text: Cock et al., 2010)
BAN Best Alignment Normalization
Desktop

Applies all the variations in a variant call format (VCF) file to the reference genome to create a sample genome, and then recalls the variants by aligning this sample genome back with the reference…

GVF Genome Variation Format
Format

An extension of Generic Feature Format version 3 (GFF3). GVF is a simple tab-delimited format for DNA variant files that uses the Sequence Ontology to describe genome variation data.

EMBL
Format

Comprises a series of strictly controlled line types that are presented in a tabular manner, and consists of four major blocks of data. EMBL format contains: descriptions and identifiers,…

FASTQ
Format

Stores sequences and Phred qualities in a single file. FASTQ format is concise and compact. It has emerged as a common file format for sharing sequencing read data combining both the sequence and an…
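
As an illustration of how sequence and Phred qualities sit together in one record, the following is a minimal sketch of a reader. It assumes the Sanger variant (quality characters offset by 33) and unwrapped, four-line records; the function name `read_fastq` and the sample read are hypothetical.

```python
from io import StringIO

def read_fastq(handle, offset=33):
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (a blank line also ends the read in this sketch)
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is skipped
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed record near {header!r}")
        # Each quality character encodes one Phred score as ASCII code minus the offset.
        yield header[1:], sequence, [ord(c) - offset for c in quality]

# Hypothetical one-read example.
example = StringIO("@read1\nGTATCGCTA\n+\nIIIIIHHH#\n")
records = list(read_fastq(example))
```

Other FASTQ variants use a different offset, so the `offset` parameter would need to match the file being read.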

SMD Single-molecule dataset
Desktop

Adoption of a common, standard data file format for sharing raw single-molecule data and analysis outcomes is a critical step for the emerging and powerful single-molecule field, which will benefit…

Cooler
Desktop

Provides a Python API to work with Hi-C data. Cooler is a support library for a sparse, compressed, binary persistent storage format for Hi-C contact matrices, called cool or COOL. The software…

bigBed format
Format

Stores annotation items that can either be simple, or a linked collection of exons, much as BED files do.

bigWig format
Format

Stores dense, continuous data for display in the Genome Browser as a graph.

SAM format Sequence Alignment/Map format
Format

A generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms.
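
To make the layout of a stored alignment concrete, here is a minimal sketch that splits one SAM alignment line into its eleven mandatory tab-separated fields and checks two bits of the FLAG field. The class and function names and the sample line are hypothetical; header lines (starting with `@`) and optional tags are not interpreted.

```python
from typing import NamedTuple

class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int     # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: tuple  # optional TAG:TYPE:VALUE fields, left unparsed

def parse_sam_line(line):
    """Parse one alignment line (not a header line) into a SamRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs 11 mandatory fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq), cigar,
                     rnext, int(pnext), int(tlen), seq, qual, tuple(fields[11:]))

def is_unmapped(record):
    return bool(record.flag & 0x4)   # FLAG bit 0x4: segment unmapped

def is_reverse_strand(record):
    return bool(record.flag & 0x10)  # FLAG bit 0x10: reverse complemented

# Hypothetical alignment line.
line = "read1\t16\tchr1\t100\t60\t9M\t*\t0\t0\tGTATCGCTA\tIIIIIHHH#\tNM:i:0"
record = parse_sam_line(line)
```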

VCF Variant Call Format
Format

A generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations.
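
The sketch below shows, under simplifying assumptions, how one variant and its annotations might be read from a VCF data line. It handles only the eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and ignores per-sample genotype columns; the function names and the sample lines are hypothetical.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # INFO entries are either key=value pairs or bare flags.
            info_dict[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),  # 1-based position
        "id": None if vid == "." else vid,
        "ref": ref,
        "alt": [] if alt == "." else alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info_dict,
    }

def read_vcf(lines):
    """Yield parsed variants, skipping meta-information and header lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)

# Hypothetical file content.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tG\tA,T\t50\tPASS\tDP=14;DB\n",
]
variants = list(read_vcf(example))
```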

RNAML
Desktop

Permits the description of higher level information about the data including, but not restricted to, base pairs, base triples, and pseudoknots. RNAML is a syntax that allows storage and exchange of…

bedGraph format
Format

Allows display of continuous-valued data in track format. This display type is useful for probability scores and transcriptome data.
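
As a small illustration of continuous-valued data in this format, the sketch below reads bedGraph data lines (chromosome, start, end, value, with zero-based half-open coordinates) and computes a length-weighted mean of the values. The function names and the sample track are hypothetical.

```python
def read_bedgraph(lines):
    """Yield (chrom, start, end, value) for each data line."""
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser definition lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)

def weighted_mean(intervals):
    """Mean value weighted by interval length (end - start)."""
    total = 0.0
    covered = 0
    for _chrom, start, end, value in intervals:
        total += (end - start) * value
        covered += end - start
    return total / covered if covered else float("nan")

# Hypothetical track.
example = [
    "track type=bedGraph name=example",
    "chr1\t0\t100\t0.5",
    "chr1\t100\t150\t2.0",
]
intervals = list(read_bedgraph(example))
mean_signal = weighted_mean(intervals)
```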

FASTA format
Format

Used to specify the reference sequence for an imported genome. Each sequence in the FASTA file represents the sequence for a chromosome.
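
The one-sequence-per-chromosome layout can be sketched as a simple reader that maps each header name to its concatenated sequence lines. This is a minimal sketch; the function name `read_fasta` and the sample sequences are hypothetical, and only the first word of each `>` header is used as the name.

```python
def read_fasta(lines):
    """Return {name: sequence}, joining wrapped sequence lines under each header."""
    sequences = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after '>' as the sequence (chromosome) name.
            name = (line[1:].split() or [""])[0]
            sequences[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            sequences[name].append(line)
    return {n: "".join(parts) for n, parts in sequences.items()}

# Hypothetical two-chromosome reference.
example = """>chr1 example chromosome
GTATCG
CTA
>chr2
ACGT
""".splitlines()
genome = read_fasta(example)
```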

HDF
Format

A data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex…

SRF Sequence Read Format
Format

A generic format for DNA sequence data. The primary motivation for creating SRF has been to enable a single format capable of storing data generated by any DNA sequencing technology.

SFF Standard Flowgram Format
Format

Used to store the information on one or many 454 Sequencing reads and their trace data.

BAM format
Format

The compressed binary version of the Sequence Alignment/Map (SAM) format, a compact and indexable representation of nucleotide sequence alignments.

BED format Browser Extensible Data format
Format

Provides a flexible way to define the data lines that are displayed in an annotation track.
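
A data line in this format can be sketched as follows. The example assumes tab-separated fields and reads only the three required columns (chrom, chromStart, chromEnd) plus the optional name, score, and strand; it also shows the conversion from BED's zero-based, end-exclusive coordinates to a one-based inclusive range. The function name and the sample line are hypothetical.

```python
def parse_bed_line(line):
    """Parse the first three to six columns of one BED data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, chromStart, chromEnd")
    feature = {"chrom": fields[0], "start": int(fields[1]), "end": int(fields[2])}
    # Optional columns are kept as strings when present.
    for key, value in zip(("name", "score", "strand"), fields[3:6]):
        feature[key] = value
    return feature

def to_one_based(feature):
    """Convert a zero-based half-open interval to a one-based inclusive (first, last) pair."""
    return feature["start"] + 1, feature["end"]

# Hypothetical annotation line.
feature = parse_bed_line("chr1\t999\t1008\tsite1\t0\t+")
length = feature["end"] - feature["start"]
first, last = to_one_based(feature)
```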

WIG format Wiggle format
Format

An older format for display of dense, continuous data such as GC percent, probability scores, and transcriptome data.

GFF Generic Feature Format
Format

A standard for describing genome annotation data.
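
As an illustration of how one annotation is described, the sketch below parses a single feature line. It assumes version 3 of the format (GFF3), with nine tab-delimited columns, one-based inclusive coordinates, and percent-encoded `key=value` attributes; the function name and the sample line are hypothetical.

```python
from urllib.parse import unquote

COLUMNS = ("seqid", "source", "type", "start", "end",
           "score", "strand", "phase", "attributes")

def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-delimited columns")
    record = dict(zip(COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    if record["attributes"] != ".":
        for pair in record["attributes"].split(";"):
            if pair:
                key, _, value = pair.partition("=")
                # An attribute may hold several comma-separated values.
                attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    record["attributes"] = attributes
    return record

# Hypothetical feature line.
example = "chr1\texample\tgene\t1000\t1008\t.\t+\t.\tID=gene1;Name=demo%20gene"
feature = parse_gff3_line(example)
feature_length = feature["end"] - feature["start"] + 1
```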

GLF
Format

A format for storing marginal likelihoods for next-generation sequence data, conditional on a set of possible genotypes.
