An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but…
DNAzip
Desktop

A package to compress DNA sequences using a reference genome. DNAzip uses a series of compression techniques that, taken together, reduce the size of a single genome by orders of magnitude. It…

CRAM
Desktop

A framework technology comprising file format and toolkit in which we combine highly efficient and tunable reference-based compression of sequence data with a data format that is directly available…

DSRC (DNA Sequence Reads Compression)
Desktop

An application designed for compression of data files containing reads from DNA sequencing in FASTQ format. Such files can be huge, e.g., a few (or tens of) gigabytes, so a need for a…

GDC (Genome Differential Compressor)
Desktop

A utility designed for compression of genome collections from the same species. Such collections can be huge, e.g., a few (or tens of) gigabytes, so a need for a robust data compression…

BEDOPS
Desktop

A software suite for common genomic analysis tasks which offers improved flexibility, scalability and execution time characteristics over previously published packages. The suite includes a utility…

NGC
Desktop

A compressor for aligned HTS sequencing data that enables the complete lossless and lossy compression of mapped alignment data stored in SAM/BAM files.

Quip
Desktop

Compresses next-generation sequencing data in the FASTQ and SAM/BAM formats with extreme prejudice.

GReEn (Genome Resequencing Encoding)
Desktop

A compression tool recently proposed for compressing genome resequencing data using a reference genome sequence.

BEETL (Burrows-Wheeler Extended Tool Library)
Desktop

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform.
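
For readers unfamiliar with the Burrows-Wheeler transform named above, the following is a minimal textbook sketch in Python. It is not BEETL's implementation: it sorts all rotations in memory, which is only practical for short strings, and the function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions made for illustration.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    Illustrative only; large-scale tools use far more memory-efficient
    constructions than sorting every rotation.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("GATTACA")
    print(transformed, inverse_bwt(transformed))
```

The transform tends to group identical symbols into runs, which is what makes the output easier for a downstream compressor to encode.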

GRS
Desktop

A novel compression tool for efficient storage of Genome Re-Sequencing data.

SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding)
Desktop

A tool for compressing FASTQ files. SCALCE is designed specifically for Illumina-generated FASTQ files, but supports any valid FASTQ with consistent read lengths. The SCALCE algorithm provides a…

sam_comp
Desktop

This is a simple arithmetic coding based compressor for the SAM and BAM (DNA sequence alignment) file format.

fastqz
Desktop

A compressor for the most common (Sanger format) FASTQ files, produced by DNA sequencing machines. Fastqz breaks the FASTQ file into three separate streams; it uses a compression method designed to…
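
The idea of breaking a FASTQ file into separate streams can be sketched as follows. This is a hedged illustration, not fastqz's code: it assumes the three streams are read headers, base calls and quality strings (the natural split of a four-line FASTQ record), and it uses Python's general-purpose `zlib` as a stand-in for any specialised per-stream model. The function names are hypothetical.

```python
import zlib


def split_fastq_streams(fastq_text: str):
    """Split four-line FASTQ records into header, base and quality streams."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    headers, bases, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        headers.append(header[1:])
        bases.append(seq)
        quals.append(qual)
    return "\n".join(headers), "\n".join(bases), "\n".join(quals)


def compress_streams(fastq_text: str):
    """Compress each stream independently (zlib is a placeholder codec here)."""
    return tuple(
        zlib.compress(stream.encode("ascii"), 9)
        for stream in split_fastq_streams(fastq_text)
    )


if __name__ == "__main__":
    sample = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nFFFF\n"
    for name, blob in zip(("headers", "bases", "qualities"), compress_streams(sample)):
        print(name, len(blob), "bytes")
```

Separating the streams lets each one be modelled on its own statistics, since headers, bases and quality scores have very different alphabets and redundancy.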

fqzcomp
Desktop

A basic fastq compressor, designed primarily for high performance.

MFCompress
Desktop

A package for compressing FASTA and multi-FASTA files. MFCompress provides additional average compression gains of almost 50%, i.e. it potentially doubles the available storage, although at the cost…

SAMZIP
Desktop

An encoding and decoding tool for Sequence Alignment/Map (SAM) files.

DELIMINATE
Desktop

A practical implementation of a novel compression approach that can rapidly compress FASTA files containing genomic sequence data in a loss-less fashion.

KungFq
Desktop

Compresses FASTQ files, decompresses them, and accesses single reads in the compressed files. KungFq divides the reads into blocks and superblocks and computes statistics over each superblock…

LEON
Desktop

All-in-one software for FASTQ file compression that handles DNA, header and quality scores. LEON uses the same data structure for both DNA and quality score compression, a de Bruijn Graph…

TwoPaCo
Desktop

A scalable low memory algorithm for constructing de Bruijn graphs from whole genome sequences. TwoPaCo is based on identifying the positions of the genome which correspond to vertices of the…

AFRESh
Desktop

Targets the effective representation of the raw genomic symbol streams of both reads and assembled sequences. AFRESh makes use of a configurable set of prediction and encoding tools, extended by a…

Quartz (QUAlity score Reduction at Terabyte scale)
Desktop

A de novo quality score compression tool based on traversing the k-mer landscape of next-generation sequencing read datasets. Quartz preserves quality scores for probable variant locations and…

Samcomp
Desktop

An algorithm for compression of FASTQ files. Samcomp performs reference based compression but requires previously aligned data in the SAM format instead. The tool is compared against existing…

HUGO (Hierarchical mUlti-reference Genome cOmpression)
Desktop

A compression algorithm for aligned reads in the sorted Sequence Alignment/Map format. HUGO first aligns short reads against a reference genome and stores exactly mapped reads for compression. For…

SECRAM
Desktop

TGC (Thousands Genomes Compressor)
Desktop

Estimates the boundaries of compression ratio for human genome compression. TGC can also be used as a very effective tool for compressing Variant Call Format (VCF) files. The success of our algorithm…

GTRAC (GenoType Random Access Compressor)
Desktop

Allows for fast access of information of a specific variant or the genotype of a sample/group of samples over the compressed Variant Call Format (VCF) file. GTRAC achieves compression rates…

NRGC (Novel Referential Genome Compressor)
Desktop

A referential genome compression algorithm to effectively and efficiently compress the genomic sequences. We employ a scoring based placement technique to quantify large variations among the genomic…

GeneCodeq
Desktop

A Bayesian method inspired by coding theory for adjusting quality scores to improve the compressibility of quality scores without adversely impacting genotyping accuracy. GeneCodeq leverages a corpus…

CARGO (Compressed ARchiving for GenOmics)
Desktop

A high-level framework to automatically generate software systems optimized for the compressed storage of arbitrary types of large genomic data collections. Straightforward applications of our…

MetaCRAM
Desktop

A de novo, parallelized software suite specialized for FASTA and FASTQ format metagenomic read processing and lossless compression. MetaCRAM integrates algorithms for taxonomy identification and…

KIC (K-mer Index Compressor)
Desktop

A FASTQ compressor based on a new integer-mapped k-mer indexing method. KIC offers a high compression ratio on sequence data, outstanding user-friendliness with graphical user interfaces, and proven…

Boiler
Desktop

A software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions…

paraDSRC
Desktop

A high-performance tool for compressing next generation sequencing data using memory-distributed clusters. paraDSRC uses domain decomposition and message passing interface (MPI) to distribute data…

SOLiDzipper
Desktop

A fast encoding method that can efficiently encode and decode NGS data. The basic strategy of SOLiDzipper is to divide and encode. NGS data files contain both the sequence and non-sequence…

ERGC (Efficient Referential Genome Compressor)
Desktop

A genome compression tool. ERGC compresses a target genome using a reference genome. It employs a divide-and-conquer strategy. First, it divides both the target and reference sequences into some…

FQC
Desktop

A fastq compression method that, in addition to providing significantly higher compression gains over GZIP, incorporates features necessary for universal adoption by data repositories/end-users. FQC…

QualComp
Desktop

A lossy compression algorithm for the quality scores presented in a FASTQ file. QualComp allows the user to specify the rate (bits per quality score) prior to compression, independent of the data to…

cbc
Desktop

A program for compression and decompression of aligned reads presented in a SAM file. Note that the purpose of this algorithm is to compress the necessary information to reconstruct the reads…

BARCODE
Desktop

Achieves highly efficient compression by using a reference genome, but completely circumvents the need for alignment, affording a great reduction in the time needed to compress. BARCODE runs an order…

QVZ (Quality Value Zip)
Desktop

A lossy compressor for the quality values presented in genomic data files (e.g., FASTQ and SAM files), which comprise roughly half of the storage space (in the uncompressed domain). Lossy compression…

GQT (Genotype Query Tools)
Desktop

Command-line software and a C API for indexing and querying large-scale genotype data sets like those produced by 1000 Genomes, the UK100K, and forthcoming datasets involving millions of genomes.…

MINCE
Desktop

A technique to boost the compression of sequencing data that is based on the concept of bucketing similar reads so that they appear nearby in the file. MINCE is a technique for encoding collections…

SACO
Desktop

A lossless compression tool for the sequence alignments found in MAF files. SACO is based on a mixture of finite-context models. In contrast to a recent approach, it addresses both the DNA bases and…

MAFCO
Desktop

A lossless compression tool specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain from ≈ 34% to ≈ 57%, depending…

DeeZ (DeeNA-Zip)
Desktop

A tool for compressing SAM/BAM files, or more formally, a tool which does reference-based compression by local assembly. DeeZ was compared to other tools on bacterial RNA-seq data as well as human…

ORCOM (Overlapping Reads COmpression with Minimizers)
Desktop

A compressor of sequencing reads. ORCOM takes FASTQ files (possibly gzipped) as input and stores the DNA symbols of each read in a highly compressed form. Id and quality fields are not stored.…

G-SQZ
Desktop
Web

A Huffman coding-based sequencing-reads-specific representation scheme that compresses data without altering the relative order. G-SQZ has achieved from 65% to 81% compression on benchmark datasets,…

iDoComp
Desktop

A compressor of assembled genomes presented in FASTA format that compresses an individual genome using a reference genome for both the compression and the decompression. In terms of compression…

LFQC
Desktop

A lossless non-reference based FASTQ compression algorithm that can elegantly run on commodity machines. LFQC is provisioned to run in in-core as well as out-of-core settings. The implementations are…

msbwt
Desktop

A package for combining strings from sequencing into a data structure known as the multi-string BWT (MSBWT).

CompMap
Desktop

A reference-based compression program to speed up read mapping to related reference sequences. It is designed to eliminate repeat subsequences based on reference-based compression in the input…

SNPack
Desktop

An algorithm and file format for compressing and retrieving SNP data, specifically designed for large-scale association studies.

BEETL-fastq
Desktop

A tool that not only compresses FASTQ-formatted DNA reads more compactly than gzip but also permits rapid search for k-mer queries within the archived sequences.

CODOC
Desktop

An open file format and API for the lossless and lossy compression of depth-of-coverage (DOC) signals stemming from high-throughput sequencing (HTS) experiments.

CWig (Compressed representation of Wiggle)
Desktop

A format and toolkit for storing and analysing genome-wide density signal data.

Compressed SAM format (CSAM)
Desktop

A compression approach offering lossless and lossy compression for SAM files. The structures and techniques proposed are suitable for representing SAM files, as well as supporting fast access to the…

Khmer
Desktop

Novel efficient methods, CompressEdge and CompressVertices, for comparing large biological networks.

CaBLAST/CaBLAT
Desktop

Genomedata
Desktop

A format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space…

Gzip (GNU zip)
Desktop

A compression utility designed to be a replacement for compress.

bzip2
Desktop

A freely available, patent free, high-quality data compressor.

Path encoding
Desktop

A technique for compressing short-read sequence files. It uses a reference (any gzipped multi-FASTA file) to build a statistical model of the sequences, which is adaptively updated during compression.

LW-FQZip
Desktop

A lossless light-weight reference-based compression algorithm to compress FASTQ data. The three components of any given input, i.e., metadata, short reads and quality score strings, are first parsed…

CoGI (Compressing Genomes as an Image)
Desktop

An approach for genome compression, which transforms the genomic sequences to a two-dimensional binary image (or bitmap), then applies a rectangular partition coding algorithm to compress the binary…

smallWig
Desktop

A lossless compression method for WIG data offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics…

COMRAD (COMpression using RedundAncy of Dna)
Desktop

Finds repeats over multiple passes through the data so already-compressed regions are extended, leading to detection and compression of long repeated substrings.

RLZ (Relative Lempel-Ziv)
Desktop

An algorithm that compresses a collection of genomes or sequences from the same species with respect to the reference sequence for that species using a simple greedy technique, akin to LZ77 parsing…
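
A greedy parse of a sequence against a reference can be sketched in a few lines of Python. This is an illustrative reading of the "simple greedy technique" mentioned above, not the RLZ implementation: the function names, the factor tuples and the use of `str.find` (rather than a suffix-array index over the reference) are all assumptions chosen for brevity.

```python
def rlz_parse(reference: str, target: str):
    """Greedily factor `target` into longest matches against `reference`.

    Each factor is ("copy", position, length) for a substring of the
    reference, or ("literal", symbol) when the symbol never occurs there.
    """
    factors = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        length = 1
        # Extend the candidate match one symbol at a time while it still
        # occurs somewhere in the reference.
        while i + length <= len(target):
            pos = reference.find(target[i:i + length])
            if pos < 0:
                break
            best_pos, best_len = pos, length
            length += 1
        if best_len == 0:
            factors.append(("literal", target[i]))
            i += 1
        else:
            factors.append(("copy", best_pos, best_len))
            i += best_len
    return factors


def rlz_decode(reference: str, factors) -> str:
    """Rebuild the target from the reference and the factor list."""
    out = []
    for factor in factors:
        if factor[0] == "literal":
            out.append(factor[1])
        else:
            _, pos, length = factor
            out.append(reference[pos:pos + length])
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTACGTTT"
    tgt = "ACGTTTNACG"
    parsed = rlz_parse(ref, tgt)
    print(parsed, rlz_decode(ref, parsed) == tgt)
```

When the target is very similar to the reference, most of it collapses into a handful of long copy factors, which is where the compression comes from.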
