An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but…

DNAzip

A package to compress DNA sequences using a reference genome. DNAzip uses a series of compression techniques that, taken together, reduce the size of a single genome by orders of magnitude. It…

CRAM

A framework technology comprising a file format and a toolkit in which we combine highly efficient and tunable reference-based compression of sequence data with a data format that is directly available…

DSRC (DNA Sequence Reads Compression)

An application designed to compress data files containing DNA sequencing reads in FASTQ format. Such files can be huge, e.g., a few (or tens of) gigabytes, so a need for a…

GDC (Genome Differential Compressor)

A utility designed to compress collections of genomes from the same species. Such collections can be huge, e.g., a few (or tens of) gigabytes, so a need for a robust data compression…

BEDOPS

A software suite for common genomic analysis tasks which offers improved flexibility, scalability and execution time characteristics over previously published packages. The suite includes a utility…

NGC

A compressor for aligned HTS sequencing data that enables the complete lossless and lossy compression of mapped alignment data stored in SAM/BAM files.

Quip

Compresses next-generation sequencing data in the FASTQ and SAM/BAM formats with extreme prejudice.

GReEn (Genome Resequencing Encoding)

A compression tool recently proposed for compressing genome resequencing data using a reference genome sequence.

BEETL (Burrows-Wheeler Extended Tool Library)

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform.
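
The transform BEETL computes at scale can be illustrated on a toy string. The following is a minimal Python sketch, assuming a single short text and a `$` sentinel that sorts before every letter. It sorts all rotations naively, so it shows what the Burrows-Wheeler transform computes, not BEETL's construction for large read collections. The names `bwt` and `inverse_bwt` are hypothetical.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: the last column of the sorted cyclic rotations of text + sentinel."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last: str) -> str:
    """Invert the BWT with the LF-mapping (assumes the sentinel sorts first)."""
    # c_table[c]: number of characters in `last` that are smaller than c.
    c_table, total = {}, 0
    for ch in sorted(set(last)):
        c_table[ch] = total
        total += last.count(ch)
    # lf[i]: row of the sorted matrix whose rotation starts with this occurrence of last[i].
    seen, lf = {}, []
    for ch in last:
        lf.append(c_table[ch] + seen.get(ch, 0))
        seen[ch] = seen.get(ch, 0) + 1
    # Row 0 starts with the sentinel, so its last character is the final character
    # of the original text; following LF walks the text backwards.
    row, out = 0, []
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```

The transform tends to gather identical characters into runs (here `nn` and `aa`), which is what makes its output easier to compress, while `inverse_bwt` shows that no information is lost.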

GRS

A novel compression tool for efficient storage of Genome Re-Sequencing data.

SCALCE (Sequence Compression Algorithm using Locally Consistent Encoding)

A tool for compressing FASTQ files. SCALCE is designed specifically for Illumina-generated FASTQ files, but supports any valid FASTQ file with consistent read lengths. The SCALCE algorithm provides a…

sam_comp

A simple arithmetic-coding-based compressor for the SAM and BAM (DNA sequence alignment) file formats.

fastqz

A compressor for FASTQ files in the most common (Sanger) format, as produced by DNA sequencing machines. Fastqz breaks the FASTQ file into three separate streams; it uses a compression method designed to…
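
The stream separation described for fastqz can be sketched in a few lines of Python. This is an illustrative sketch, not fastqz's code. It assumes unwrapped 4-line records with a bare `+` separator line, and it compresses each stream with `zlib` purely as a stand-in for fastqz's own coders. All function names are hypothetical.

```python
import zlib


def split_fastq(text: str) -> tuple[list[str], list[str], list[str]]:
    """Split 4-line FASTQ records into header, base and quality streams."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected exactly 4 lines per FASTQ record")
    headers, bases, quals = [], [], []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        headers.append(header[1:])  # drop the leading '@'
        bases.append(seq)
        quals.append(qual)
    return headers, bases, quals


def join_fastq(headers: list[str], bases: list[str], quals: list[str]) -> str:
    """Rebuild records; the '+' line is written bare (a repeated header would be lost)."""
    return "".join(f"@{h}\n{s}\n+\n{q}\n" for h, s, q in zip(headers, bases, quals))


def compress_streams(text: str) -> list[bytes]:
    """Compress each stream separately; zlib is only a placeholder coder here."""
    return [zlib.compress("\n".join(stream).encode("ascii")) for stream in split_fastq(text)]


record = "@read1\nACGTN\n+\nIIII#\n@read2\nTTGCA\n+\nHHHHH\n"
headers, bases, quals = split_fastq(record)
print(headers, bases, quals)  # ['read1', 'read2'] ['ACGTN', 'TTGCA'] ['IIII#', 'HHHHH']
assert join_fastq(headers, bases, quals) == record
print([len(blob) for blob in compress_streams(record)])
```

Separating the streams helps because headers, bases and quality strings have very different statistics, so a coder tuned to one stream is not diluted by the others.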

fqzcomp

A basic FASTQ compressor, designed primarily for high performance.

MFCompress

A package for compressing FASTA and multi-FASTA files. MFCompress provides additional average compression gains of almost 50%, i.e., it potentially doubles the available storage, although at the cost…

SAMZIP

An encoding and decoding tool for Sequence Alignment/Map (SAM) files.

DELIMINATE

A practical implementation of a novel compression approach that can rapidly compress FASTA files containing genomic sequence data in a lossless fashion.

KungFq

Compresses FASTQ files, decompresses them, and accesses single reads within the compressed files. KungFQ is based on dividing the reads into blocks and superblocks and computes statistics over each superblocks…

LEON

An all-in-one software tool for FASTQ file compression that handles DNA, headers and quality scores. LEON uses the same data structure for compressing both DNA and quality scores, a de Bruijn Graph…

TwoPaCo

A scalable low memory algorithm for constructing de Bruijn graphs from whole genome sequences. TwoPaCo is based on identifying the positions of the genome which correspond to vertices of the…
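
To make "positions which correspond to vertices" concrete, the Python sketch below marks, for each input sequence, the k-mers that start or end a sequence or that have more than one distinct predecessor or successor across all inputs. These junctions delimit the non-branching paths of the compacted de Bruijn graph. It is a naive exact version under stated simplifications (forward strand only, all k-mers kept in ordinary sets), so it does not reproduce TwoPaCo's low-memory design. `junction_positions` is a hypothetical name.

```python
from collections import defaultdict


def junction_positions(genomes: list[str], k: int) -> list[list[int]]:
    """Per genome, start positions of k-mers that are vertices of the compacted
    de Bruijn graph (naive exact version, forward strand only)."""
    preds: dict[str, set[str]] = defaultdict(set)  # k-mer -> distinct preceding bases
    succs: dict[str, set[str]] = defaultdict(set)  # k-mer -> distinct following bases
    for g in genomes:
        for i in range(len(g) - k):
            left, right = g[i:i + k], g[i + 1:i + k + 1]
            succs[left].add(right[-1])
            preds[right].add(left[0])
    result = []
    for g in genomes:
        last = len(g) - k  # start of the final k-mer
        positions = []
        for i in range(last + 1):
            kmer = g[i:i + k]
            # Sequence ends and k-mers where paths merge or branch are junctions.
            if i == 0 or i == last or len(preds[kmer]) > 1 or len(succs[kmer]) > 1:
                positions.append(i)
        result.append(positions)
    return result


print(junction_positions(["AACCGT", "AACCTT"], 3))  # [[0, 1, 3], [0, 1, 3]]
```

In this example the two sequences share the prefix `AACC` and branch after the k-mer `ACC`, so position 1 is a junction in both, alongside each sequence's first and last k-mer.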

AFRESh

Targets the effective representation of the raw genomic symbol streams of both reads and assembled sequences. AFRESh makes use of a configurable set of prediction and encoding tools, extended by a…

Quartz (QUAlity score Reduction at Terabyte scale)

A de novo quality score compression tool based on traversing the k-mer landscape of next-generation sequencing read datasets. Quartz preserves quality scores for probable variant locations and…

Samcomp

An algorithm for compression of FASTQ files. Samcomp performs reference-based compression but requires previously aligned data in the SAM format instead. The tool is compared against existing…

HUGO (Hierarchical mUlti-reference Genome cOmpression)

A compression algorithm for aligned reads in the sorted Sequence Alignment/Map format. HUGO first aligns short reads against a reference genome and stores exactly mapped reads for compression. For…

Selective retrieval on Encrypted and Compressed…

TGC (Thousands Genomes Compressor)

Estimates the bounds of the compression ratio for human genome compression. TGC can also be used as a very effective tool for compressing Variant Call Format (VCF) files. The success of our algorithm…

GTRAC (GenoType Random Access Compressor)

Allows fast access to information about a specific variant, or to the genotype of a sample or group of samples, within the compressed VCF file. GTRAC achieves compression rates comparable to the…

NRGC (Novel Referential Genome Compressor)

A referential genome compression algorithm to effectively and efficiently compress the genomic sequences. We employ a scoring based placement technique to quantify large variations among the genomic…

GeneCodeq

A Bayesian method inspired by coding theory for adjusting quality scores to improve the compressibility of quality scores without adversely impacting genotyping accuracy. GeneCodeq leverages a corpus…

CARGO (Compressed ARchiving for GenOmics)

A high-level framework to automatically generate software systems optimized for the compressed storage of arbitrary types of large genomic data collections. Straightforward applications of our…

MetaCRAM

A de novo, parallelized software suite specialized for FASTA and FASTQ format metagenomic read processing and lossless compression. MetaCRAM integrates algorithms for taxonomy identification and…

KIC (K-mer Index Compressor)

A FASTQ compressor based on a new integer-mapped k-mer indexing method. KIC offers a high compression ratio on sequence data, outstanding user-friendliness with graphical user interfaces, and proven…
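
One common way to map k-mers to integers is to pack each base into 2 bits. The Python sketch below shows that mapping and a rolling update that derives each k-mer's value from the previous one. It illustrates the general idea only: KIC's exact indexing scheme is not described here, so the base-to-bits mapping is an assumption. It also assumes reads contain only `A`, `C`, `G` and `T` (any other symbol raises `KeyError`). The function names are hypothetical.

```python
# 2-bit code per base (assumed mapping; KIC's own scheme may differ).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def kmer_to_int(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base, first base in the high bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def int_to_kmer(value: int, k: int) -> str:
    """Unpack an integer produced by kmer_to_int back into a k-mer of length k."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def rolling_kmer_ints(seq: str, k: int) -> list[int]:
    """Integer codes of every k-mer in seq, updated in constant time per base."""
    mask = (1 << (2 * k)) - 1  # keep only the bits of the last k bases
    value, out = 0, []
    for i, base in enumerate(seq):
        value = ((value << 2) | ENCODE[base]) & mask
        if i >= k - 1:
            out.append(value)
    return out


print(kmer_to_int("ACGT"))            # 27
print(rolling_kmer_ints("ACGTA", 3))  # [6, 27, 44]
```

`ACGT` packs to binary `00 01 10 11`, i.e., 27. Fixed-width integers like these are convenient keys for hash tables or sorted indexes, which is one reason integer k-mer codes are popular in sequence tools.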

Boiler

A software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions…
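
A genomic coverage vector of the kind Boiler keeps can be computed from alignment intervals with a difference array. The Python sketch below assumes that each aligned block is already given as a half-open `(start, end)` interval on one chromosome, and it ignores spliced-read bookkeeping. Run-length encoding of the result is shown as one plausible compact form, not as Boiler's actual storage format. Function names are hypothetical.

```python
def coverage_vector(intervals: list[tuple[int, int]], length: int) -> list[int]:
    """Per-base read depth from half-open [start, end) intervals via a difference array."""
    diff = [0] * (length + 1)
    for start, end in intervals:
        if not 0 <= start <= end <= length:
            raise ValueError(f"interval {(start, end)} outside [0, {length}]")
        diff[start] += 1  # depth rises where an aligned block begins
        diff[end] -= 1    # and falls just past where it ends
    coverage, depth = [], 0
    for delta in diff[:length]:
        depth += delta
        coverage.append(depth)
    return coverage


def run_length(values: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs: list[list[int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


cov = coverage_vector([(0, 4), (2, 6)], 8)
print(cov)              # [1, 1, 2, 2, 1, 1, 0, 0]
print(run_length(cov))  # [(1, 2), (2, 2), (1, 2), (0, 2)]
```

Because depth changes only where reads start or end, coverage over large regions forms long constant runs, which is why a coverage vector can be far smaller than the per-read records it summarizes.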

paraDSRC

A high-performance tool for compressing next-generation sequencing data using memory-distributed clusters. paraDSRC uses domain decomposition and the Message Passing Interface (MPI) to distribute data…

SOLiDzipper

A fast encoding method that can efficiently encode and decode NGS data. The basic strategy of SOLiDzipper is to divide and encode. NGS data files contain both the sequence and non-sequence…

ERGC (Efficient Referential Genome Compressor)

A genome compression tool. ERGC compresses a target genome using a reference genome. It employs a divide-and-conquer strategy. First, it divides both the target and reference sequences into some…

FQC

A FASTQ compression method that, in addition to providing significantly higher compression gains than GZIP, incorporates features necessary for universal adoption by data repositories/end-users. FQC…

QualComp

A lossy compression algorithm for the quality scores contained in a FASTQ file. QualComp allows the user to specify the rate (bits per quality score) prior to compression, independent of the data to…

cbc

A program for compression and decompression of aligned reads stored in a SAM file. Note that the purpose of this algorithm is to compress the necessary information to reconstruct the reads…

BARCODE

Achieves highly efficient compression by using a reference genome, but completely circumvents the need for alignment, affording a great reduction in the time needed to compress. BARCODE runs an order…

QVZ (Quality Value Zip)

A lossy compressor for the quality values contained in genomic data files (e.g., FASTQ and SAM files), which comprise roughly half of the storage space (in the uncompressed domain). Lossy compression…

GQT (Genotype Query Tools)

Command-line software and a C API for indexing and querying large-scale genotype data sets like those produced by 1000 Genomes, the UK100K, and forthcoming datasets involving millions of genomes.…

MINCE

A technique to boost the compression of sequencing data that is based on the concept of bucketing similar reads so that they appear nearby in the file. MINCE is a technique for encoding collections…
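
Bucketing reads so that similar ones become neighbours can be sketched with minimizers. The Python example below assigns each read to the bucket of its lexicographically smallest k-mer and then emits the buckets in key order. This is a simplified illustration that assumes lexicographic minimizers on the forward strand and reads at least k bases long; it is not MINCE's exact bucketing rule. `minimizer` and `bucket_reads` are hypothetical names.

```python
from collections import defaultdict


def minimizer(read: str, k: int) -> str:
    """Lexicographically smallest k-mer of the read (forward strand only)."""
    if len(read) < k:
        raise ValueError("read shorter than k")
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def bucket_reads(reads: list[str], k: int) -> list[str]:
    """Reorder reads so that reads sharing a minimizer are stored next to each other."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, k)].append(read)
    # Emit buckets in key order; input order is kept within each bucket.
    return [read for key in sorted(buckets) for read in buckets[key]]


print(bucket_reads(["TTTTACGT", "GGGGGGGG", "CCACGTAA"], 4))
# ['TTTTACGT', 'CCACGTAA', 'GGGGGGGG']
```

The two reads that both contain `ACGT` end up adjacent, so a downstream general-purpose compressor sees their shared substring within a short window. Reordering like this changes read order, so a tool that must preserve the original order would also need to store the permutation.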

SACO

A lossless compression tool for the sequence alignments found in MAF files. SACO is based on a mixture of finite-context models. Unlike a recent approach, it addresses both the DNA bases and…

MAFCO

A lossless compression tool specifically designed to compress MAF (Multiple Alignment Format) files. Compared to gzip, the proposed tool attains a compression gain from ≈ 34% to ≈ 57%, depending…

DeeZ (DeeNA-Zip)

A tool for compressing SAM/BAM files or, more formally, a tool that performs reference-based compression by local assembly. DeeZ was compared to other tools on bacterial RNA-seq data as well as human…

ORCOM (Overlapping Reads COmpression with Minimizers)

A compressor of sequencing reads. ORCOM takes FASTQ files (possibly gzipped) as input and stores the DNA symbols of each read in a highly compressed form. ID and quality fields are not stored.…

G-SQZ

A Huffman coding-based sequencing-reads-specific representation scheme that compresses data without altering the relative order. G-SQZ has achieved from 65% to 81% compression on benchmark datasets,…
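
Huffman coding gives shorter bit strings to more frequent symbols. The Python sketch below builds a plain Huffman code over `(base, quality)` pairs, a symbol choice made here for illustration, and shows that encoding and decoding round-trip. It is not G-SQZ's implementation, and the function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix-free code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # one distinct symbol still needs a 1-bit codeword
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable without comparing dicts
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)  # merge the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a codeword")
    return out


# Hypothetical input: each symbol is a (base, quality character) pair.
pairs = list(zip("AAAAAACG", "IIIIII#I"))
code = huffman_code(pairs)
bits = encode(pairs, code)
print(len(bits))  # 10
assert decode(bits, code) == pairs
```

With six `('A', 'I')` pairs and one each of two other pairs, the frequent pair gets a 1-bit codeword and the rare ones 2 bits each, so the eight symbols fit in 10 bits.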

iDoComp

A compressor of assembled genomes presented in FASTA format that compresses an individual genome using a reference genome for both the compression and the decompression. In terms of compression…

LFQC

A lossless, non-reference-based FASTQ compression algorithm that can run elegantly on commodity machines. LFQC can run in both in-core and out-of-core settings. The implementations are…

msbwt

A package for combining strings from sequencing into a data structure known as the multi-string BWT (MSBWT).

CompMap

A reference-based compression program to speed up read mapping to related reference sequences. It is designed to eliminate repeated subsequences using reference-based compression in the input…

SNPack

An algorithm and file format for compressing and retrieving SNP data, specifically designed for large-scale association studies.

BEETL-fastq

A tool that not only compresses FASTQ-formatted DNA reads more compactly than gzip but also permits rapid search for k-mer queries within the archived sequences.

CODOC

An open file format and API for the lossless and lossy compression of depth-of-coverage (DOC) signals stemming from high-throughput sequencing (HTS) experiments.

CWig (Compressed representation of Wiggle)

A format and toolkit for storing and analysing genome-wide density signal data.

CSAM (Compressed SAM format)

A compression approach offering lossless and lossy compression for SAM files. The structures and techniques proposed are suitable for representing SAM files, as well as supporting fast access to the…

Khmer

Novel efficient methods, CompressEdge and CompressVertices, for comparing large biological networks.

Compression-accelerated…

Genomedata

A format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space…

Gzip (GNU zip)

A compression utility designed to be a replacement for compress.

bzip2

A freely available, patent-free, high-quality data compressor.

Path encoding

A technique for compressing short-read sequence files. It uses a reference (any gzipped multi-FASTA file) to build a statistical model of the sequences, which is adaptively updated during compression.
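
The idea of priming a statistical model with a reference and then updating it during compression can be illustrated with an order-k context model. The Python sketch below counts which base follows each k-base context and reports the ideal code length, the sum of -log2 p that an arithmetic coder would approach. It is a generic context model, not Path encoding's own model. The class name, the add-one pseudocounts, and leaving the first k bases unmodelled are all assumptions made for the sketch.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class OrderKModel:
    """Adaptive order-k model: P(next base | previous k bases) estimated from counts."""

    def __init__(self, k: int):
        self.k = k
        # Add-one pseudocounts so no base ever gets probability zero (assumption).
        self.counts = defaultdict(lambda: [1, 1, 1, 1])

    def train(self, seq: str) -> None:
        """Prime the counts from a reference sequence."""
        for i in range(self.k, len(seq)):
            self.counts[seq[i - self.k:i]][ALPHABET.index(seq[i])] += 1

    def code_length(self, seq: str, update: bool = True) -> float:
        """Ideal number of bits to code seq[k:] under the model."""
        bits = 0.0
        for i in range(self.k, len(seq)):
            counts = self.counts[seq[i - self.k:i]]
            symbol = ALPHABET.index(seq[i])
            bits -= math.log2(counts[symbol] / sum(counts))
            if update:  # adapt to the data being compressed
                counts[symbol] += 1
        return bits


read = "ACGTACGTACGT"
primed = OrderKModel(2)
primed.train("ACGT" * 50)  # toy stand-in for a reference sequence
print(round(OrderKModel(2).code_length(read), 2), round(primed.code_length(read), 2))
```

On a read drawn from the same repeat as the toy reference, the primed model needs fewer bits than the unprimed one, which is the benefit of building the model from a reference before coding the reads.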

LW-FQZip

A lossless light-weight reference-based compression algorithm to compress FASTQ data. The three components of any given input, i.e., metadata, short reads and quality score strings, are first parsed…

CoGI (Compressing Genomes as an Image)

An approach for genome compression, which transforms the genomic sequences to a two-dimensional binary image (or bitmap), then applies a rectangular partition coding algorithm to compress the binary…

smallWig

A lossless compression method for WIG data offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics…

COMRAD (COMpression using RedundAncy of Dna)

Finds repeats over multiple passes through the data so already-compressed regions are extended, leading to detection and compression of long repeated substrings.

RLZ (Relative Lempel-Ziv), deprecated

An algorithm that compresses a collection of genomes or sequences from the same species with respect to the reference sequence for that species using a simple greedy technique, akin to LZ77 parsing…
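
The greedy parsing described for RLZ can be sketched directly. At each position of the target, take the longest substring that also occurs in the reference and emit it as a `(position, length)` pair, falling back to a literal character when nothing matches. This Python sketch scans the whole reference for every factor, which is far slower than the indexed search (e.g., over a suffix array of the reference) that practical implementations typically use. `rlz_factorize` and `rlz_decode` are hypothetical names.

```python
from typing import Union

# A factor is either a (reference_position, length) copy or a single literal character.
Factor = Union[tuple[int, int], str]


def rlz_factorize(target: str, reference: str) -> list[Factor]:
    """Greedy left-to-right parse of target into longest matches in reference."""
    factors: list[Factor] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        for j in range(len(reference)):  # naive scan over every reference position
            length = 0
            while (i + length < len(target) and j + length < len(reference)
                   and target[i + length] == reference[j + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = j, length
        if best_len == 0:
            factors.append(target[i])  # character absent from the reference
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decode(factors: list[Factor], reference: str) -> str:
    """Rebuild the target by copying reference substrings and literals in order."""
    parts = []
    for factor in factors:
        if isinstance(factor, str):
            parts.append(factor)
        else:
            pos, length = factor
            parts.append(reference[pos:pos + length])
    return "".join(parts)


reference = "ACGTACGTTTGACC"
factors = rlz_factorize("ACGTTTGACCN", reference)
print(factors)  # [(4, 10), 'N']
assert rlz_decode(factors, reference) == "ACGTTTGACCN"
```

Here the 11-base target collapses to one copy of 10 bases from reference position 4 plus the literal `N`. When the target genome is closely related to the reference, most of it is covered by a few long copies, which is why this simple greedy scheme compresses same-species collections well.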
