Unlock your biological data


Quartz / QUAlity score Reduction at Terabyte scale
A de novo quality score compression tool based on traversing the k-mer landscape of next-generation sequencing read datasets. Quartz compresses the quality scores of bases that agree with a consensus derived from that k-mer landscape by resetting them to a default value, and it preserves the quality scores at locations that potentially differ from the consensus, which are the probable variant locations. The Quartz software will benefit any researchers who are generating, storing, mapping, or analyzing large amounts of DNA, RNA, ChIP-seq, or exome sequencing data.
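The idea of resetting the quality scores of concordant bases can be sketched in a few lines. This is a deliberately simplified illustration, not Quartz's actual algorithm: the `trusted_kmers` set, the function name `smooth_qualities`, and the default quality character are all assumptions made for the example.

```python
def smooth_qualities(read, quals, trusted_kmers, k, default="I"):
    """Reset quality scores of bases covered by a trusted k-mer.

    Bases that fall inside at least one k-mer found in `trusted_kmers`
    are treated as concordant and get the `default` quality character.
    All other bases keep their original quality score.
    """
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in trusted_kmers:
            for j in range(i, i + k):
                covered[j] = True
    return "".join(default if c else q for c, q in zip(covered, quals))


# Hypothetical dictionary of k-mers assumed to match the consensus.
trusted = {"ACG", "CGT"}
smoothed = smooth_qualities("ACGTAC", "#$%&'(", trusted, k=3)
print(smoothed)  # IIII'(
```

After smoothing, long runs of identical quality characters compress well with any general-purpose compressor, while the untouched positions retain the information a variant caller would need.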
GQT / Genotype Query Tools
A command line software and a C API for indexing and querying large-scale genotype data sets like those produced by 1000 Genomes, the UK100K, and forthcoming datasets involving millions of genomes. GQT represents genotypes as compressed bitmap indices, which reduce computational burden of variant queries based on sample genotypes, phenotypes, and relationships by orders of magnitude over standard "variant-centric" indexing strategies. This index can significantly expand the capabilities of population-scale analyses by providing interactive-speed queries to data sets with millions of individuals.
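A sample-centric bitmap index of the kind described above can be illustrated with a toy example. The sketch below uses plain Python integers as uncompressed bitmaps; it is not GQT's file format or API, and the function names and genotype encoding (0, 1, 2) are assumptions made for the illustration.

```python
def build_index(genotypes):
    """Build one bitmap per (variant, genotype code), one bit per sample.

    genotypes[v][s] is the genotype code of sample s at variant v:
    0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
    """
    index = []
    for row in genotypes:
        bitmaps = {0: 0, 1: 0, 2: 0}
        for sample, code in enumerate(row):
            bitmaps[code] |= 1 << sample
        index.append(bitmaps)
    return index


def variants_where_all_carry(index, sample_ids):
    """Return variants at which every listed sample carries an alternate allele."""
    mask = 0
    for s in sample_ids:
        mask |= 1 << s
    hits = []
    for v, bitmaps in enumerate(index):
        carriers = bitmaps[1] | bitmaps[2]  # bitwise OR merges genotype classes
        if carriers & mask == mask:         # one AND tests all samples at once
            hits.append(v)
    return hits


genotypes = [
    [0, 1, 2, 0],  # variant 0
    [1, 1, 0, 0],  # variant 1
    [0, 2, 2, 1],  # variant 2
]
index = build_index(genotypes)
hits = variants_where_all_carry(index, [1, 2])
print(hits)  # [0, 2]
```

The query touches one machine-word operation per variant regardless of how many samples are in the mask, which is the property that makes sample-based queries cheap compared with scanning every genotype.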
A framework technology comprising a file format and toolkit that combines highly efficient, tunable reference-based compression of sequence data with a data format that is directly available for computational use. The storage of quality scores and unaligned sequences may be adjusted for different experiments to conserve information or to minimize storage costs. The method provides one opportunity to address the threat that increasing DNA sequence volumes will overcome our ability to store the sequences.
Compressed SAM format / CSAM
A compression approach offering lossless and lossy compression for SAM files. The structures and techniques proposed are suitable for representing SAM files, as well as supporting fast access to the compressed information. They generate more compact lossless representations than BAM, which is currently the preferred lossless compressed SAM-equivalent format; and are self-contained, that is, they do not depend on any external resources to compress or decompress SAM files.
Compresses FASTQ data. LW-FQZip is a lossless, lightweight reference-based compression algorithm. The data are first split into metadata, short reads, and quality scores, and each stream is then processed independently with a different scheme. The software is equipped with a lightweight mapping model, bitwise prediction by partial matching (PPM), arithmetic coding, and multi-threading parallelism. It shows good compatibility with long-read sequencing data and is hoped to provide insights into the storage problems of new sequencing data.
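The stream-splitting step can be sketched as follows. This is a minimal illustration under stated assumptions, not LW-FQZip's implementation: standard-library compressors stand in for the mapping model, PPM, and arithmetic coding, the function names are hypothetical, and the input is assumed to be four-line FASTQ records with a bare `+` separator line.

```python
import bz2
import lzma
import zlib


def split_fastq(text):
    """Split four-line FASTQ records into header, read, and quality streams."""
    lines = text.splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    return lines[0::4], lines[1::4], lines[3::4]


def compress_streams(text):
    """Compress each stream independently (stand-in codecs, one per stream)."""
    headers, reads, quals = split_fastq(text)
    return (
        bz2.compress("\n".join(headers).encode()),
        lzma.compress("\n".join(reads).encode()),
        zlib.compress("\n".join(quals).encode()),
    )


def decompress_streams(packed):
    """Invert compress_streams and re-interleave the records."""
    h, r, q = packed
    headers = bz2.decompress(h).decode().split("\n")
    reads = lzma.decompress(r).decode().split("\n")
    quals = zlib.decompress(q).decode().split("\n")
    return "".join(
        f"{a}\n{b}\n+\n{c}\n" for a, b, c in zip(headers, reads, quals)
    )


fastq = "@r1\nACGT\n+\nIIII\n@r2\nGGCA\n+\n##II\n"
packed = compress_streams(fastq)
restored = decompress_streams(packed)
```

Separating the streams matters because each has different statistics: headers are highly repetitive text, reads are drawn from a four-letter alphabet, and quality strings are noisy, so a model tuned to one stream performs poorly on the others.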
SECRAM / Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map
A privacy-preserving solution for the secure storage of compressed aligned genomic data. SECRAM enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared to BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared to CRAM, SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage.
GDC / Genome Differential Compressor
A utility designed for compression of genome collections from the same species. Such collections can be huge, e.g., a few (or tens of) gigabytes, so the need for a robust data compression tool is clear. Universal compression programs like gzip or bzip2 might be used for this purpose, but a specialized tool can work much better, since a universal compressor does not exploit the properties of such data sets, e.g., long approximate repetitions at long distances.
A software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantitation. Boiler also allows the user to pose fast and useful related queries without decompressing the entire file. Boiler is not a general-purpose substitute for RNA-seq SAM/BAM files, but it is an extremely space-efficient alternative that works well with tools like Cufflinks and StringTie.
TGC / Thousands Genomes Compressor
Estimates the boundaries of compression ratio for human genome compression. TGC can also be used as a very effective tool for compressing Variant Call Format (VCF) files. The success of the algorithm was possible not only because of the variant database, but also because it searches for cross-correlations between individuals. In other words, for each individual, similarities to any other previously processed individual (i.e., runs of repeating variants) can be found.
QVZ / Quality Value Zip
A lossy compressor for the quality values present in genomic data files (e.g., FASTQ and SAM files), which comprise roughly half of the storage space (in the uncompressed domain). Lossy compression allows for compression of data beyond its lossless limit. QVZ exhibits better rate-distortion performance than the previously proposed algorithms, for several distortion metrics and for the lossless case. Moreover, it allows the user to define any quasi-convex distortion function to be minimized, a feature not supported by the previous algorithms.
Targets the effective representation of the raw genomic symbol streams of both reads and assembled sequences. AFRESh makes use of a configurable set of prediction and encoding tools, extended by a Context-Adaptive Binary Arithmetic Coding scheme, to compress raw genetic codes. It compresses both genomic reads and assembled genomic sequences without reference files. AFRESh splits the genomic data stream into blocks and selects, for each block, the most effective tool from a set of encoding and prediction tools. Compared with generic compression approaches, it achieves a compression gain of up to 41% over GNU Gzip and 22% over 7-Zip at the Ultra setting.
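The per-block selection strategy can be sketched with general-purpose codecs. This is an illustration of the selection idea only, not AFRESh itself: `zlib`, `bz2`, and `lzma` stand in for its prediction and encoding tools, and the codec IDs, block size, and function names are assumptions made for the example.

```python
import bz2
import lzma
import zlib

# Hypothetical tool table: codec ID -> (compress, decompress).
CODECS = {
    0: (zlib.compress, zlib.decompress),
    1: (bz2.compress, bz2.decompress),
    2: (lzma.compress, lzma.decompress),
}


def encode(data, block_size=4096):
    """Split data into blocks and keep, per block, the smallest encoding."""
    blocks = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        candidates = [(cid, comp(block)) for cid, (comp, _) in CODECS.items()]
        best = min(candidates, key=lambda pair: len(pair[1]))
        blocks.append(best)  # (codec ID, payload) so the decoder knows the tool
    return blocks


def decode(blocks):
    """Decode each block with the codec recorded next to its payload."""
    return b"".join(CODECS[cid][1](payload) for cid, payload in blocks)


sequence = b"ACGT" * 3000 + b"TTTTGGGGCCCCAAAA" * 500
encoded = encode(sequence)
decoded = decode(encoded)
```

Recording the chosen codec ID alongside each payload costs a few bits per block but lets the encoder adapt when the statistics of the stream change from one block to the next.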
Compresses ChIP-seq Wig data. Wig is a standard file format, which in this setting contains relevant read density information crucial for visualization and downstream processing. ChIPWig may be executed in two different modes: lossless and lossy. Lossless ChIPWig compression allows for random access and fast queries in the file through careful variable-length block-wise encoding. ChIPWig also stores the summary statistics of each block needed for guided access. Lossy ChIPWig performs quantization of the read density values before feeding them into the lossless ChIPWig compressor.
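The two ideas in this description, per-block summary statistics for guided access and quantization before lossless coding, can be sketched as follows. This is a minimal illustration and not ChIPWig's encoding: fixed-size blocks are used instead of variable-length ones, and the function names, the uniform quantizer, and the integer step size are assumptions.

```python
def quantize(values, step):
    """Lossy step: snap each density value to the nearest multiple of `step`."""
    return [round(v / step) * step for v in values]


def summarize_blocks(values, block_size):
    """Record min, max, and sum for each fixed-size block of density values."""
    summaries = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        summaries.append(
            {"start": start, "min": min(block), "max": max(block), "sum": sum(block)}
        )
    return summaries


def blocks_exceeding(summaries, threshold):
    """Guided access: return start offsets of blocks that can contain a peak."""
    return [s["start"] for s in summaries if s["max"] >= threshold]


density = [0, 0, 1, 2, 8, 9, 7, 1, 0, 0, 0, 0]
summaries = summarize_blocks(density, block_size=4)
candidates = blocks_exceeding(summaries, threshold=5)
print(candidates)                      # [4]
print(quantize([0, 3, 7, 12], step=5)) # [0, 5, 5, 10]
```

A query for high-density regions only needs to decode the blocks listed in `candidates`; the other blocks are ruled out from their summaries alone.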
Compresses the quality scores of a FASTQ file. AQUa is based on the AFRESh framework, which supports many features: (i) single-pass encoding, (ii) Context-Adaptive Binary Arithmetic Coding (CABAC), (iii) random access in combination with CABAC, (iv) flexible configuration of the coding tools used, (v) flexible configuration of coding complexity and effectiveness, and (vi) ease of extensibility with additional input file formats, additional coding tools, and additional output file formats.
A scalable low memory algorithm for constructing de Bruijn graphs from whole genome sequences. TwoPaCo is based on identifying the positions of the genome which correspond to vertices of the compacted graph. TwoPaCo works by narrowing down the set of candidates using a probabilistic data structure, in order to make the deterministic memory-intensive approach feasible. TwoPaCo can construct the graph for 100 simulated human genomes in less than a day and eight real primates in less than two hours, on a typical shared-memory machine.
HARC / HAsh-based Read Compressor
Reorders reads approximately according to their genome position and encodes them to remove the redundancy between consecutive reads. HARC is an algorithm for read compression that does not require a reference genome. It can be used in cases involving unsequenced species and metagenomics. While reordering reads can lead to better compression, the read order in general, and the read-pairing information in particular, can be useful in downstream analysis. Therefore, this algorithm allows compression both with and without preserving the read order.
GTRAC / GenoType Random Access Compressor
Allows for fast access to information about a specific variant, or to the genotype of a sample or group of samples, in a compressed Variant Call Format (VCF) file. GTRAC achieves compression rates comparable to the state-of-the-art compressors, while allowing for ultra-fast querying in the compressed domain. Specifically, the proposed algorithm allows for fast retrieval of all individuals/samples that possess certain variants, and the retrieval of all variants from a group of individuals/samples. Thus GTRAC will allow researchers to work efficiently with a highly compressed database containing the genotype information of a collection of samples.
Assists users in exploiting the alignment information contained in SAM/BAM files. CALQ is a lossy compressor for quality values that computes a genotype certainty level per genomic locus to determine the acceptable coarseness of quality value quantization for all the quality values associated with that locus. It also uses the alignment information to determine the acceptable level of distortion for the quality values such that subsequent downstream analyses are presumably not affected.