Jellyfish
A k-mer counting algorithm. It is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Suffix arrays, thanks to their flexibility, have been the data structure of choice for solving many string problems. For the specific task of k-mer counting, which is important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.
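To make the task concrete, here is a minimal single-threaded sketch of k-mer counting in Python. It is not Jellyfish's lock-free hash table; it uses an ordinary `Counter`. The function names are illustrative. Counting canonical k-mers (a k-mer and its reverse complement treated as one) is a common convention assumed here, not something the description above specifies.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k: int) -> Counter:
    """Count canonical k-mers over an iterable of reads.

    Windows containing characters other than A, C, G, T are skipped.
    """
    counts = Counter()
    valid = set("ACGT")
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:
                counts[canonical(kmer)] += 1
    return counts


if __name__ == "__main__":
    print(count_kmers(["ACGTA"], 3))
```

In the usage example, `ACG` and `CGT` are reverse complements of each other, so they are counted under the same key.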
seer / Sequence Element EnRichment
Identifies sequence elements. seer can detect associations with antibiotic resistance caused both by the presence of a gene and by single-nucleotide polymorphisms (SNPs) in coding regions, and can discover novel invasiveness factors. The tool implements and combines three key insights: a scan of all possible k-mers with a distributed string mining algorithm, an appropriate alignment-free correction for clonal population structure, and a fast association analysis of all counted k-mers.
KAT / K-mer Analysis Toolkit
A user-friendly, extendible and scalable toolkit for rapidly counting, comparing and analysing k-mers from various data sources. The tools in KAT assist the user with a wide range of tasks, including error profiling, assessing sequencing bias, identifying contaminants, and de novo genome assembly QC and validation. KAT is a C++11 application containing multiple tools, each of which exploits multi-core machines via multi-threading where possible. Core functionality is contained in a library designed to promote rapid development of new tools.
Turtle
A method that balances time, space and accuracy requirements to efficiently extract frequent k-mers even for high-coverage libraries and large genomes such as human. Turtle is designed to minimize cache misses: it uses a pattern-blocked Bloom filter to remove infrequent k-mers from consideration, combined with a novel sort-and-compact scheme, instead of a hash, for the actual counting. Although this increases theoretical complexity, the savings in cache misses reduce the empirical running times. A variant of the method can resort to a counting Bloom filter for even larger savings in memory, at the expense of false-negative rates in addition to the false-positive rates common to all Bloom filter-based approaches.
KAnalyze
A fast reusable k-mer toolkit capable of running on multiple platforms. KAnalyze is packaged with an API for integration into other programs as well as a CLI for manual execution and scripted pipelines. The count module has a graphical mode for desktop use. Because it is designed for longevity, the project is organized, documented and tested. The source code includes unit tests to quickly verify accuracy as the code changes. KAnalyze makes both speed and accuracy available to k-mer applications.
ntHash
A hashing algorithm tuned for processing DNA/RNA sequences. ntHash provides a fast way to compute multiple hash values for a given k-mer without repeating the whole procedure for each value. To do so, a single hash value is computed from a given k-mer, and each extra hash value is then computed with a few more multiplication, shifting and XOR operations on the initial hash value. This is useful for bioinformatics applications that rely on the Bloom filter data structure. Experimental results demonstrate a substantial speed improvement over conventional approaches, while retaining a near-ideal hash value distribution.
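The idea of deriving several hash values from one base value can be sketched as follows. This is an illustrative sketch in the spirit of the description above, not ntHash's actual implementation. The per-base seeds, the multiplier and the shift amount are arbitrary constants chosen for the example. The base hash is a generic cyclic-polynomial rolling hash, an assumption made here so that consecutive k-mers can be hashed incrementally.

```python
MASK = (1 << 64) - 1  # keep all arithmetic within 64 bits

# Arbitrary illustrative 64-bit seeds, one per base (not ntHash's constants).
SEED = {
    "A": 0x9E3779B97F4A7C15,
    "C": 0xC2B2AE3D27D4EB4F,
    "G": 0x165667B19E3779F9,
    "T": 0x27D4EB2F165667C5,
}
MULT = 0x90B45D39FB6DA1FA  # arbitrary multiplier for deriving extra hashes
SHIFT = 27                 # arbitrary shift for deriving extra hashes


def rol(x: int, r: int) -> int:
    """Rotate a 64-bit integer left by r bits."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK


def hash_kmer(kmer: str) -> int:
    """Hash a k-mer from scratch: XOR of the rotated seeds of its bases."""
    h = 0
    for base in kmer:
        h = rol(h, 1) ^ SEED[base]
    return h


def rolling_hashes(seq: str, k: int):
    """Yield the hash of every k-mer in seq, updating in O(1) per step."""
    if len(seq) < k:
        return
    h = hash_kmer(seq[:k])
    yield h
    for i in range(k, len(seq)):
        # Rotate, remove the outgoing base, add the incoming base.
        h = rol(h, 1) ^ rol(SEED[seq[i - k]], k) ^ SEED[seq[i]]
        yield h


def extra_hashes(base_hash: int, k: int, n: int) -> list:
    """Derive n hash values from one base hash via multiply, shift and XOR."""
    values = [base_hash]
    for i in range(1, n):
        t = (base_hash * (i ^ (k * MULT))) & MASK
        t ^= t >> SHIFT
        values.append(t)
    return values
```

A Bloom filter that needs three bit positions per k-mer could call `extra_hashes(h, k, 3)` once per k-mer instead of running three independent hash functions over the sequence.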
KCMBT / k-mer Counter based on Multiple Burst Trees
A very fast single-threaded k-mer counting algorithm. KCMBT uses cache-efficient burst tries to store k-mers. A burst trie is a trie in which a full node is split into multiple nodes to make space for the insertion of new elements. The algorithm combines several ideas to enable faster output: burst tries to store k-mers, consideration of (k + x)-mers, and unifying a k-mer and its count in a single unit. The algorithm was compared with the available well-known algorithms. Experimental results show that KCMBT is around 30% faster than the previous best-performing algorithm, KMC2, on a human genome dataset, and around six times faster than Jellyfish2. Overall, KCMBT is 20%-30% faster than KMC2 on five benchmark data sets when both algorithms are run using multiple threads.
FastGT
Allows whole-genome genotyping of genome variants from raw sequencing reads. FastGT directly genotypes known variants from next-generation sequencing (NGS) data by counting unique k-mers. The crucial component of FastGT is a pre-compiled flat-file database of genomic variants and the corresponding k-mer pairs that overlap each variant. The software is not limited to identifying single nucleotide variants (SNVs): any known variant that can be associated with a unique and variant-specific k-mer can be detected.
NIPTmer
Enables detection of fetal aneuploidies from next-generation sequencing (NGS) reads. NIPTmer is a software package and workflow in which the mapping of reads is replaced with counting a predefined set of k-mers directly from the formatted raw sequencing data. The software consists of three main steps: (1) creating k-mer lists, (2) counting the k-mers from the samples, and (3) calling the aneuploidy based on the counts. It was tested using two sets of cfDNA samples from pregnant women.
BFCounter
Identifies all the k-mers that occur more than once in a DNA sequence data set. BFCounter does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly in memory with greatly reduced memory requirements. A second sweep through the data then provides exact counts of all nonunique k-mers. For example data sets, the authors report up to 50% savings in memory usage compared to current software, with modest costs in computational speed. This approach may reduce memory requirements for any algorithm that starts by counting k-mers in sequence data with errors.
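The two-sweep scheme described above can be sketched as follows. This is a simplified illustration, not BFCounter's code. The `BloomFilter` class, the filter size and the number of hash functions are assumptions chosen for the example. The first sweep records a k-mer as a candidate only when the Bloom filter reports it as already seen. The second sweep counts the candidates exactly and discards those that turn out to occur only once (Bloom filter false positives).

```python
import hashlib


class BloomFilter:
    """A small Bloom filter over strings (illustrative parameters)."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting BLAKE2b with the hash index.
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def kmers(read: str, k: int):
    """Yield every k-mer of a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def count_repeated_kmers(reads, k: int, n_bits: int = 1 << 20, n_hashes: int = 3) -> dict:
    """Return exact counts of k-mers that occur more than once."""
    reads = list(reads)
    seen = BloomFilter(n_bits, n_hashes)
    counts = {}
    # Sweep 1: a k-mer reported as already seen becomes a candidate.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen:
                counts[kmer] = 0
            else:
                seen.add(kmer)
    # Sweep 2: count candidates exactly.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in counts:
                counts[kmer] += 1
    # Drop false positives, which occur exactly once.
    return {kmer: c for kmer, c in counts.items() if c > 1}
```

Only candidate k-mers are ever stored explicitly, so k-mers that occur once (often sequencing errors) cost just a few bits each in the filter.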
Gk-arrays
Provides a data structure to index the k-mers of reads. Gk-arrays is an algorithm that builds a data structure to index reads. Once built, this structure is kept in main memory and repeatedly accessed to answer different kinds of queries. The index is fast to build and requires less space than alternative uncompressed solutions, so it can handle larger read collections: 40 million reads versus 20 million for hash tables. Moreover, Gk-arrays adapts well to variable-length reads.
SAG-QC
Identifies and excludes non-target sequences independently of any database. SAG-QC calculates the probability that a sequence was derived from contaminants by comparing its k-mer composition with that of the no-template control sequences. It can determine bins of target sequences without any existing information. The tool is designed to exclude contaminant sequences from contigs. It can predict the distribution of target sequences accurately unless the single-amplified genome (SAG) sequences are extremely contaminated.
SeqOthello
Provides a structure for querying a sequence against large collections of RNA-seq experiments. SeqOthello includes three main functions that allow users to: (i) transform k-mer files into the required format; (ii) merge a subset of k-mer files into a group; and (iii) generate the corresponding mapping between the entire set of k-mers and their experiment IDs. The application can be used for contrasting several patient cohorts or for evaluating the prevalence of clinically important features in different patient populations.
PhenotypeSeeker
Lists the candidate biomarkers behind a studied bacterial phenotype and creates a predictive model. PhenotypeSeeker consists of two subprograms: (1) "PhenotypeSeeker modeling", which builds a statistical model for phenotype prediction, and (2) "PhenotypeSeeker prediction", which uses the regression model generated by "PhenotypeSeeker modeling" to conduct fast phenotype predictions. The software supports both discrete and continuous phenotypes as inputs and takes the population structure into account to highlight only the possible causal variations. It is suitable for predicting phenotypes from large sequencing datasets.
MSPKmerCounter
A disk-based approach for efficiently counting k-mers in large genomes using a small amount of memory. The approach is based on a technique called Minimum Substring Partitioning (MSP). MSP breaks short reads into multiple disjoint partitions such that each partition can be loaded into memory and processed individually. By leveraging the overlaps among the k-mers derived from the same short read, MSP achieves a high compression ratio, which significantly reduces the I/O cost. For the task of k-mer counting, MSPKmerCounter offers a very fast and memory-efficient solution.
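The partitioning idea can be illustrated with a small in-memory sketch. It assumes the usual reading of minimum substring partitioning: each k-mer is assigned to a partition by its lexicographically smallest length-p substring, and consecutive k-mers of a read that share that substring are stored together as one longer string. The function names, the use of CRC32 to pick a partition, and the in-memory lists standing in for disk files are all assumptions of this sketch. Reverse complements are ignored for brevity.

```python
import zlib
from collections import Counter, defaultdict


def minimum_substring(kmer: str, p: int) -> str:
    """Return the lexicographically smallest length-p substring of a k-mer."""
    return min(kmer[i:i + p] for i in range(len(kmer) - p + 1))


def partition_read(read: str, k: int, p: int):
    """Yield (minimum substring, merged k-mer run) pairs for one read.

    Consecutive k-mers sharing a minimum substring are merged into a single
    string, which is where the compression comes from.
    """
    if len(read) < k:
        return
    start = 0
    current = minimum_substring(read[:k], p)
    for i in range(1, len(read) - k + 1):
        m = minimum_substring(read[i:i + k], p)
        if m != current:
            # The run covers k-mers start..i-1, i.e. read[start : i - 1 + k].
            yield current, read[start:i - 1 + k]
            start, current = i, m
    yield current, read[start:]


def msp_count(reads, k: int, p: int, n_partitions: int) -> Counter:
    """Count k-mers by partitioning first, then counting each partition alone."""
    partitions = defaultdict(list)  # stands in for one file per partition
    for read in reads:
        for m, run in partition_read(read, k, p):
            partitions[zlib.crc32(m.encode()) % n_partitions].append(run)
    total = Counter()
    for runs in partitions.values():
        local = Counter()  # only one partition needs to be in memory at a time
        for run in runs:
            for i in range(len(run) - k + 1):
                local[run[i:i + k]] += 1
        total.update(local)
    return total
```

Because identical k-mers always share the same minimum substring, they always land in the same partition, so per-partition counts are final and never need to be merged across partitions.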