Unlock your biological data


1 - 50 of 143 results
GATK / Genome Analysis ToolKit
Focuses on variant discovery and genotyping. GATK is a toolkit developed at the Broad Institute, composed of several tools and able to support projects of any size. The application bundles an assortment of command-line tools for analyzing high-throughput sequencing (HTS) data in various formats such as SAM, BAM, CRAM or VCF. The website includes extensive documentation to guide users.
Allows users to interact with high-throughput sequencing data. SAMtools permits the manipulation of alignments in the SAM/BAM/CRAM formats: reading, writing, editing, indexing, viewing and converting between them. It limits the mapping quality of reads with excessive mismatches and applies base alignment quality to fix alignment errors. This tool can sort and merge alignments, remove polymerase chain reaction (PCR) duplicates or generate per-position information.
Calculates the probability that a given site is polymorphic. POLYBAYES identifies polymorphic locations by evaluating the likelihood of nucleotide heterogeneity within cross-sections of a multiple alignment. The anchored alignment, paralogue filtering and single nucleotide polymorphism (SNP) detection are accessed through a single program. The tool does not require base-perfect reference sequence to be effective and will work well with draft-quality sequences that have begun to dominate sequence production.
A variant caller and small genome assembler. The heart of DISCOVAR is a de novo genome assembler, one that is accurate enough to produce assemblies that can be used for variant calling given a reference sequence. DISCOVAR can also generate de novo assemblies for small genomes, but consider using DISCOVAR de novo instead which can assemble genomes up to mammalian size. DISCOVAR provides a more complete inventory of an individual’s genetic variants than had been previously possible. As such, it adds to the tools that can be used to probe the genetic basis of disease. It may be particularly useful in cases where targeted or exome sequencing fails to find causal mutations.
Allows de novo genome assembly and multisample variant calling. Cortex is a modular set of multi-threaded programs for manipulating assembly graphs. Linked de Bruijn Graph (LdBG) data structure and associated algorithms are implemented as part of the software. It was used for two tasks where long-range information is likely to be beneficial: finding large differences from a reference and analysis of genomic context for drug resistance genes, which was validated using a PacBio reference assembled for the sample.
Allows read alignment as well as single nucleotide polymorphism (SNP) detection and annotation. MAQGene launches the MAQ software and assembles a customized summary of the location and specific features of sequence variants of the mutant genome compared to a wild-type reference genome. The software also provides the option to compare any input whole genome sequencing (WGS) reads to any available wild-type reference genome with general-feature format (GFF) coding exon annotation files.
A platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. The newest version, VarScan 2, is written in Java, so it runs on most operating systems. It can be used to detect different types of variation: 1) germline variants (SNPs and indels) in individual samples or pools of samples, 2) multi-sample variants (shared or private) in multi-sample datasets (with mpileup), 3) somatic mutations, LOH events, and germline variants in tumor-normal pairs and 4) somatic copy number alterations (CNAs) in tumor-normal exome data.
Estimates allele frequency and calls variants in heterogeneous samples. RVD2 improves upon current classifiers and has higher sensitivity and specificity over a wide range of median read depth and minor allele fraction. It is able to use multiple cores in parallel, which can significantly improve time efficiency. The tool does not address identification of indels, structural variants (SV) or copy number variants (CNV). Those mutations typically require specific data analysis models and tests that are different from those for single nucleotide variants.
MAQ / Mapping and Assembly with Quality
Builds mapping assemblies from short reads generated by next-generation sequencing machines. Maq is particularly designed for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functions to handle ABI SOLiD data. Maq first aligns reads to reference sequences and then calls the consensus. At the mapping stage, Maq performs ungapped alignment. For single-end reads, Maq is able to find all hits with up to 2 or 3 mismatches, depending on a command-line option; for paired-end reads, it always finds all paired hits with one of the two reads containing up to 1 mismatch. At the assembling stage, Maq calls the consensus based on a statistical model.
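The mapping stage described above (ungapped alignment that tolerates a small number of mismatches) can be illustrated with a brute-force scan. This is only a sketch of the idea, not Maq's actual algorithm or index; the function name `ungapped_hits` and the parameter `max_mismatches` are hypothetical.

```python
def ungapped_hits(read, reference, max_mismatches=2):
    """Return (start, mismatches) for every ungapped placement of `read`
    on `reference` with at most `max_mismatches` mismatching bases.

    Illustrative brute-force scan only; a real mapper would use an index.
    """
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        window = reference[start:start + len(read)]
        for read_base, ref_base in zip(read, window):
            if read_base != ref_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many mismatches, abandon this placement
        else:
            # Loop finished without exceeding the mismatch budget.
            hits.append((start, mismatches))
    return hits
```

For example, `ungapped_hits("ACGT", "ACGTACGTTTACGA", max_mismatches=1)` reports two exact placements and one placement with a single mismatch.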
A method for quick and robust variant detection in low-mappability regions. We showed that whereas variant calls at individual sites can be uncertain, clusters of related sites can carry reliable information. In particular, clusters can give confidence to the presence of variants and also help to better estimate their allelic abundance. We showed that analysis of variant clusters in a human genome can reveal up to hundreds of thousands of elements that have hitherto been cumbersome and impractical to study. We also extend the thesaurus approach to enhance detection of DNA changes across matched samples. In other words, we implement a personalized filtering strategy taking thesaurus annotations into account. This contribution removes low mapping quality from the list of difficulties in the analysis of matched samples and thus makes it possible, for the first time, to use short-read sequencing data to describe the landscape of mutations in sequence-similar regions of the human genome. The implementation is designed to be general-purpose and extensible in order to accommodate several use-cases, in particular the genomics of cancer and of familial diseases.
Provides analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. Strelka is a variant calling method building upon the innovative Strelka somatic variant caller to improve upon aspects of variant calling for both germline and somatic analysis. The germline caller employs an efficient tiered haplotype model to improve accuracy and provide read-backed phasing, adaptively selecting between assembly and a faster alignment-based haplotyping approach at each variant locus. The germline caller also analyzes input sequencing data using a mixture-model indel error estimation method to improve robustness to indel noise.
Offers a platform for population-level analyses. dDocent is open-source software dedicated to processing individually barcoded restriction-site associated DNA sequencing (RADseq) data. The application employs data reduction techniques and interacts with other programs to offer features such as de novo assembly of RAD loci, single nucleotide polymorphism (SNP) and indel calling, as well as quality trimming and baseline data filtering.
Presents methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information. QCALL proposes two methods. In the first method, non-linkage disequilibrium analysis (NLDA), a dynamic programming algorithm was applied. In the second method, linkage disequilibrium analysis (LDA), shared haplotype structure was used to estimate posterior probabilities of SNPs and genotypes. QCALL with NLDA and LDA methods detects shared variants from multiple samples better than analyzing individual samples independently. In particular, the genotype accuracy is substantially improved.
SNIP-Seq / single nucleotide polymorphism identification from population sequence data
Utilizes short-read Illumina sequence data from a population of samples to detect single nucleotide polymorphisms (SNPs) and assign genotypes. For each potential SNP, SNIP-Seq utilizes the set of base calls across all samples to recalibrate base quality values, identifies SNPs in each sample individually and subsequently assigns genotypes to each sample at each SNP site. For sampling genotypes in each sample, SNIP-Seq uses a simple Bayesian model that assumes independence between multiple base calls.
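A simple Bayesian genotype model that assumes independence between base calls, as mentioned above, can be sketched as follows. This is a generic textbook formulation rather than SNIP-Seq's own code: the function name, the prior values and the uniform split of the error probability over the three other bases are all assumptions made for illustration.

```python
import math

GENOTYPES = ("AA", "AB", "BB")  # A = reference allele, B = alternate allele


def genotype_posteriors(base_calls, ref, alt, priors=(0.98, 0.01, 0.01)):
    """Posterior over diploid genotypes at one site.

    base_calls: list of (base, phred_quality) tuples from one sample.
    priors: hypothetical prior probabilities for AA, AB and BB.
    Base calls are treated as independent, so log-likelihoods simply add.
    """
    log_like = [0.0, 0.0, 0.0]
    for base, qual in base_calls:
        # Phred score -> error probability, capped so a quality of 0
        # is uninformative instead of producing log(0).
        err = min(10 ** (-qual / 10), 0.75)
        p_if_ref = 1 - err if base == ref else err / 3
        p_if_alt = 1 - err if base == alt else err / 3
        log_like[0] += math.log(p_if_ref)
        log_like[1] += math.log(0.5 * p_if_ref + 0.5 * p_if_alt)
        log_like[2] += math.log(p_if_alt)
    log_post = [ll + math.log(p) for ll, p in zip(log_like, priors)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return {g: w / total for g, w in zip(GENOTYPES, weights)}
```

With five high-quality reference calls and five high-quality alternate calls, the heterozygous genotype receives almost all of the posterior mass; with ten reference calls, the homozygous reference genotype does.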
Extracts causative variants in familial and sporadic genetic diseases. VariantMaster implements a methodology to evaluate the status (presence or absence) of a variant in familial or case-control contexts. The software allows users to identify causative variants in familial, sporadic germline, and somatic genetic disorders, including cancers. It also allows for the search of causative variants in one or more recurrently mutated genes in a pool of unrelated individuals sharing the same phenotype.
A web-based tool for detection, management and analysis of genetic variants including both single nucleotide polymorphisms (SNPs) and InDels. Version 3 now extends functionalities in order to easily manage and exploit SNPs derived from next generation sequencing technologies, such as GBS (genotyping by sequencing), WGRS (whole-genome resequencing) and RNA-Seq technologies. Based on the standard VCF (variant call format), the application offers an intuitive interface for filtering and comparing polymorphisms using user-defined sets of individuals and then establishing a reliable genotyping data matrix for further analyses. Namely, in addition to the various scaled-up analyses allowed by the application (genomic annotation of SNP, diversity analysis, haplotype reconstruction and network, linkage disequilibrium), SNiPlay3 proposes new modules for GWAS (genome-wide association studies), population stratification, distance tree analysis and visualization of SNP density.
A sensitive and robust approach for calling single-nucleotide variants (SNVs) from high-coverage sequencing datasets, based on a formal model for biases in sequencing error rates. LoFreq adapts automatically to sequencing run and position-specific sequencing biases and can call SNVs at a frequency lower than the average sequencing error rate in a dataset. LoFreq’s robustness, sensitivity and specificity were validated using several simulated and real datasets (viral, bacterial and human) and on two experimental platforms (Fluidigm and Sequenom).
Enables detection of genetic variation within a population of DNA molecules. Diff-seq is a sequencing-based mismatch detection assay that couples mismatch detection with high-throughput sequencing. The software allows for the identification of variation that could occur anywhere in a genome, and furthermore specifically targets sequencing capacity to the variant positions and their genomic context. It can be suitable for a variety of applications, from genotyping to estimating DNA polymerase error rates.
Examines epigenomic and transcriptomic next generation sequencing (NGS) data. Octopus-toolkit can be used for antibody- or enzyme-mediated experiments and studies for the quantification of gene expression. It can accelerate the data mining of public epigenomic and transcriptomic NGS data for basic biomedical research. This tool provides a private and a public mode: one to process the user’s own data, and the other to analyze public NGS data by retrieving raw files from the GEO database.
FamSeq / Family-based Sequencing program
A computational tool for calculating probability of variants in family-based sequencing data. It is still challenging to call rare variants. In family-based sequencing studies, information from all family members should be utilized to more accurately identify new germline mutations. FamSeq serves this purpose by providing the probability of an individual carrying a variant given his/her entire family’s raw measurements. FamSeq accommodates de novo mutations and can perform variant calling at chrX.
A statistical framework for calling SNPs, discovering somatic mutations, inferring population genetic parameters and performing association tests directly based on sequencing data. BCFtools can manipulate variant calls in the variant call format (VCF) and its binary counterpart BCF. It can also discover somatic and germline mutations with appropriate input data, efficiently estimate site allele frequency, allele frequency spectrum and linkage disequilibrium, and test Hardy–Weinberg equilibrium and association.
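As an illustration of what a Hardy–Weinberg equilibrium test checks, the sketch below computes the textbook chi-square statistic from genotype counts at one biallelic site. It is not taken from BCFtools, whose own test may differ (for example, an exact test); the function name `hwe_chi_square` is hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) comparing observed
    genotype counts with Hardy-Weinberg expectations.

    Textbook formulation for a single biallelic site, for illustration only.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

Counts of 25, 50 and 25 match the expectation exactly and give a statistic of 0, whereas 50, 0 and 50 (no heterozygotes at all) give a large statistic, far above the usual 3.84 threshold for a 5% test with one degree of freedom.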
Facilitates study of high-dimensional genomic and proteomic data by offering a comprehensive set of procedures for false discovery rate (FDR) estimation. fdrtool provides readily interpretable graphical output, and can be applied to very large scale (on the order of millions of hypotheses) multiple testing problems. It contains functions for non-parametric density estimation, for monotone regression, for computing the greatest convex minorant and the least concave majorant, for the half-normal and correlation distributions, and for computing empirical higher criticism scores and the corresponding decision threshold.
A package for phylogenomic analyses of data collected from conserved genomic loci using targeted enrichment. PHYLUCE allows the assembly of raw read data to contigs, the identification of ultra-conserved elements (UCE) contigs, parallel alignment generation, alignment trimming, and alignment data summary methods in preparation for analysis, as well as alignment and SNP calling using UCE or other types of raw-read data. As it stands, the PHYLUCE package is useful for analyzing both data collected from UCE loci and data collected from other types of loci for phylogenomic studies at the species, population, and individual levels.
A variant calling pipeline for Illumina whole-genome germline data. It de novo assembles short reads and then maps the assembly against a reference genome to call SNPs, short insertions/deletions (INDELs) and structural variations (SVs). FermiKit takes about one day to assemble 30-fold human whole-genome data on a modern 16-core server with 85GB RAM at the peak, and calls variants in half an hour to an accuracy comparable to the current practice. FermiKit assembly is a reduced representation of raw data while retaining most of the original information.
A variant detector and graphical alignment viewer for next-generation sequencing data in the SAM/BAM format, which is capable of pooling data from multiple source files. The variant detector takes advantage of SAM-specific annotations, and produces detailed output suitable for genotyping and identification of somatic mutations. The assembly viewer can display reads in the context of either a user-provided or automatically generated reference sequence, retrieve genome annotation features from a UCSC genome annotation database, display histograms of non-reference allele frequencies, and predict protein-coding changes caused by SNPs.
ReviSTER / Revise Simple Tandem repeat Error Reads
An automated pipeline using a 'local mapping reference reconstruction method' to revise mismapped or partially misaligned reads at simple tandem repeat loci. ReviSTER estimates alleles of repeat loci using a local alignment method and creates temporary local mapping reference sequences, and finally remaps reads to the local mapping references. Using this approach, ReviSTER was able to successfully revise reads misaligned to repeat loci from both simulated data and real data.
Calls single nucleotide polymorphisms (SNPs) and short indels for both Ion Torrent and 454 resequencing data. PyroHMMvar is a method that has two distinct features: (i) an HMM to formulate homopolymer errors and which can distinguish real signals from sequencing errors and thus improve the alignment of reads against the reference and (ii) a graph data structure that merges multiple aligned reads at a given locus into a weighted alignment graph. PyroHMMvar is also available as part of the toolkit PyroTools.
LAVA / Lightweight Assignment of Variant Alleles
A next-generation sequencing (NGS)-based genotyping algorithm for a given set of single nucleotide polymorphism (SNP) loci, which takes advantage of the fact that approximate matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix’s Genome-Wide Human SNP Array 6.0 up to about an order of magnitude faster than standard NGS genotyping pipelines. For Affymetrix SNPs, LAVA has significantly higher SNP calling accuracy than existing pipelines while using as little as ∼5 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays.
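The idea of genotyping known SNP loci by k-mer lookup instead of full read alignment can be sketched in a few lines. This is a simplified illustration, not LAVA's implementation: it uses exact rather than approximate matching, ignores reverse-complement reads, and the names `build_kmer_index` and `count_alleles` as well as the input layout are hypothetical.

```python
from collections import Counter


def build_kmer_index(snps, k):
    """Map each k-mer spanning a SNP to (snp_id, allele).

    snps: dict of snp_id -> (left_flank, ref_allele, alt_allele, right_flank);
    the two flanks together must supply exactly k - 1 bases.
    """
    index = {}
    for snp_id, (left, ref, alt, right) in snps.items():
        for allele in (ref, alt):
            kmer = left + allele + right
            if len(kmer) != k:
                raise ValueError(f"{snp_id}: flanks do not give a {k}-mer")
            index[kmer] = (snp_id, allele)
    return index


def count_alleles(reads, index, k):
    """Count reads supporting each (snp_id, allele) by exact k-mer lookup."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            hit = index.get(read[i:i + k])
            if hit is not None:
                counts[hit] += 1
    return counts
```

The resulting per-allele counts could then feed a genotype model; a small k is enough for a toy example, whereas the description above uses k = 32 so that k-mers are typically unique in the human genome.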

By using OMICtools you acknowledge that you have read and accepted the terms of the end user license agreement.