Variant functional detection software tools | High-throughput sequencing data analysis
Modern sequencing technologies produce increasingly detailed data on genomic variation. However, conventional methods for relating either individual variants or mutated genes to phenotypes present known limitations given the complex, multigenic nature of many diseases or traits.
Determines whether an amino acid substitution is deleterious to protein function. SIFT can be employed to prioritize nonsynonymous or missense variants. Its predictions take into account the conservation of the protein across homologous sequences and the severity of the amino acid change. This tool can be applied to the human genome and to nonhuman organisms. It can process a large number of protein sequences using a single graphics processing unit.
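The conservation idea behind SIFT can be illustrated with a minimal sketch. It assumes a single alignment column given as a string (one residue per homologous sequence) and uses plain relative frequencies, whereas the real tool uses a more elaborate multistep procedure. The function names and the example column are hypothetical; the 0.05 cutoff is the threshold conventionally used with SIFT scores.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_scores(column):
    """Score each amino acid by its frequency relative to the most common residue.

    `column` holds one aligned residue per homologous sequence.
    Gaps and non-standard characters are ignored.
    """
    counts = Counter(aa for aa in column.upper() if aa in AMINO_ACIDS)
    if not counts:
        raise ValueError("column contains no standard amino acids")
    top = max(counts.values())
    # Counter returns 0 for residues never observed at this position.
    return {aa: counts[aa] / top for aa in AMINO_ACIDS}


def is_predicted_deleterious(column, substituted_aa, threshold=0.05):
    """Flag a substitution whose normalized score falls below the threshold."""
    return column_scores(column)[substituted_aa] < threshold


# Hypothetical column: leucine is strongly conserved, isoleucine seen once.
example_column = "LLLLLLLLLI"
print(is_predicted_deleterious(example_column, "W"))
print(is_predicted_deleterious(example_column, "I"))
```

A substitution to a residue never seen among the homologs gets a score of zero and is flagged, while a residue that already occurs at the position is tolerated.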
Helps researchers evaluate the pathogenic potential of DNA sequence alterations. MutationTaster is an online application that aims to determine the functional consequences of amino acid substitutions, short insertion and/or deletion (indel) mutations, variants spanning intron-exon borders, and intronic and synonymous alterations. Moreover, this tool is able to categorize confirmed polymorphisms and known disease mutations.
Predicts the effects of mutations while taking into account the interactions that occur between amino acids in proteins or bases in RNA. EVmutation predicts the relative favourability of unseen mutations by inferring context-dependent effects. It is able to better account for selective pressures than models that do not account for epistasis, for example, in the case of the toxin–antitoxin complex ParED. The tool was tested by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations.
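To make the notion of context-dependent (epistatic) effects concrete, here is a minimal sketch of a pairwise model in the spirit of EVmutation: a sequence is scored by summing per-site terms and pairwise coupling terms, and a mutation's effect is the score difference between mutant and wild type. The parameter layout (`h` as a list of dicts, `J` as a dict keyed by position pairs) and all numeric values are illustrative assumptions, not the tool's actual data structures or fitted parameters.

```python
def sequence_score(seq, h, J):
    """Sum site terms h[i][aa] and pairwise couplings J[(i, j)][(aa_i, aa_j)]."""
    score = sum(h[i].get(aa, 0.0) for i, aa in enumerate(seq))
    for (i, j), table in J.items():
        score += table.get((seq[i], seq[j]), 0.0)
    return score


def mutation_effect(wild_type, position, new_aa, h, J):
    """Score difference (mutant minus wild type) for a single substitution."""
    mutant = wild_type[:position] + new_aa + wild_type[position + 1:]
    return sequence_score(mutant, h, J) - sequence_score(wild_type, h, J)


# Hypothetical two-residue protein with one favourable coupling between A and K.
h = [{"A": 0.5, "G": 0.2}, {"K": 0.3}]
J = {(0, 1): {("A", "K"): 1.0}}
print(mutation_effect("AK", 0, "G", h, J))
```

Because the A-to-G change also loses the coupling with the neighbouring lysine, its effect is larger than the site term alone would suggest, which is the behaviour a site-independent model cannot capture.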
Predicts the possible impact of an amino acid substitution on the structure and function of a human protein. PolyPhen predicts the functional significance of an allele replacement from its individual features using a Naïve Bayes classifier. The web application allows users to (i) predict the effect of a single-residue substitution or reference single nucleotide polymorphism (SNP), (ii) analyze SNPs in batch mode, and (iii) search a database of precomputed predictions for the whole human exome sequence space.
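As a rough illustration of how a Naïve Bayes classifier combines individual features, the sketch below computes the posterior probability that a substitution is damaging from a set of binary features. The feature names, likelihood values, and prior are made up for the example and are not PolyPhen's actual features or trained parameters.

```python
import math


def naive_bayes_posterior(features, likelihoods, prior_damaging=0.5):
    """Posterior P(damaging | features) under feature independence.

    features:    {name: bool}, whether each feature is present.
    likelihoods: {name: (P(present | damaging), P(present | benign))}.
    """
    log_damaging = math.log(prior_damaging)
    log_benign = math.log(1.0 - prior_damaging)
    for name, present in features.items():
        p_damaging, p_benign = likelihoods[name]
        log_damaging += math.log(p_damaging if present else 1.0 - p_damaging)
        log_benign += math.log(p_benign if present else 1.0 - p_benign)
    # Normalize in log space to avoid underflow with many features.
    shift = max(log_damaging, log_benign)
    damaging = math.exp(log_damaging - shift)
    benign = math.exp(log_benign - shift)
    return damaging / (damaging + benign)


# Hypothetical features and likelihoods, for illustration only.
likelihoods = {"conserved_site": (0.8, 0.2), "buried_residue": (0.6, 0.3)}
features = {"conserved_site": True, "buried_residue": True}
print(round(naive_bayes_posterior(features, likelihoods), 3))
```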
Provides a suite of methods important for the prediction of protein structural and functional features. PredictProtein is a web server that incorporates over 30 tools. This software searches up-to-date public sequence databases, creates alignments, and predicts aspects of protein structure and function. It can help when little is known about the protein in question. For medium-to-high throughput analyses, downloadable software packages and the PredictProtein Machine Image (PPMI) are available.
Represents a clinical pathogenicity classifier. M-CAP aims to misclassify no more than 5% of pathogenic variants while aggressively reducing the list of variants of uncertain significance. This tool provides: (i) a method that combines amino acid conservation features with gradient boosting trees that can be applied to any variant training set and (ii) computed scores trained on mutations linked to Mendelian diseases that can be directly used by clinicians to interpret variants of uncertain consequences.
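The 5% target mentioned for M-CAP can be read as a sensitivity constraint on the score cutoff. The sketch below shows one simple way to choose such a cutoff from the scores of known pathogenic variants; it is an illustration of the idea under that assumption, not M-CAP's published procedure, and the scores are synthetic.

```python
import math


def sensitivity_threshold(pathogenic_scores, max_miss_rate=0.05):
    """Largest cutoff that leaves at most `max_miss_rate` of pathogenic scores below it.

    Variants scoring at or above the returned cutoff are kept as possibly pathogenic.
    """
    if not pathogenic_scores:
        raise ValueError("need at least one pathogenic score")
    ordered = sorted(pathogenic_scores)
    allowed_misses = math.floor(len(ordered) * max_miss_rate)
    return ordered[allowed_misses]


# Synthetic scores for 20 known pathogenic variants.
scores = [i / 20 for i in range(20)]
cutoff = sensitivity_threshold(scores)
missed = sum(score < cutoff for score in scores)
print(cutoff, missed / len(scores))
```

Any variant of uncertain significance scoring below the cutoff could then be deprioritized, with the expectation that only a small fraction of truly pathogenic variants is lost.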
An unsupervised spectral approach for scoring variants which does not make use of labeled training data. Eigen produces estimates of predictive accuracy for each functional annotation score, and subsequently uses these estimates of accuracy to derive the aggregate functional score for variants of interest as a weighted linear combination of individual annotations. The Eigen score is particularly useful in prioritizing likely causal variants in a region of interest when it is combined with population-level genetic data in the framework of a hierarchical model.
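The final aggregation step described for Eigen, a weighted linear combination of individual annotations, can be sketched as follows. The weights are simply supplied by hand here; estimating them from unlabeled data is the part the spectral method contributes and is not reproduced. Annotation names and values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance (all zeros if constant)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


def aggregate_scores(variants, weights):
    """Weighted sum of standardized annotation scores, one result per variant.

    variants: list of {annotation_name: raw_score}.
    weights:  {annotation_name: weight}, assumed to be given.
    """
    columns = {
        name: standardize([variant[name] for variant in variants])
        for name in weights
    }
    return [
        sum(weights[name] * columns[name][i] for name in weights)
        for i in range(len(variants))
    ]


# Two hypothetical variants with two hypothetical annotations.
variants = [
    {"conservation": 1.0, "regulatory": 0.0},
    {"conservation": 3.0, "regulatory": 2.0},
]
weights = {"conservation": 0.75, "regulatory": 0.25}
print(aggregate_scores(variants, weights))
```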
Identifies spatial hotspots (clusters) of mutations and interprets the potential function of the variants within them. HotSpot3D is a computational tool which identifies mutation–mutation and mutation–drug clusters using three-dimensional protein structures and correlates these clusters with known or potentially interacting functional variants, domains, and proteins. It uses structures from the Protein Data Bank (PDB) and variant and/or drug co-structures from DrugPort.
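A simplified picture of spatial clustering on a protein structure: treat mutated residues as points in 3D and group those that are connected by chains of short distances. The sketch below does this with a plain connected-components search; it is a stand-in for illustration and not HotSpot3D's own clustering algorithm. The residue labels, coordinates, and the distance cutoff are assumptions.

```python
import math


def proximity_clusters(coords, cutoff=10.0):
    """Group residues linked by pairwise distances of at most `cutoff`.

    coords: {residue_label: (x, y, z)}, e.g. C-alpha coordinates.
    Returns sorted clusters with at least two members.
    """
    unvisited = set(coords)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {
                other for other in unvisited
                if math.dist(coords[current], coords[other]) <= cutoff
            }
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        if len(cluster) > 1:
            clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical mutated residues; R200 lies far from the others.
coords = {
    "R10": (0.0, 0.0, 0.0),
    "R52": (3.0, 4.0, 0.0),
    "R97": (6.0, 8.0, 0.0),
    "R200": (50.0, 50.0, 50.0),
}
print(proximity_clusters(coords, cutoff=6.0))
```

Residues that are far apart in sequence (here positions 10, 52, and 97) can still fall into one cluster when the folded structure brings them together, which is what distinguishes this analysis from sequence-based hotspot detection.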
Estimates gene expression levels ab initio from sequences. ExPecto is based on a deep learning method with spatial feature transformation and L2-regularized linear models. It can be applied to a wide regulatory region of 40-kb promoter-proximal sequences. This tool builds a repository of potential regulatory sequence representations capable of determining the epigenomic effects of any genomic variant from sequence.
Determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. Simply input the coordinates of your variants and the nucleotide changes to find out: the genes and transcripts affected by the variants; the location of the variants (e.g. upstream of a transcript, in coding sequence, in non-coding RNA, in regulatory regions); the consequence of your variants on the protein sequence (e.g. stop gained, missense, stop lost, frameshift); known variants that match yours, with associated minor allele frequencies from the 1000 Genomes Project; and SIFT and PolyPhen scores for changes to protein sequence.
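The protein-level consequence categories mentioned above (stop gained, missense, stop lost) follow directly from translating the reference and altered codons. The sketch below shows that logic for a single-nucleotide change within one codon, using the standard genetic code. It is an independent illustration rather than the tool's implementation, it ignores frameshifts, splicing, and strand handling, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, then third base over TCAG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def coding_consequence(ref_codon, alt_codon):
    """Classify a codon change by comparing the encoded amino acids."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


for ref, alt in [("CGA", "TGA"), ("TAA", "CAA"), ("GAG", "GTG"), ("CTT", "CTC")]:
    print(ref, alt, coding_consequence(ref, alt))
```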
Displays human disease-related mutations on the structural interactome. Mapping mutations onto protein structures and interaction interfaces allows you to visualize the region of the interactome that they affect and helps in rationalizing their mechanism of action.
Allows prediction of the effects of mutations across a variety of deep mutational scanning experiments. DeepSequence can also be used to extract quantitative features for supervised learning and to generate libraries of new sequences satisfying apparent constraints. Moreover, it can model dependencies in sequences as a nonlinear combination of constraints between subsets of residues.
Retrieves related sequences for a given protein sequence. SIFT Sequence aims to determine whether an amino acid substitution is deleterious by using a multistep algorithm that exploits sequence conservation and amino acid properties. This web application accepts a protein sequence or substitutions of interest and allows several parameters to be set, including the automatic removal of sequences according to their similarity to the query.
A web-based tool, knowledgebase and community for analysis and interpretation of human variant files. GeneTalk provides an intuitive web-based interface for geneticists who analyze human sequence variants. It assists a clinical geneticist who is searching for information about specific sequence variants and connects this user to other users with expertise in the same sequence variant.
Allows highly accurate genome-scale identification of causative variants involved in human disease. PVP is a system which annotates and prioritizes disease variants in whole exome sequencing (WES) and whole genome sequencing (WGS) data. The software can identify causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes.
An integrated framework for the analysis and interpretation of the consequences of variants in the human kinome. The wKinMut web server offers direct prediction of the potential pathogenicity of mutations from a number of methods, including a prediction method that combines information from a range of diverse sources: physicochemical properties; functional annotations from FireDB and Swiss-Prot; kinase-specific characteristics such as membership of specific kinase groups, annotation with disease-associated GO terms, or occurrence of the mutation in Pfam domains; and the relevance of the residues in determining kinase subfamily specificity, from S3Det.
Utilizes machine learning to integrate missense mutation context at multiple scales. CHASM uses the Random Forest algorithm to discriminate somatic missense mutations (referred to hereafter as missense mutations) as either cancer drivers or passengers. This program can also serve for evaluating the statistical significance of cancer type-specific predictions for each of 32 cancer types from the Cancer Genome Atlas (TCGA), and pan-cancer predictions for all TCGA cancer types in aggregate.
Performs cancer-related analysis of variants. CRAVAT returns mutation interpretations in a dynamic interactive web environment for sorting, visualizing and inferring mechanism. The software (i) performs all projecting and assigns sequence ontology, (ii) predicts mutation impact using multiple bioinformatics classifiers, normalized to allow joint prioritization of all non-silent mutation types, (iii) organizes annotation from many sources on graphical displays of protein sequence and 3D structure, and (iv) facilitates dynamic filtering. It is suitable for both large and small studies and was developed for easy integration with other software.
Annotates and predicts the effects of single nucleotide polymorphisms (SNPs). SnpEff features include: (1) the ability to make thousands of predictions per second; (2) the ability to add custom genomes and annotations; (3) the ability to integrate with Galaxy; (4) compatibility with multiple species and multiple codon usage tables; (5) integration with Broad's Genome Analysis Toolkit (GATK); and (6) the ability to perform non-coding annotations. It enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
Predicts the functional impact of amino-acid substitutions in proteins. MutationAssessor employs information based on the analysis of evolutionary conservation patterns in protein family multiple sequence alignments. It has been validated on a large set of disease-associated and polymorphic variants. This tool can be used to assess mutations discovered in cancer as well as missense polymorphisms.