Geneious Microsatellite Plugin
Allows users to fit a ladder, call peaks, bin alleles and produce a table of genotypes. Geneious Microsatellite Plugin helps resolve problems with noisy data by letting users manually adjust curves, peaks, ladders and bins. Alleles, peak calls and bins can be exported to Excel for further analysis.
Identifies inherited alleles of microsatellite loci from next-generation sequencing (NGS) data, using a discretized Gaussian mixture model combined with a rules-based approach. GenoTan also employs a homopolymer decomposition to estimate error bias toward deletion or insertion in homopolymer runs. The software was designed to detect microsatellite variants shorter than read lengths.
GMATo / Genome-wide Microsatellite Analyzing Tool
Permits simple sequence repeat (SSR) discovery and comprehensive statistical analysis, especially for large genomes. GMATo can be used for microsatellite sequence identification from any given DNA sequences or genomes of any size.
Allows simple sequence repeats (SSR) discovery and locus development from 454-generated raw data. HighSSR is a microsatellite prediction framework that can facilitate: the recognition of SSR motifs, the parsing of multiplex identifier (MID) tagged sequences for identification of multiplexed samples, the identification of unique SSR loci within a sample and the development of polymerase chain reaction (PCR) primers for the recovered loci. It can be applied to cluster reads made with platforms such as Illumina HiSeq 2000/2500 and Ion Torrent PGM.
Consists of an exact match tandem repeat finder. INVERTER is a program that does not require users to specify either the pattern or a particular pattern size. This software is integrated with a data visualization tool and has a built-in graphical user interface.
Performs short tandem repeat (STR) profiling in whole-genome sequencing data sets. lobSTR is an algorithm that consists of three steps: it (1) scans genomic libraries, flags informative reads that fully encompass STR loci, and characterizes their STR sequence, (2) uses a divide-and-conquer strategy that anchors the nonrepetitive flanking regions of STR reads to the genome to reveal the STR position and length, and finally (3) allelotypes the STRs.
MISA / MIcroSAtellite identification tool
Detects perfect microsatellites and compound microsatellites in nucleotide sequences. MISA can predict perfect compound microsatellites that contain multiple occurrences of more than one simple sequence motif. This software is based on two Perl scripts that serve as interface modules for the program-to-program data interchange used to design primers flanking the microsatellite loci. It can exploit the NCBI database to find sequences by defining the corresponding accession numbers as input.
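Perfect microsatellite detection of the kind described above can be illustrated with a regular expression. This is a minimal sketch, not MISA's implementation: the function name `find_perfect_ssrs`, the single `min_copies` threshold and the six-base motif limit are assumptions chosen for illustration.

```python
import re


def find_perfect_ssrs(seq, min_copies=4, max_unit=6):
    """Return (start, end, motif, copies) for perfect tandem repeats.

    Hypothetical helper for illustration only. Coordinates are 0-based,
    end-exclusive. The lazy quantifier makes the regex try the shortest
    motif first, so "ATATATAT" is reported with motif "AT", not "ATAT".
    """
    seq = seq.upper()
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), match.end(), motif, copies))
    return hits


if __name__ == "__main__":
    # Five copies of "AT" flanked by non-repetitive sequence.
    print(find_perfect_ssrs("GGCATATATATATCCG"))
    # → [(3, 13, 'AT', 5)]
```

Real SSR finders usually apply a different minimum copy number per motif length; a single threshold is used here only to keep the sketch short.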
Identifies microsatellite repeat elements directly from raw 454 or Illumina paired-end sequencing reads and then designs polymerase chain reaction (PCR) primers to amplify these repeat loci.
Determines genotypes for microsatellite repeats in high-throughput sequencing data. RepeatSeq is based on Bayesian model selection built on an empirically derived error model that includes sequence and read properties. This enables the assignment of the most probable genotype and accounts for the reference length of the repeat, the repeat unit size and the average base quality of the mapped reads.
SciRoKo / SSR Classification and Investigation by Robert Kofler
A user-friendly software tool for the identification of microsatellites in genomic sequences. The combination of an extremely fast search algorithm with a built-in summary statistic tool makes SciRoKo an excellent tool for full genome analysis. SciRoKo contains two main modules: a simple sequence repeat (SSR) search module, which supports five different SSR search modes and a module for SSR-statistics, notably for mismatch frequency and compound microsatellite analysis. Compared to other already existing tools, SciRoKo also allows the analysis of compound microsatellites.
SSR Locator / Simple Sequence Repeat Locator
Allows users to discover simple sequence repeats. SSR Locator permits (1) simple sequence repeat (SSR) discovery, (2) primer design, and (3) polymerase chain reaction (PCR) simulation between the primers obtained from original sequences and other FASTA files. It can be used for data mining strategies to find SSR primers in genomic or expressed sequences (ESTs/cDNAs) and for microsatellite discovery in databanks of related species.
A Bioinformatic Infrastructure for Identifying Microsatellites From Paired-End Illumina High-Throughput DNA Sequencing Data. All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
A program for ab initio identification of the tandem repeats. T-REKS is based on clustering of lengths between identical short strings by using a K-means algorithm. T-REKS being linked to the Protein Repeat DataBase opens the way for large-scale analysis of protein tandem repeats. T-REKS can also be applied to the nucleotide sequences.
Finds significant tandem repeats using short reads. TRhist is a tandem repeat profiler that can detect and locate short tandem repeats (STRs) in billions of short reads. Information provided by the software allows users to align the flanking regions and the other end read to the reference to locate the STR in the genome.
A python program to identify microsatellite repeat regions based on known polymorphisms identified in a ".vcf" report after using SAMtools to analyze next-generation sequencing files.
PoPoolation TE
A quick and simple pipeline for the analysis of transposable element insertions in (natural) populations using next generation sequencing.
A tool for discovery and genotyping of transposable element variants (TEVs) (also known as mobile element insertions) from next-generation sequencing reads aligned to a reference genome in BAM format. The goal is to call TEVs that are not present in the reference genome but present in the sample that has been sequenced. It should be noted that RetroSeq can be used to locate any class of viral insertion in any species where whole-genome sequencing data with a suitable reference genome is available.
Investigates transposable element (TE) insertions in next-generation sequencing (NGS) data. T-lex allows users to accurately genotype individual TE insertions and estimate their population frequencies using both individual-strain and pooled NGS data. It uses information from (i) a module specifically designed to identify target site duplications and (ii) the genomic context of each insertion, to identify putatively misannotated TE insertions.
Visualizes sequence variant data from whole exome data, so that it is possible to identify autozygous regions in consanguineous individuals.
A web-based application aimed at autozygosity mapping. HomozygosityMapper is independent of parameters like family structure or allele frequencies: the ‘homozygosity score’ is calculated simply from the observed homozygosity, and it is robust against genotyping errors. HomozygosityMapper is much faster than conventional linkage software. The integration with GeneDistiller greatly facilitates the search for promising candidate genes compared to the conventional approach. We also encourage geneticists to consider HomozygosityMapper as a public repository for genotypes and results when publishing their homozygosity mappings. Due to its user-friendly intuitive interface and the lack of any local hardware requirements, it can be used by the geneticists themselves without the need for computer specialists.
HomSI / Homozygous Stretch Identifier from next-generation sequencing data
Identifies homozygous stretches using next-generation sequencing (NGS) data. HomSI was designed to define homozygous stretches in consanguineous families from NGS data. To identify and visualize the homozygous stretches, the software processes each variant and generates several graphs. It was evaluated using both a simulated dataset and a real dataset of three disease genes within the homozygous regions, which have been previously identified using a combination of exome and single-nucleotide polymorphism (SNP) microarray data.
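The basic idea behind homozygous stretch detection can be sketched as a scan for runs of consecutive homozygous genotype calls. This is an illustrative simplification, not the algorithm of HomSI or HomozygosityMapper: the function name `homozygous_stretches`, the input layout and the `min_markers` cutoff are assumptions, and real tools additionally tolerate genotyping errors and weigh marker density.

```python
def homozygous_stretches(genotypes, min_markers=3):
    """Find runs of consecutive homozygous calls.

    genotypes: list of (position, (allele1, allele2)), sorted by position
               on a single chromosome (assumed input layout).
    Returns a list of (first_position, last_position, marker_count).
    """
    stretches = []
    run_start = None
    last_pos = None
    run_len = 0
    for pos, (a, b) in genotypes:
        if a == b:
            # Homozygous call: start or extend the current run.
            if run_len == 0:
                run_start = pos
            run_len += 1
            last_pos = pos
        else:
            # Heterozygous call: close the current run if it is long enough.
            if run_len >= min_markers:
                stretches.append((run_start, last_pos, run_len))
            run_len = 0
    if run_len >= min_markers:
        stretches.append((run_start, last_pos, run_len))
    return stretches


if __name__ == "__main__":
    calls = [
        (100, ("A", "A")), (200, ("G", "G")), (300, ("T", "T")),
        (400, ("A", "G")),
        (500, ("C", "C")), (600, ("C", "C")),
    ]
    print(homozygous_stretches(calls))
    # → [(100, 300, 3)]
```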
A method that combines Grantham Variation (GV) and Grantham Deviation (GD) scores to predict the transactivation activity of each missense substitution. We compared our predictions against experimentally measured transactivation activity (yeast assays) to evaluate their accuracy. Finally, the prediction results were compared with those obtained by the program Sorting Intolerant from Tolerant (SIFT) and Dayhoff’s classification.
AUTO–MUTE / AUTOmated server for predicting functional consequences of amino acid MUTations in protEins
A collection of programs for predicting functional changes to proteins upon single residue substitutions, developed by combining structure-based features with trained statistical learning models. For each type of function prediction, a variety of classification and regression models have been developed and are available for researchers. These include Random Forest, Support Vector Machine (SVM), AdaBoostM1 combined with the C4.5 Decision Tree algorithm, as well as Tree and SVM regression. The trained classifiers provide instantaneous and reliable predictions regarding HIV-1 co-receptor usage, requiring only translated V3 loop genotypes as input. Furthermore, the novelty of these computational mutagenesis based predictor attributes distinguishes the models as orthogonal and complementary to previous methods that utilize sequence, structure, and/or evolutionary information.
CHASM / Cancer-specific High-throughput Annotation of Somatic Mutations
Utilizes machine learning to integrate missense mutation context at multiple scales. CHASM uses the Random Forest algorithm to discriminate somatic missense mutations (referred to hereafter as missense mutations) as either cancer drivers or passengers. This program can also serve for evaluating the statistical significance of cancer type-specific predictions for each of 32 cancer types from the Cancer Genome Atlas (TCGA), and pan-cancer predictions for all TCGA cancer types in aggregate.
CUPSAT / Cologne University Protein Stability Analysis Tool
A tool to predict changes in protein stability upon point mutations. The prediction model uses amino acid-atom potentials and torsion angle distribution to assess the amino acid environment of the mutation site. Additionally, the prediction model can distinguish the amino acid environment using its solvent accessibility and secondary structure specificity.
Provides a fast and quantitative estimation of the importance of the interactions contributing to the stability of proteins and protein complexes.
HOPE / Have yOur Protein Explained
Serves for automatic mutant analysis. HOPE is an online tool that furnishes information about disease-related phenotypes caused by mutations in human proteins. To do so, the tool collects data from sources such as the protein’s 3D structure and the UniProt database of well-annotated protein sequences. This program works in three steps: users (1) enter an input sequence; (2) select a residue to mutate; and (3) select a mutation.
A web tool for genome-wide annotation of human SNPs. LS-SNP/PDB provides information useful for identifying amino-acid changing SNPs (nsSNPs) that are most likely to have an impact on biological function. The system is kept up-to-date by an automated, high-throughput build pipeline that systematically maps human nsSNPs onto Protein Data Bank structures and annotates several biologically relevant features, inferred from three-dimensional experimental structures.
MAPP / Multivariate Analysis of Protein Polymorphism
Quantifies the physicochemical variation in each column of a multiple sequence alignment and calculates the deviation of candidate amino acid replacements from this variation. The greater the deviation, the higher is the probability that a replacement impairs the function of the protein, and the greater is its predicted effect on the function of the protein.
mCSM / mutation Cutoff Scanning Matrix
Predicts the impact of single-point mutations on protein stability and protein–protein and protein–nucleic acid affinity. mCSM is an approach, which relies on graph-based signatures, for studying the impact of missense mutations in proteins. The software perceives residue environment density and depth implicitly, without relying on direct calculations or thresholds. It was applied to predict stability changes of mutations occurring in p53, demonstrating its applicability in a challenging disease scenario.
Predicts the functional impact of amino-acid substitutions in proteins. MutationAssessor employs information based on the analysis of evolutionary conservation patterns in protein family multiple sequence alignments. It has been validated on a large set of disease-associated and polymorphic variants. This tool can be used to assess mutations discovered in cancer as well as missense polymorphisms.
PANTHER / Protein ANalysis THrough Evolutionary Relationships
A widely used online resource for comprehensive protein evolutionary and functional classification, and includes tools for large-scale biological data analysis. The latest version of PANTHER, 10.0, includes almost 5000 new protein families (for a total of over 12 000 families), each with a reference phylogenetic tree including protein-coding genes from 104 fully sequenced genomes spanning all kingdoms of life. Phylogenetic trees now include inference of horizontal transfer events in addition to speciation and gene duplication events. Functional annotations are regularly updated using the models generated by the Gene Ontology Phylogenetic Annotation Project.
PolyPhen / Polymorphism Phenotyping
Predicts the possible impact of an amino acid substitution on the structure and function of a human protein. PolyPhen predicts the functional significance of an allele replacement from its individual features by a Naïve Bayes classifier. The web application allows users to (i) predict the effect of a single-residue substitution or reference single nucleotide polymorphism (SNP), (ii) analyze SNPs in a batch mode, and (iii) search in a database of precomputed predictions for the whole human exome sequence space.
SIFT / Sorting Intolerant From Tolerant
Determines if an amino acid substitution is deleterious to protein function. SIFT can be employed to prioritize nonsynonymous or missense variants. It bases its predictions on protein conservation across homologous sequences and on the severity of the amino acid change. This tool can be applied to human and nonhuman genomes. It can run on a large number of protein sequences using a single graphics processing unit.
StSNP / Structure SNP
Permits researchers to conduct visual inspection of the possible effects of the nonsynonymous single nucleotide polymorphisms (nsSNPs) in protein structure. StSNP provides a comparative modeling of the wild-type and mutated proteins along with real-time analysis and visualization of structures and sequences. It is useful for studies involving the metabolic pathways and the diseases associated with a particular SNP.
VEP / Variant Effect Predictor
Determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. Simply input the coordinates of your variants and the nucleotide changes to find out: the genes and transcripts affected by the variants; the location of the variants (e.g. upstream of a transcript, in coding sequence, in non-coding RNA, in regulatory regions); the consequence of your variants on the protein sequence (e.g. stop gained, missense, stop lost, frameshift); known variants that match yours, with associated minor allele frequencies from the 1000 Genomes Project; and SIFT and PolyPhen scores for changes to protein sequence.
A computational method for identifying ‘active’ sites in proteins (signalling sites, protein domains, regulatory motifs) that are specifically and significantly mutated in cancer genomes. ActiveDriver provides signalling-related interpretation of single nucleotide variants (SNVs) identified in cancer genome sequencing. ActiveDriver is based on a gene-centric logistic regression model that considers multiple factors in estimating significance of mutation enrichment (or depletion) in active sites. The factors include mutation frequency, distribution of active sites in protein sequence, their position with respect to mutations (direct and flanking), and structured and disordered regions of proteins.
Implements gene and gene-set level analysis methods for somatic mutation studies of cancer. The gene-level methods distinguish between driver genes (which play an active role in tumorigenesis) and passenger genes (which are mutated in tumor samples, but have no role in tumorigenesis) and incorporate a two-stage study design. The gene-set methods implement a patient-oriented approach, which calculates gene-set scores for each sample, then combines them across samples; a gene-oriented approach which uses the Wilcoxon test is also provided for comparison.
Allows users to determine if particular changes are likely to be cancer-associated. The impact of each change is measured using two known methods: Sorting Intolerant From Tolerant (SIFT) and the Pfam-based LogR.E-value metric. A third method, the Gene Ontology Similarity Score (GOSS), provides an indication of how closely the gene in which the variant resides resembles other known cancer-causing genes. Scores from these three algorithms are analyzed by a random forest classifier which then predicts whether a change is likely to be cancer-associated.
CAROL / Combined Annotation scoRing toOL
A combined functional annotation score of non-synonymous coding variants. CAROL combines information from PolyPhen-2 and SIFT. We use a weighted Z method to derive the combined score. We calibrate and validate CAROL using positive (known disease-causing) and negative (postulated non-disease-causing) control variants. CAROL has high predictive power for the effect of ns variants and has the distinct advantage of high coverage, i.e. low missing data rates. The combination of annotation tools can help improve automated prediction of whole-genome/exome non-synonymous variant functional consequences.
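A weighted Z combination, in its textbook (Stouffer) form, can be sketched as follows. This illustrates the general technique only: the text does not state how CAROL transforms PolyPhen-2 and SIFT scores or how it sets the weights, so the function name `weighted_z`, the p-value-like inputs and the weights used here are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z(p_values, weights):
    """Combine one-sided p-values with Stouffer's weighted Z method.

    Each p-value must lie strictly between 0 and 1; NormalDist.inv_cdf
    raises StatisticsError otherwise.
    Returns (combined_z, combined_p).
    """
    if len(p_values) != len(weights):
        raise ValueError("p_values and weights must have the same length")
    std_normal = NormalDist()
    # Convert each p-value to a z-score (a small p gives a large positive z).
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    combined_z = numerator / denominator
    combined_p = 1.0 - std_normal.cdf(combined_z)
    return combined_z, combined_p


if __name__ == "__main__":
    # Two hypothetical predictors, the first trusted twice as much.
    z, p = weighted_z([0.02, 0.10], weights=[2.0, 1.0])
    print(round(z, 3), round(p, 4))
```

Combining on the z scale lets a more reliable predictor contribute more through its weight, while two moderately significant inputs can still reinforce each other.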
ChroMoS / Chromatin Modified SNPs
Combines genetic and epigenetic data with the goal of facilitating SNPs’ classification, prioritization and prediction of their functional consequences. ChroMoS uses a large database of SNPs and chromatin states, but allows a user to provide his/her own genetic information. Based on the SNP classification and interactive prioritization, a user can compute the functional impact of multiple SNPs using two prediction tools, one for differential analysis of transcription factor binding (sTRAP) and another for SNPs with potential impact on binding of miRNAs (MicroSNiPer).
Condel / CONsensus DELeteriousness score of missense SNVs
Evaluates the probability that a set of missense single nucleotide variants (SNVs) is deleterious. Condel integrates the output of different methods and can be applied to any array. It computes a weighted average of the scores of missense mutations, with weights derived from the complementary cumulative distributions of scores of deleterious and neutral mutations. This tool can provide some insight into the impact of the mutation on the biological activity of the proteins.
CRAVAT / Cancer-Related Analysis of Variants Toolkit
Performs cancer-related analysis of variants. CRAVAT returns mutation interpretations in a dynamic interactive web environment for sorting, visualizing and inferring mechanism. The software (i) performs all projecting and assigns sequence ontology, (ii) predicts mutation impact using multiple normalized bioinformatics classifiers, (iii) allows for joint prioritization of all non-silent mutation types, (iv) organizes annotation from many sources on graphical displays of protein sequence and 3D structure, and (v) facilitates dynamic filtering. It is suitable for both large and small studies and was developed for easy integration with other software.
DMI / Driver Mutation Identification
Helps distinguish cancer-associated ‘driver’ mutations from ‘passenger’ ones in a cancer genome. Given a set of mutations, the DMI system identifies which of them are drivers, in the sense of whether they have a functional impact on the protein involved.
DrGaP / Driver Genes and Pathways
A powerful and flexible statistical framework for identifying driver genes and driver signaling pathways in cancer genome-sequencing studies. DrGaP is immediately applicable to cancer genome-sequencing studies and will lead to a more complete identification of altered driver genes and driver signaling pathways in cancer.
A pipeline for ranking nonsynonymous single nucleotide variants given a specific phenotype. eXtasy takes into account the putative deleteriousness of the variant, haploinsufficiency predictions of the underlying gene and the similarity of the given gene to known genes in the given phenotype.
InVEx / Introns Vs Exons
Aims to leverage intron and untranslated region (UTR) sequences in a gene locus. InVEx is a permutation-based framework that identifies genes whose somatic mutation distribution shows evidence of positive selection for non-silent mutations. This method was developed to assist users in cancer genomics studies, with particular relevance to high mutation rate cancers.
MuSiC / Mutational Significance In Cancer
A set of tools aimed at determining the significance of somatic mutations discovered within a given cohort of cancer samples, incorporating the cohort’s alignment data, variant lists and any relevant clinical data. The development of MuSiC was motivated by the rapidly expanding numbers of mutation data sets from a wide variety of tumor types. It is imperative during post-discovery analysis to separate the significant, or “driver,” mutations from the passenger mutations to more accurately pinpoint the key genes and pathways critical for disease initiation and progression. MuSiC is designed precisely to streamline this process into an easily accessible high-throughput software exercise.
Helps researchers evaluate the pathogenic potential of DNA sequence alterations. MutationTaster is an online application that aims to determine the functional consequences of amino acid substitutions, short insertion and/or deletion (indel) mutations, variants spanning intron-exon borders, and intronic and synonymous alterations. Moreover, this tool is able to categorize confirmed polymorphisms and known disease mutations.
A web application tool developed to classify an amino acid substitution as disease-associated or neutral in human. In addition, MutPred predicts molecular cause of disease. The tool requires a protein sequence, a list of amino acid substitutions, and an email address.
MutSig / Mutation Significance
Analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes. MutSig was originally developed for analyzing somatic mutations, but it has also been useful in analyzing germline mutations. MutSig builds a model of the background mutation processes that were at work during formation of the tumors, and it analyzes the mutations of each gene to identify genes that were mutated more often than expected by chance, given the background model.
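The question "was this gene mutated more often than expected by chance?" can be illustrated with a one-sided binomial test against a flat background rate. This is a deliberately simplified sketch, not MutSig's model, whose background is described above as built from the mutation processes at work in the tumors. The function names and the example numbers are assumptions.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Log of the binomial probability P(X = i) for n trials, rate p (0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return log_coeff + i * log(p) + (n - i) * log(1.0 - p)


def binomial_tail(k, n, p):
    """P(X >= k): chance of seeing k or more mutations under the background rate."""
    if k <= 0:
        return 1.0
    lower = sum(exp(log_binom_pmf(i, n, p)) for i in range(k))
    return max(0.0, 1.0 - lower)


if __name__ == "__main__":
    # Hypothetical gene: 1,000 covered bases, background rate 0.001 per base.
    # About one mutation is expected, so five observed is surprising.
    print(binomial_tail(5, 1000, 0.001))
```

Working in log space with `lgamma` avoids overflow in the binomial coefficient when the number of covered bases is large. Subtracting from 1 loses precision for extremely small tails, which is acceptable for a sketch.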
A tool to predict whether a nonsynonymous single nucleotide polymorphism (nsSNP) has a phenotypic effect. nsSNPAnalyzer also provides additional information about the SNP to facilitate the interpretation of results, e.g., structural environment and multiple sequence alignment. It uses information contained in the multiple sequence alignment and information contained in the 3D structure to make predictions.
An approach to uncover driver genes or gene modules. It computes a metric of functional impact using three well-known methods (SIFT, PolyPhen2 and MutationAssessor) and assesses how the functional impact of variants found in a gene across several tumor samples deviates from a null distribution. It is thus based on the assumption that any bias towards the accumulation of variants with high functional impact is an indication of positive selection and can thus be used to detect candidate driver genes or gene modules.
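The comparison of a gene's functional impact scores against a null distribution can be sketched as a resampling test. This is a generic illustration under stated assumptions, not the tool's actual procedure: the function name `fi_bias_pvalue`, the use of the mean as the summary statistic, and sampling with replacement from a pooled background are all choices made here for brevity.

```python
import random


def fi_bias_pvalue(gene_scores, background_scores, n_perm=10000, seed=0):
    """Empirical p-value for a bias towards high functional impact.

    gene_scores: impact scores of the variants observed in one gene.
    background_scores: pooled impact scores used as the null (assumed input).
    """
    rng = random.Random(seed)
    k = len(gene_scores)
    observed = sum(gene_scores) / k
    hits = 0
    for _ in range(n_perm):
        # Draw as many background scores as the gene has variants.
        sample = rng.choices(background_scores, k=k)
        if sum(sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    background = [i / 100 for i in range(100)]
    print(fi_bias_pvalue([0.90, 0.95, 0.99], background))
```

A small p-value means that random draws from the background rarely reach the gene's average impact, which is the kind of bias the description treats as a sign of positive selection.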
PhD-SNP / Predictor of human Deleterious Single Nucleotide Polymorphisms
A method based on support vector machines (SVMs) that, starting from protein sequence information, can predict whether a new phenotype derived from an nsSNP can be related to a genetic disease in humans. Using a dataset of 21,185 single point mutations, 61% of which are disease-related, out of 3,587 proteins, we show that our predictor can reach more than 74% accuracy in the specific task of predicting whether a single point mutation can be disease related or not.
Performs pathology predictions, gives access to a repository of pre-calculated predictions and generates and validates new predictors. PMut is a web-based tool that offers a generally trained predictor that performs in line with currently available methods and gives users access to an automatic procedure for training new predictors with specific datasets or features. The software was trained using the manually curated variation database SwissVar.
A cross-platform Java application toolkit to prioritize variants (SNVs and InDels) from exome or whole genome sequencing data by using different filtering strategies and information of external databases. PriVar contains four modules: annotation, quality control, candidate gene identification and prediction of functional impact of variants.
SAPRED / SAP Disease-Association Predictor
Offers researchers an automatic pipeline to predict the disease association of single amino acid polymorphisms (SAPs). Through a strict protein-level 5-fold cross-validation, SAPRED attained an overall accuracy of 82.61% and an MCC of 0.60. A web server was developed to provide a user-friendly interface for biologists.
SNAP / Screening for Non-Acceptable Polymorphisms
A neural-network based tool for evaluating the functional effects of single amino acid substitutions in proteins. SNAP utilizes various biophysical characteristics of the substitution, as well as evolutionary information, some predicted (or, when made available, observed) structural features, and possibly annotations, to predict whether or not a mutation is likely to alter protein function (in either direction: gain or loss). SNAP identifies over 80% of the non-neutral mutations at 77% accuracy and over 76% of the neutral mutations at 80% accuracy at its default threshold.
Assigns molecular functional effects of non-synonymous SNPs based on structure and sequence analysis. There are three unique features of the SNPs3D resource. First, it is designed specifically for the analysis of the relationship between SNPs and disease. Second, it constructs gene networks based on conceptual relationships derived from the literature, rather than experimental data. Third, it integrates access to all available and relevant information sources, wherever possible giving the user easy access to the underlying data and literature, so that informed judgments can be made.
transFIC / transformed Functional Impact score for Cancer
Evaluates the likelihood that a specific somatic mutation is a cancer driver. transFIC can give the transformed functional impact scores in cancer of somatic cancer non-synonymous single nucleotide variants (nsSNVs). It is useful to transform scores that prioritize driver-like nsSNVs. This tool can take into account the differences in baseline tolerance to nsSNVs between protein families.
An efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others). Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day.
Assists users in the annotation of multiple types of human genomic variants in a high-throughput setting. AnnTools takes an integrative approach that is not limited by the specific type or genomic location of the variant, can evaluate both functional and regulatory regions, and incorporates variant frequency data to highlight potentially significant variants. This application allows individual and batch query processing and offers flexibility in input data formats.
Allows users to annotate consequence terms of variants. ASOoViR exploits Ensembl gene sets with an included feature to annotate whole genome scale calls. This program is composed of multiple modules and can handle single nucleotide polymorphisms (SNPs), structural variants (SVs) as well as insertions and deletions. This application can operate at both the transcript level and the gene level and produces customizable outputs.
AVIA / Annotation, Visualization, and Impact Analysis
While the functionality of AVIA v1.0, whose implementation was based on ANNOVAR, was comparable to other annotation web servers, AVIA v2.0 represents an enhanced web-based server that extends genomic annotations to cell-specific transcripts and protein level functional annotations. With AVIA’s improved interface, users can better visualize their data, perform comprehensive searches, and categorize both coding and non-coding variants.
A software tool that determines the linkage disequilibrium (LD) region around a significant SNP from a GWAS. CandiSNPer provides a list with functional annotation and LD values for the SNPs found in the LD region. This list contains not only the SNPs for which genotyping data are available, but all SNPs with rs-IDs, thus increasing the likelihood to include the causal variant. Furthermore, plots showing the LD values are generated. CandiSNPer facilitates the preselection of candidate SNPs for causal variants.
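As background for the LD values mentioned above, the common pairwise measure r² can be computed from phased haplotypes as sketched below. The text does not say which LD statistic or data source CandiSNPer uses, so this is only an illustration of the standard definition; the function name `ld_r2` and the 0/1 allele coding are assumptions.

```python
def ld_r2(haplotypes):
    """Squared correlation r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per phased haplotype, where a and b
    are 0/1 allele codes at the first and second SNP (assumed coding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D measures how far the joint frequency departs from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # One SNP is monomorphic, so r^2 is undefined; 0.0 is returned here
        # as a convention of this sketch.
        return 0.0
    return d * d / denom


if __name__ == "__main__":
    print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # → 1.0 (perfect LD)
    print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))  # → 0.0 (independent)
```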
Annotates variants derived from high-throughput sequencing (HTS) experiments. CHAoS is a set of scripts that offers a wide range of features for annotating with: (i) motif predictions; (ii) transcript information; (iii) external discrete or continuous signal files. Additionally, it provides options for generating a gene locus repository and a utility for variant naming standardization.
Serves as a comparison and annotation tool for next-generation sequencing (NGS). COVA allows users to compare variants among multiple samples, annotate structural variants and handle multiple codon tables. This software helps detect causal variations relating to phenotype by annotating the effects of variants on genes and by comparing those among multiple samples.
FastSNP / Function Analysis and Selection Tool for Single Nucleotide Polymorphisms
Allows users to efficiently identify and prioritize high-risk SNPs according to their phenotypic risks and putative functional effects. A unique feature of FASTSNP is that the functional effect information used for SNP prioritization is always up-to-date, because FASTSNP extracts the information from 11 external web servers at query time using a team of web wrapper agents.
GERP / Genomic Evolutionary Rate Profiling
Performs identification of constrained elements in multiple alignments, by quantifying substitution deficits. GERP is a bottom-up method for constrained element detection that identifies sites under evolutionary constraint, i.e., sites that show fewer substitutions than would be expected to occur during neutral evolution. The software then aggregates these sites into longer, potentially functional sequences called constrained elements. It is suitable for high-throughput analysis of genomic data.
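The two steps described above, scoring each site by its substitution deficit and then aggregating constrained sites, can be sketched as follows. This is a toy illustration rather than GERP's statistical procedure: the function name `constrained_elements`, the per-column expected and observed substitution counts, and the `min_len` cutoff are assumptions.

```python
def constrained_elements(expected, observed, min_len=3):
    """Group consecutive alignment columns with a positive substitution deficit.

    expected, observed: per-column substitution counts (assumed inputs).
    Returns (start, end, total_deficit) tuples, 0-based and end-exclusive.
    """
    deficits = [e - o for e, o in zip(expected, observed)]
    elements = []
    start = None
    # The sentinel at the end closes any run that reaches the last column.
    for i, d in enumerate(deficits + [float("-inf")]):
        if d > 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                elements.append((start, i, sum(deficits[start:i])))
            start = None
    return elements


if __name__ == "__main__":
    expected = [2.0] * 8
    observed = [2.5, 0.5, 0.0, 1.0, 3.0, 0.5, 0.5, 2.0]
    print(constrained_elements(expected, observed))
    # → [(1, 4, 4.5)]
```

Columns 5 and 6 also show a deficit, but the run is shorter than `min_len` and is therefore not reported.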
GESND / Genetic Screening and Diagnosis
Detects causal mutations for rare congenital diseases by next-generation sequencing. GESND provides features for detecting a wide spectrum of variants, annotating and filtering variants, and prioritizing candidate variants.
HSF / Human Splicing Finder
Predicts the effects of mutations on splicing signals. HSF can forecast the disruption of the natural splice sites and is able to identify splicing motifs in any human sequence. This software combines more than 10 algorithms based on position weight matrices (PWM), the maximum entropy principle or the motif comparison method. The PWMs also evaluate the strength of 5′ and 3′ splice sites and branch points.
Provides a suite of scripts to annotate single-nucleotide polymorphisms (SNPs) identified by whole-genome sequencing, using reference sequences in Ensembl. NGS-SNP includes additional components to merge, filter and sort SNP lists, as well as scripts for obtaining reference chromosome and transcript sequences from Ensembl. It works with SNP discovery tools such as Maq or SAMtools and can classify SNPs as synonymous, non-synonymous, 3’-UTR and others.
A tool for annotating genomic point mutations and short nucleotide insertions/deletions (indels) with variant- and gene-centric information relevant to cancer researchers. This information is drawn from 14 different publicly available resources that have been pooled and indexed. Annotations linked to variants range from basic information, such as gene names and functional classification (e.g. missense), to cancer-specific data from resources such as the Catalogue of Somatic Mutations in Cancer (COSMIC), the Cancer Gene Census, and The Cancer Genome Atlas (TCGA).
PHAST / PHAge Search Tool
Helps users visualize, identify and annotate prophage sequences within bacterial genomes or plasmids. PHAge Search Tool performs identification using either raw or annotated bacterial genome sequence data. The main features are: (1) prophage region identification with support for both raw and annotated sequence input; (2) support for detailed prophage annotation including position, length, boundaries, number of genes, and attachment sites; and (3) support for the prediction of the completeness or potential viability of identified prophages.
SiPhy / Site-specific PHYlogenetic analysis
Identifies bases under selection from multiple alignment data via rigorously implemented statistical tests. SiPhy is incorporated into a hidden Markov model (HMM) to segment sequences into constrained and non-constrained regions. This software depends heavily on the underlying alignment and it exploits deeply sequenced phylogenies to assess both unlikely substitution patterns and slowdowns/accelerations in mutation rates.
A simple, easy-to-use high-throughput analysis tool that provides comprehensive annotation of both novel and known SNPs for any organism with a draft sequence and annotation. It is especially intended for use by researchers with limited bioinformatics experience.
Annotates and predicts the effects of single nucleotide polymorphisms (SNPs). SnpEff features include: (1) the ability to make thousands of predictions per second; (2) the ability to add custom genomes and annotations; (3) the ability to integrate with Galaxy; (4) compatibility with multiple species and multiple codon usage tables; (5) integration with Broad’s Genome Analysis Toolkit (GATK); and (6) the ability to perform non-coding annotations. It enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
Assists in selecting functionally relevant single nucleotide polymorphisms (SNPs) for large-scale genotyping studies of multifactorial disorders. SNPnexus allows researchers to assess the potential significance of candidate sequence variants and points to the altered gene/protein isoforms that may lead to phenotypic changes. It also allows single queries using dbSNP identifiers or chromosomal regions for annotating known variants.
SPOT / SNP Prioritization Online Tool
A web site for integrating biological databases into the prioritization of single nucleotide polymorphisms (SNPs) for further study after a genome-wide association study (GWAS). Typically, the next step after a GWAS is to genotype the top signals in an independent replication sample. Investigators will often incorporate information from biological databases so that biologically relevant SNPs, such as those in genes related to the phenotype or with potentially non-neutral effects on gene expression, such as those at splice sites, are given higher priority.
SVA / Sequence Variant Analyzer
Assists with the annotation, visualization and analysis of genetic variants from next-generation sequencing (NGS) studies, including whole genome sequencing (WGS) and exome sequencing studies. Sequence Variant Analyzer connects bioinformatics annotations with statistical association tests to bring suggestive association evidence to the user’s attention. It also permits filtering by function and by calling quality control (QC).
VAGrENT / Variation Annotation GENeraTor
A suite of Perl modules that compares genomic variations with reference genome annotations and generates the possible effects each variant may have on the transcripts it overlaps. VAGrENT evaluates each variation/transcript combination and describes the effects in the mRNA, CDS and protein sequence contexts. It provides details of the sequence and position of the change within the transcript/protein as well as Sequence Ontology terms to classify its consequences.
VARIANT / VARIant ANalysis Tool
Reports the functional properties of any variant in all the human, mouse or rat genes, and the corresponding neighborhoods. VARIANT is a web application that indicates the functional effects in the coding regions and also analyzes noncoding single nucleotide variants (SNVs) situated both within the gene and in the neighborhood that could affect different regulatory motifs, splicing signals, and other structural elements.
Aims to facilitate web-based personal genome annotation. wANNOVAR is a web application that uses ANNOVAR as the backend annotation engine. Users need to submit a list of variants, and the server can process the submission and generate HTML-based result pages. It also allows flexibility by permitting the users to select customized filtering criteria and identify a subset of prioritized variants from thousands or even millions of input variants.
Combines an algorithm designed to cluster haplotypes of interest from a given set of haplotypes with two existing tools: Haploview, for analyses of linkage disequilibrium blocks and haplotypes, and PLINK, to generate all possible diplotypes from given genotypes of samples and calculate linear or logistic regression. In addition, procedures for generating all possible diplotypes from the haplotype clusters and transforming these diplotypes into PLINK formats were implemented. Diplotyper is a fully automated tool for performing association analysis based on diplotypes in a population. Diplotyper is useful for identifying more precise and distinct signals over single-locus tests.
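The step of generating all possible diplotypes from haplotype clusters amounts to enumerating unordered pairs of clusters, with a cluster allowed to pair with itself. The sketch below shows that enumeration plus an additive 0/1/2 coding of a target cluster, which is one common way to feed such data into a regression. The function names and the coding choice are assumptions for illustration; the conversion to PLINK formats performed by Diplotyper is not reproduced here.

```python
from itertools import combinations_with_replacement


def all_diplotypes(clusters):
    """All unordered pairs of haplotype clusters, homozygous pairs included."""
    return list(combinations_with_replacement(sorted(clusters), 2))


def additive_coding(diplotype, target):
    """Number of copies (0, 1 or 2) of the target cluster in a diplotype."""
    return sum(1 for haplotype in diplotype if haplotype == target)


# Hypothetical cluster labels.
clusters = ["H1", "H2", "H3"]
diplotypes = all_diplotypes(clusters)
codes = [additive_coding(d, "H1") for d in diplotypes]
```

For k clusters this yields k(k+1)/2 diplotypes, so three clusters give six.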
A tool for imputing unobserved genotypes from a reference haplotype panel typed at a higher-density SNP set, such as HapMap, and lower-density genotypes of a target individual, such as those obtained from genotyping arrays.
An algorithm for haplotype assembly of densely sequenced human genome data. The HapCompass algorithm operates on a graph where single nucleotide polymorphisms (SNPs) are nodes and edges are defined by sequence reads and viewed as supporting evidence of co-occurring SNP alleles in a haplotype.
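The graph described here can be sketched with plain dictionaries: each read is reduced to the alleles it shows at the SNPs it covers, and every pair of SNPs seen on the same read contributes evidence to the edge between them. The read representation (a dict from SNP index to 0/1 allele) and the function name are assumptions for illustration, and the sketch covers only graph construction, not the assembly steps that HapCompass runs on the graph.

```python
from collections import defaultdict
from itertools import combinations


def build_snp_graph(reads):
    """Edges between SNPs co-covered by a read, with allele-pair counts as evidence.

    reads: iterable of dicts mapping SNP index -> observed allele (0 or 1).
    Returns {(snp_u, snp_v): {(allele_u, allele_v): count}} with snp_u < snp_v.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for u, v in combinations(sorted(read), 2):
            edges[(u, v)][(read[u], read[v])] += 1
    return {pair: dict(counts) for pair, counts in edges.items()}


# Toy reads over three SNPs.
reads = [
    {1: 0, 2: 0},
    {1: 1, 2: 1},
    {2: 0, 3: 1},
    {1: 0, 2: 0, 3: 1},
]
graph = build_snp_graph(reads)
```

In this toy graph the edge (1, 2) carries three reads, all consistent with the alleles at SNPs 1 and 2 being phased together as 0-0 on one haplotype and 1-1 on the other.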
Allows haplotype assembly for diverse sequencing technologies. HapCUT can assemble haplotypes for a diverse array of data modalities. It implements an approach for modeling and estimating h-trans error probabilities de novo that reduces errors in assembled Hi-C haplotypes. It was assessed using data from fosmid-based dilution pool sequencing, 10X Genomics linked-read sequencing, single molecule real-time (SMRT) sequencing, and proximity ligation sequencing.
HARSH / HAplotype inference using Reference and Sequencing tecHnology
An efficient method that combines multi-SNP read information with reference panels of haplotypes for improved genotype and haplotype inference in sequencing data. Unlike previous phasing methods that use read counts at each SNP as input, our method takes into account the information from reads spanning multiple SNPs. HARSH is able to efficiently find the likely haplotypes in terms of the marginal probability over the genotype data. Using simulations from HapMap and 1000 Genomes data, we show that our method achieves higher accuracy than existing approaches with decreased computational requirements.
HATS / Haplotype Amplification in Tumor Sequences
A tool that calls the amplified alleles, and thus amplified haplotype, in copy number aberration regions in next-generation sequencing tumor data. The amplified haplotype may reveal gene variants. We assess the performance of HATS using simulated amplified regions generated from varying copy number and coverage levels, followed by amplicons in real data. We demonstrate that HATS infers the amplified alleles more accurately than does the naive approach, especially at low to intermediate coverage levels and in cases (including high coverage) possessing stromal contamination or allelic bias.
An algorithm for discovering long shared segments of Identity by Descent (IBD) between pairs of individuals in a large population. GERMLINE takes as input genotype or haplotype marker data for individuals (as well as an optional known pedigree) and generates a list of all pairwise segmental sharing. GERMLINE uses a novel hashing & extension algorithm which allows for segment identification in haplotype data in time proportional to the number of individuals.
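The hashing and extension idea can be illustrated in a few lines: haplotypes are cut into fixed windows, individuals whose window contents are identical land in the same hash bucket, and runs of consecutive matching windows are merged into longer shared segments. This is a simplified sketch under stated assumptions: the function names and the string encoding of haplotypes are hypothetical, only exact matches are considered, and segments are never extended beyond window boundaries, unlike a full implementation.

```python
from collections import defaultdict
from itertools import combinations


def seed_matches(haplotypes, window):
    """Map each pair of individuals to the window indices where they match exactly.

    haplotypes: dict of name -> allele string, all of equal length.
    """
    if not haplotypes:
        return {}
    n_markers = len(next(iter(haplotypes.values())))
    matches = defaultdict(list)
    for w, start in enumerate(range(0, n_markers - window + 1, window)):
        buckets = defaultdict(list)           # window content -> individuals
        for name, hap in haplotypes.items():
            buckets[hap[start:start + window]].append(name)
        for names in buckets.values():
            for pair in combinations(sorted(names), 2):
                matches[pair].append(w)
    return dict(matches)


def extend(windows, window):
    """Merge consecutive window indices into half-open marker intervals."""
    segments = []
    run_start = prev = windows[0]
    for w in windows[1:]:
        if w != prev + 1:                     # gap: close the current run
            segments.append((run_start * window, (prev + 1) * window))
            run_start = w
        prev = w
    segments.append((run_start * window, (prev + 1) * window))
    return segments


# Toy haplotypes for three hypothetical individuals, eight markers each.
haplotypes = {"a": "00110101", "b": "00110111", "c": "11110101"}
matches = seed_matches(haplotypes, window=2)
shared = {pair: extend(ws, 2) for pair, ws in matches.items()}
```

Pairs are only compared within a bucket, so the work depends on how many individuals share each window's content rather than on all possible pairs.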
Identifies very short identity by descent (IBD) segments that are tagged by rare variants in large sequencing data. HapFABIA works in 3 steps: (i) it applies factor analysis for bicluster acquisition (FABIA) biclustering to phased or unphased genotype data, (ii) it extracts IBD segments from FABIA models by distinguishing IBD from identical by state (IBS) without IBD, and (iii) it cuts single nucleotide variants (SNVs) with false correlation from the extracted IBD segments and joins segments.
A C++ program for multipoint IBD estimation based on high density SNP genotype data.
MORGAN / MOnte caRlo Genetic ANalysis
Infers co-ancestry in population samples in the presence of linkage disequilibrium. MORGAN can sample inheritance patterns at all marker locations conditional on all marker data observed in the pedigree data set. This tool requires a list of realizations of founder-genome-label (FGL) of all individuals at all marker locations across a chromosome.
A free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.
This method estimates the probability of sharing alleles identical by descent (IBD) across the genome and can also be used for mapping disease loci using distantly related individuals.
A pedigree-drawing tool designed to work the way users think. Cyrillic brings together the tools needed for drawing family pedigrees and for managing and analyzing pedigree data.
A pedigree-drawing application with special features for easy visualization of complex haplotype information. HaploPainter was developed to facilitate gene mapping in Mendelian diseases by enabling fast and reliable definition of the smallest critical interval harbouring the underlying gene defect. Features such as haplotype compression and marker section cut-out are particularly helpful for viewing large SNP-derived haplotypes.
Allows users to handle large and complex pedigrees with an emphasis on readability and aesthetics. Madeline includes a desktop version and a web version permitting researchers to create, display, save and print pedigrees. Moreover, it can convert pedigree and marker data into various formats required by linkage analysis software and also supply functionality for querying pedigree data sets interactively.
A software package that facilitates creation and verification of pedigrees within large genealogies.
Draws pedigrees with complex inbreeding structures over multiple generations in populations with a large number of individuals, as is common in animal populations. Options include: full pedigree, summarization, extraction of individual pedigrees, inbreeding calculation, coancestry coefficient calculation, color control, drawing size, page size and margins, drawing styles.