A Macintosh OS X application that provides for creation, editing and drawing of pedigrees (also called family trees, or genograms) of human or non-human extended family lineages.
A software tool for visualizing phenotypic and genotypic data for related individuals linked in pedigrees.
Pelican / Pedigree Editor for LInkage Computer ANalysis
It is a utility for graphically editing the pedigree data files used by programs such as FASTLINK, VITESSE, GENEHUNTER and MERLIN.
A drawing software solution used by research institutions and clinical genetic services worldwide since 1996.
Infers tumor purity and malignant cell ploidy directly from analysis of somatic DNA alterations. ABSOLUTE can detect subclonal heterogeneity and somatic homozygosity, and can calculate the statistical sensitivity for detecting specific aberrations. It provides a foundation for integrative genomic analysis of cancer genome alterations on an absolute (cellular) basis. It may be possible to extend ABSOLUTE to other types of genomic alterations, such as structural rearrangements and small insertions/deletions.
ExPANdS / Expanding Ploidy and Allele Frequency on Nested Subpopulations
Assists users in characterizing coexisting subpopulations (SPs) in a tumor. ExPANdS is a method that uses copy number and allele frequencies derived from exome or whole genome sequencing (WGS) input data. In addition to tumor purity, this method predicts the number of clonal expansions, the size of the resulting SPs in the tumor bulk and the mutations specific to each SP.
HIVCD / HIV Contamination Detection
Detects potential contamination events. HIVCD is a web-based application that consists of the multiple sequence alignment program MAFFT and a percent identity (PID) calculator. The software automates the process of contamination screening using sequence data. It is able to compare 24 sequences from a run to each other and several thousand previous sequences. HIVCD can be incorporated into laboratory workflow and is useful for identifying situations that may compromise patient care during HIV-1 resistance testing.
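The percent identity (PID) screening step described above can be sketched as follows. This is a minimal illustration, not HIVCD's code: it assumes the sequences have already been aligned to equal length (for example by MAFFT), that gap columns are ignored, and that a hypothetical threshold of 99% marks a pair as suspicious. The function names are invented for the example.

```python
from itertools import combinations


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns do not count toward PID
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def flag_pairs(aligned: dict, threshold: float = 99.0) -> list:
    """Return (name_a, name_b, pid) for every pair at or above the threshold."""
    hits = []
    for first, second in combinations(sorted(aligned), 2):
        pid = percent_identity(aligned[first], aligned[second])
        if pid >= threshold:
            hits.append((first, second, pid))
    return hits
```

Calling flag_pairs on the aligned sequences of one run returns the pairs that are nearly identical and may deserve manual review. Other PID definitions, such as counting gap columns as mismatches, give different values.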
A highly scalable parallel program to identify non-host sequences (of potential pathogen origin) and estimate their genome relative abundance in high-throughput sequence datasets. READSCAN accurately classified human and viral sequences on a 20.1 million reads simulated dataset in <27 min using a small Beowulf compute cluster with 16 nodes.
RINS / Rapid Identification of Non-human Sequences
Discovers the presence of pathogens from a custom reference in high-throughput sequencing datasets. RINS is designed for mate-paired high-throughput sequencing data with reads at least 36 bp and up to 500 bp. It is useful for sequencing data from any species. This tool enables the discovery of sequencing reads from intact or mutated non-human genomes in a dataset and constructs contigs with these non-human sequences.
VFS / ViralFusionSeq
Detects and annotates genome-wide viral fusions and breakpoints. VFS proposes a method that exploits high-throughput sequencing (HTS) for performing analysis at single-base resolution. This application can also recreate fusion transcripts by relying on a combination of clipped sequence and discordant read pair (RP) information. Its analysis can be applied to both DNA-seq and RNA-seq data.
ViReMa / Viral Recombination Mapper
Identifies and reports recombination or fusion events in virus genomes by exploiting deep sequencing datasets. ViReMa iteratively calls the read alignment software named Bowtie to try and map all portions of a candidate read. This program can execute analysis of datasets containing reads of variable lengths and can deal with reference genomes of any size.
Describes intra-host viruses through next generation sequencing (NGS) data. VirusFinder detects virus infection, co-infection with multiple viruses, mutations in the virus genomes, and virus integration sites in host genomes. It reports novel contigs, long sequences assembled from short reads that map neither to the host genome nor to the genomes of known viruses. This tool deals with both paired-end and single-end data.
An algorithmic tool for detecting known viruses and their integration sites using next-generation sequencing of human cancer tissue. It takes FASTQ files (paired-end reads) as input.
This software aims at reconstructing haplotypes from next-generation sequencing data. A typical application example is to reconstruct HIV-1 haplotypes present in a blood sample from a patient based on a set of reads.
A jumping hidden Markov model that describes the generation of the viral quasispecies and a method to infer its parameters by analysing next generation sequencing data.
A program for viral quasispecies reconstruction, specifically developed to analyze long read (>100 bp) NGS data.
ShoRAH / Short Reads Assembly into Haplotypes
A computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. This approach also provides the user with an estimate of the quality of the reconstruction. Further, ShoRAH can reconstruct the global haplotypes and estimate their frequencies. ShoRAH was run on simulated data and on real data obtained in wet lab experiments to assess its reliability.
V-Phaser 2
Serves for inferring intra-host diversity within viral populations. V-Phaser 2 adds three major methodologies to the state of the art: (1) a technique to utilize paired end read data for calling phased variants; (2) a strategy to represent and infer length polymorphisms; and (3) an in-line filter for erroneous calls arising from systematic sequencing artifacts. It utilizes paired reads in phasing, extending the distance between phased sites from a read length to a fragment length.
CMA / Comprehensive Meta-analysis
A package to do meta-analysis which works in a spreadsheet interface and also provides forest plots, which are useful for visualizing between-study heterogeneity. The program combines ease of use with a wide array of computational options and sophisticated graphics.
An R library for genome-wide association (GWA) analysis. GenABEL implements effective storage and handling of GWA data, fast procedures for genetic data quality control, testing of association of single nucleotide polymorphisms with binary or quantitative traits, visualization of results and also provides easy interfaces to standard statistical and graphical procedures implemented in base R and special R libraries for genetic analysis.
GWAMA / Genome-Wide Association Meta-Analysis
Performs meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. GWAMA can be used for analysing the results of all different genetic models (multiplicative, additive, dominant, recessive). It incorporates error trapping facilities to identify strand alignment errors and allele flipping, and performs tests of heterogeneity of effects between studies.
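The strand alignment and allele flipping checks mentioned above can be illustrated with a short sketch. It is not GWAMA's implementation: the function names are hypothetical, alleles are assumed to be single nucleotides, and effect sizes are assumed to be on an additive scale where swapping the effect allele negates the estimate.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele_1: str, allele_2: str) -> bool:
    """True for A/T and C/G SNPs, whose strand cannot be told from alleles alone."""
    return COMPLEMENT[allele_1.upper()] == allele_2.upper()


def harmonize(ref_effect, ref_other, effect, other, beta):
    """Express beta relative to ref_effect; return None if alleles cannot be matched.

    The study alleles are tried as given, then on the opposite strand. In each
    case a swapped effect allele flips the sign of beta.
    """
    ref = (ref_effect.upper(), ref_other.upper())
    observed = (effect.upper(), other.upper())
    other_strand = (COMPLEMENT[observed[0]], COMPLEMENT[observed[1]])
    for pair in (observed, other_strand):
        if pair == ref:
            return beta
        if pair == (ref[1], ref[0]):
            return -beta
    return None
```

For palindromic SNPs the result of harmonize is ambiguous, because a strand flip and an allele swap look identical. A cautious pipeline would check is_palindromic first and resolve those variants with allele frequencies or drop them.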
MAGENTA / Meta-Analysis Gene-set Enrichment of variaNT Associations
A computational tool that tests for enrichment of genetic associations in predefined biological processes or sets of functionally related genes, using genome-wide genetic data as input. MAGENTA is designed to analyze datasets for which genotype data are not readily available, such as large genome-wide association study (GWAS) meta-analyses. It can be used either (i) to test a specific hypothesis or (ii) to generate hypotheses by testing a range of known biological gene sets.
Offers a set of methods for the meta-analysis of genome-wide single nucleotide polymorphism (SNP) association results. MetABEL is a package composed of three main functions: (i) one dedicated to the generation of forest plots; (ii) one that performs a meta-analysis of results derived from several genome-wide association studies (GWAS); and (iii) one that offers pairwise meta-analysis of results from GWAS.
A method for meta-analysis of case-control genetic association studies using random-effects logistic regression. The main advantages of the proposed methodology are its flexibility and ease of use; at the same time, it covers almost every aspect of a meta-analysis, providing overall estimates without the need for multiple comparisons.
METAL / Meta Analysis Helper
Provides a computationally efficient tool for meta-analysis of genome-wide association scans, a commonly used approach for improving power in complex trait gene-mapping studies. METAL provides a rich scripting interface and implements efficient memory management to allow analyses of very large data sets and to support a variety of input file formats.
Provides functions for conducting meta-analyses in R. The package includes functions for fitting the meta-analytic fixed- and random-effects models and allows for the inclusion of moderator variables (study-level covariates) in these models. The package provides various plot functions (for example, for forest, funnel, and radial plots) and functions for assessing the model fit, for obtaining case diagnostics, and for tests of publication bias.
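To make the distinction between fixed- and random-effects models concrete, here is a minimal Python sketch of inverse-variance pooling with the DerSimonian-Laird estimate of between-study variance. It is an illustration of the general technique, not the package's R code; the function and field names are chosen for the example, and moderators are not handled.

```python
import math
from typing import NamedTuple, Sequence


class MetaResult(NamedTuple):
    fixed_effect: float
    random_effect: float
    random_se: float
    tau2: float  # DerSimonian-Laird between-study variance
    q: float     # Cochran's Q heterogeneity statistic


def meta_analyze(effects: Sequence[float], variances: Sequence[float]) -> MetaResult:
    """Pool study effects given their sampling variances."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    # Method-of-moments estimate of tau^2, truncated at zero.
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    random_effect = sum(w * y for w, y in zip(re_weights, effects)) / re_total
    return MetaResult(fixed, random_effect, math.sqrt(1.0 / re_total), tau2, q)
```

When the studies agree, tau2 is zero and the two estimates coincide; when they disagree, the random-effects standard error widens to reflect the extra between-study variance.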
A general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. MetaSKAT can carry out meta-analysis of SKAT, SKAT-O and burden tests with individual level genotype data or gene level summary statistics.
PheWAS / Phenome-wide association study
Tests associations between a genetic variant or clinical factor of interest and a compendium of clinical outcomes or phenotypes. PheWAS finds disease-gene associations using ICD9 billing codes. It can serve to construct hypotheses around a variety of exposures. This tool implements a propensity score (PS) method for PS matching or PS adjustment. It employs a penalized maximum likelihood (PML) in logistic regression that can reduce bias in the parameter estimates.
Allows meta-analysis of rare variant association studies for quantitative traits. RAREMETAL works in two steps: (i) analysis of individual studies and (ii) generation of summary statistics that can later be combined across studies. The software enables users to customize variant groupings for gene-level statistics at the meta-analysis stage, after individual studies are analyzed. It can be used in large meta-analyses of rare variants for a variety of traits, ranging from blood lipid levels and anthropometric traits to smoking and drinking.
RevMan / Review Manager
A package that does meta-analysis and provides results in tabular format and graphically. In addition to intervention reviews, the new version includes the ability to write diagnostic test accuracy reviews, methodology reviews and overviews of reviews.
Provides functions for simple fixed and random effects meta-analysis for two-sample comparisons and cumulative meta-analyses. rmeta is a package that draws standard summary plots, funnel plots, and computes summaries and tests for association and heterogeneity. It offers commands to realize meta-analysis of antibacterial catheter coating, or to access cumulative meta-analysis of binary data.
A cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress.
Simulates sequencing reads and was used in simulation studies that helped design data collection modalities of the 1000 Genomes Project. ART builds ‘synthetic’ sequencing reads in a manner that mimics the technology-specific sequencing process. It is able to generate sequencing data with customized read length and error characteristics. This tool supports all three types of common sequencing errors: base substitutions, insertions and deletions.
Takes the reference genome (in FASTA format) as input and outputs artificial FASTQ files in the Sanger format. It can accept Phred base quality scores from existing FASTQ files, and use them to simulate sequencing errors. Since the artificial FASTQs are derived from the reference genome, the reference genome provides a gold-standard for calling variants (Single Nucleotide Polymorphisms (SNPs) and insertions and deletions (indels)).
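The idea of driving simulated sequencing errors from Phred base qualities can be sketched as below. This is a simplified illustration rather than the tool's own code: it assumes Sanger-style quality strings (ASCII offset 33), models substitutions only, and picks the wrong base uniformly at random. The function names are invented for the example.

```python
import random


def phred_to_error_prob(quality_char: str, offset: int = 33) -> float:
    """Convert one quality character to an error probability, P = 10^(-Q/10)."""
    q = ord(quality_char) - offset
    return 10 ** (-q / 10)


def add_substitution_errors(seq: str, quals: str, rng: random.Random) -> str:
    """Replace each base with a different base with probability given by its quality."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality string must have equal length")
    out = []
    for base, quality_char in zip(seq, quals):
        if rng.random() < phred_to_error_prob(quality_char):
            # Assumption: each of the three other bases is equally likely.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

Because the error-free read is known, every difference between the output and the input sequence is a simulated sequencing error, which is what lets the reference serve as a gold standard when evaluating variant calls.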
Serves as a whole genome simulator for next-generation sequencing (NGS). DWGSIM is built on WGSIM from SAMtools and can deal with ABI SOLiD and Ion Torrent data and execute various assumptions about aligners and positions of indels. This software produces base error qualities based on a parametric model. It can also simulate reads and evaluate mappings.
The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Flowsim is a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences.
A set of tools used to create diploid FASTA files containing SNPs, indels, duplications, deletions, and translocations. These FASTA files can then be used in conjunction with next-generation sequencing simulators to artificially create sequencing experiments. These tools are useful for assessing the performance and reliability of data analysis in next-generation sequencing pipelines.
Simulates Illumina, 454 and Sanger reads. Mason is useful for read mapping, read correction and transcript quantification. Its features include position specific error rates and base quality values. This tool has been written with performance in mind and can sample reads from large genomes.
PacBio sequencers produce two types of characteristic reads: continuous long reads (long, with a high error rate) and circular consensus sequencing reads (short, with a low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. PBSIM simulates those PacBio reads by using either a model-based or sampling-based simulation.
pIRS / profile based Illumina pair-end Reads Simulator
Simulates Illumina reads using empirical profiles. pIRS is a simulator developed to reproduce reads similar to those generated by the Illumina platform. This method can be helpful for developing next-generation sequencing (NGS) software for tasks such as de novo assembly, single-nucleotide polymorphism (SNP) calling and structural variation detection. It can also be useful for applications that need heterozygous data.
A flexible short read simulator. ShotGun generates sequence data with user-specified read length and average depth, accommodates cycle-specific sequencing error rates, and allows the read depth distribution to be either the ideal Poisson or the Negative Binomial, the latter to model the overdispersion observed with real sequencing data. In addition, ShotGun performs computationally efficient Single Nucleotide Polymorphism (SNP) discovery using a statistic aggregated across all sequenced samples.
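The difference between Poisson and Negative Binomial read depth can be seen in a small sampling sketch. This is not ShotGun's code: it uses only the Python standard library, draws Poisson values with Knuth's multiplication method (adequate for modest means), and obtains the Negative Binomial as a gamma-Poisson mixture with a hypothetical dispersion parameter called size, for which the variance is mean + mean^2 / size.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) value with Knuth's method (intended for modest lam)."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_negative_binomial(mean: float, size: float, rng: random.Random) -> int:
    """Draw an overdispersed depth: Poisson whose rate is Gamma(shape=size, scale=mean/size)."""
    rate = rng.gammavariate(size, mean / size)
    return sample_poisson(rate, rng)


def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance
```

With an average depth of 30 and size 5, both samplers give depths centred near 30, but the Negative Binomial depths spread far more widely, so more positions end up with very low or very high coverage.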
simHTSD / simulate High-Throughput Sequencing Data
Produces a set of short nucleotide reads given a reference sequence. simHTSD simulates the output of high-throughput DNA sequencers such as the Illumina Genome Analyzer II and others.
An interactive program that integrates generation of rare variant genotype/phenotype data and evaluation of association methods using a unified platform. Variant data are generated for gene regions using forward-time simulation that incorporates realistic population demographic and evolutionary scenarios. Phenotype data can be obtained for both case–control and quantitative traits. SimRare has a user-friendly interface that allows for easy entry of genetic and phenotypic parameters. Novel rare variant association methods implemented in R can also be imported into SimRare, to evaluate their performance and compare results, e.g. power and Type I error, with other currently available methods both numerically and graphically.
An Illumina paired-end and mate-pair short read simulator. This project attempts to model as many of the quirks that exist in Illumina data as possible. Some of these quirks include the potential for chimeric reads, and non-biotinylated fragment pull down in mate-pair libraries.
A targeted re-sequencing simulator that generates synthetic exome sequencing reads from a given sample genome. Wessim emulates conventional exome capture technologies, including Agilent's SureSelect and NimbleGen's SeqCap, to generate DNA fragments from genomic target regions. The target regions can be either specified by genomic coordinates or inferred from in silico probe hybridization. Coupled with existing next-generation sequencing simulators, Wessim generates realistic artificial exome sequencing data, which is essential for developing and evaluating exome-targeted variant callers.
A small tool for simulating sequence reads from a reference genome. It is able to simulate diploid genomes with SNPs and insertion/deletion (INDEL) polymorphisms, and simulate reads with uniform substitution sequencing errors. It does not generate INDEL sequencing errors, but this can be partly compensated by simulating INDEL polymorphisms. Wgsim outputs the simulated polymorphisms, and writes the true read coordinates as well as the number of polymorphisms and sequencing errors in read names.
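Because the true coordinates are stored in the read names, a mapper can be scored by parsing them back out. The sketch below assumes the read-name layout contig_start_end_e1:s1:i1_e2:s2:i2_hexid/mate, with 1-based fragment coordinates and, for each mate, counts of sequencing errors, substitutions and indels. That layout is an assumption and should be checked against actual Wgsim output before use; the class and function names are invented for the example.

```python
from typing import NamedTuple, Tuple


class WgsimName(NamedTuple):
    contig: str
    start: int
    end: int
    counts_first: Tuple[int, ...]   # assumed order: errors, substitutions, indels
    counts_second: Tuple[int, ...]
    pair_id: int
    mate: int                       # 0 when the name carries no /1 or /2 suffix


def parse_wgsim_name(read_name: str) -> WgsimName:
    """Split a read name of the assumed layout into its fields."""
    name = read_name.lstrip("@")
    mate = 0
    if "/" in name:
        name, mate_text = name.rsplit("/", 1)
        mate = int(mate_text)
    # Split from the right so contig names containing underscores survive.
    contig, start, end, first, second, pair_hex = name.rsplit("_", 5)
    return WgsimName(
        contig=contig,
        start=int(start),
        end=int(end),
        counts_first=tuple(int(x) for x in first.split(":")),
        counts_second=tuple(int(x) for x in second.split(":")),
        pair_id=int(pair_hex, 16),
        mate=mate,
    )
```

Comparing the parsed contig and coordinates with the position reported by an aligner gives a per-read correctness check without any external truth file.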
discoSnp++ / Discovering Single Nucleotide Polymorphism
Detects both heterozygous and homozygous isolated single nucleotide polymorphisms (SNPs) from any number of read datasets. discoSnp++ ranks predictions and outputs quality and coverage per allele to facilitate downstream genotyping analyses. It requires some computing resources, achieves good precision/recall values, and its predictions are unlikely to be false positives.
A web-based tool, knowledgebase and community for analysis and interpretation of human variant files. GeneTalk provides an intuitive web-based interface for geneticists that analyze human sequence variants. It assists a clinical geneticist who is searching for information about specific sequence variants and connects this user to other users with expertise for the same sequence variant.
NECTAR / Non-synonymous Enriched Coding muTation Archive
Annotates disease-related and functionally important amino acids in human proteins. NECTAR enables users to interpret DNA variants identified in diagnostic or research sequencing. It collates disease-causing variants and functionally important amino acid residues from a number of sources. This tool is useful to retrieve functionally equivalent amino acid residues in evolutionarily related proteins.
Enables analysis of high-throughput sequencing data in post-alignment stages. Bamformatics provides programs for identifying variants and for computing various types of genomic tracks.
Searches for single nucleotide polymorphisms (SNPs) with cloud computing. Crossbow is a Hadoop-based software tool that combines the speed of the short read aligner Bowtie with the accuracy of the SNP caller SOAPsnp to perform alignment and SNP detection for multiple whole-human datasets per day. The software achieves at least 98.9% accuracy on simulated datasets of individual chromosomes, and better than 99.8% concordance with the Illumina 1 M BeadChip assay of a sequenced individual.
Provides analysis workflow and quality metric management for DNA-seq experiments. draw-sneakpeek is a Java pipeline for Next-Generation Sequencing (NGS) data analysis. It was used to process whole-genome sequencing (WGS), whole-exome sequencing (WES) and targeted sequencing experiments on traditional high-performance computing clusters as well as on Amazon elastic compute cloud (EC2). This method is also available as Amazon machine images.
Allows users to analyze DNA-Seq data from next-generation sequencing (NGS) equipment such as Illumina, Roche/454, Proton, and Ion Torrent. GensearchNGS helps researchers create different projects to group the analysis of various patients based on their type or their relation. It focuses on patients and their associated metainformation and allows several raw sequencing datasets to be associated with each project.
HugeSeq / High-throUghput GEnome SEQuencing
Automates the process of variant detection from alignment of genomic sequences. HugeSeq is an integrated computational pipeline that was developed to detect and annotate all types of genetic variations such as single nucleotide polymorphisms (SNPs), short insertions or deletions (indels) and larger structural variations (SVs). It can be used for other types of sequencing and for Illumina’s long mate-pair library.
Isaac Genome Alignment Software
Aligns next-generation sequencing (NGS) data with low error rates (single or paired-ends). Isaac Genome Alignment Software has been designed to take full advantage of all the computational power available on a single server node. As a result, this tool scales over a broad range of hardware architectures, and alignment performance improves with hardware capabilities.
MutFinder / Mutation Finder
Streamlines next-generation sequencing data analysis using BFAST as the aligner, SAMtools as the SNP caller, and ANNOVAR for annotation. MutFinder’s main goal is to identify mutations in DNA-seq data.
Provides a method capable of discovering high-quality genomic variation in DNA sequence data. NGSpeAnalysis is a pipeline built on open-source tools that implements a set of paired-end next-generation sequencing (NGS) analyses, including short-read alignment, high-quality variant genotype calling and variant annotation. It can be run both on a single workstation and in a cluster High Performance Computing (HPC) environment.
RTG Variant
Uses both relatedness and haplotype information and provides a scalable method for large pedigrees. RTG Variant encompasses distinct products for the specific needs of clinical search. This product is based on three modules: RTG Singleton (for personalized medicine), RTG Family (for a reduction of parental sequencing) and RTG Population (for simultaneous analysis).
Enables next-generation sequencing (NGS)-based resequencing analysis. reseqtools is a toolkit that integrates comprehensive functions for large scale NGS-based resequencing analysis, organized in eight major modules: Fatools, Fqtools, SOAPtools, Xamtools, CNStools, Vartools, Formtools and Gfftools. The software allows users to define custom pipelines for accommodating specific data processing other than resequencing analysis.
Provides biological insight into genetic events investigated by exome sequencing. Simplex is an automated pipeline for investigating exome single-end (SE) and paired-end (PE) sequencing data generated by deep sequencing devices from Illumina and ABI SOLiD. The pipeline combines published and in-house developed applications and is continuously, automatically tested. Simplex is provided as a ready to use VirtualBox image and a fully configured Cloud image.
A Targeted RE-sequencing Annotation Tool that offers a comprehensive, open framework, end-to-end solution for analyzing and interpreting targeted re-sequencing data.
Performs a complete whole-exome sequencing pipeline and provides easy access to intermediate and final results through an interface. A user can perform the whole analysis without knowing the underlying hardware and software architecture, dealing with both paired and single end data. The interface provides easy and intuitive access for data submission and user-friendly web pages for annotated variant visualization.
ABACUS / Algorithm based on a BivAriate CUmulative Statistic
Identifies genotype–phenotype associations within predefined sets of single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWAS). ABACUS analyzes SNPs with different minor allele frequencies (MAFs) in the same group, independently of the protective or causative effect of the minor allele. The software can simultaneously consider common and rare variants and different directions of genotype effect. Applied to biological pathways, it gives an implicit functional characterization of trait-associated loci.
A computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data for microbial sized genomes. breseq reports single-nucleotide mutations, point insertions and deletions, large deletions, and new junctions supported by mosaic reads (such as those produced by new mobile element insertions) in an annotated HTML format.
FACIL genetic code prediction tool
Infers the genetic code directly from a newly sequenced, unannotated genome.
Provides assistance for analyzing and visualizing deep amplicon sequencing data. jMHC aims to deal with genotyping of major histocompatibility complex and can be appropriate to use for processing amplicons derived from other multigene families or for genotyping other polymorphic systems. It first extracts target sequences from all reads, including the complete tag sequence, and then generates a table which contains sequence variants and the number of reads, and lastly creates files with all sequence variants ordered.
Provides approaches for efficient exploration and management of phenotype data. Proper QC of phenotypes before proceeding to the association analysis is critical to ensure control of type I and II errors, reliable effect estimates and consistent results between studies. PhenoMan is highly beneficial for the preparation of qualitative and quantitative trait data for association studies using new datasets as well as those obtained from public repositories.
SHM / Somatic HyperMutation
Provides a targeting model that defines the mutation locations and a nucleotide substitution model that determines the resulting mutation. Somatic HyperMutation proceeds by specifying relative rates at which DNA motifs in the Ig sequence are mutated or the probability of each base mutating to each of the other three possibilities as a function of the surrounding bases.
Allows management of single nucleotide polymorphisms (SNPs) generated from haploid next generation sequencing (NGS) data. snp-search is composed of two fundamental features: it (1) creates a local SQLite database and schema, and (2) outputs requested objects from the database. The software provides detailed information about each SNP, and allows multiple SNP filtering steps, including filtering on the function of genes within which the SNPs are found. The outputs can be used to test important biological hypotheses.
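The general pattern of storing SNPs in a local SQLite database and filtering them by gene function can be sketched with Python's built-in sqlite3 module. The schema, table names, column names and rows below are entirely hypothetical toy values chosen for illustration; they are not snp-search's actual schema or data.

```python
import sqlite3

# Hypothetical two-table schema: each SNP optionally points at the gene it falls in.
SCHEMA = """
CREATE TABLE genes (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    function TEXT NOT NULL
);
CREATE TABLE snps (
    id INTEGER PRIMARY KEY,
    position INTEGER NOT NULL,
    ref_base TEXT NOT NULL,
    alt_base TEXT NOT NULL,
    quality REAL NOT NULL,
    gene_id INTEGER REFERENCES genes(id)
);
"""


def build_toy_database() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?, ?)",
        [(1, "toyA", "toy efflux pump"), (2, "toyB", "toy ribosomal protein")],
    )
    conn.executemany(
        "INSERT INTO snps VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 120, "A", "G", 60.0, 1), (2, 450, "C", "T", 15.0, 1), (3, 900, "G", "A", 55.0, 2)],
    )
    conn.commit()
    return conn


def filter_snps(conn: sqlite3.Connection, function_keyword: str, min_quality: float):
    """Return SNPs above a quality cutoff that lie in genes whose function matches."""
    query = """
        SELECT s.position, s.ref_base, s.alt_base, g.name
        FROM snps AS s
        JOIN genes AS g ON g.id = s.gene_id
        WHERE g.function LIKE ? AND s.quality >= ?
        ORDER BY s.position
    """
    return conn.execute(query, (f"%{function_keyword}%", min_quality)).fetchall()
```

Chaining filters is then a matter of adding conditions to the WHERE clause, which keeps each filtering step explicit and repeatable.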
Identifies haplogroups from low coverage sequence data. YHap uses an imputation framework to jointly predict Y chromosome genotypes and assign Y haplogroups using low coverage population sequence data. Borrowing information across multiple samples within a population using an imputation framework enables accurate Y haplogroup assignment.
An algorithm for the correct alignment of two nucleotide sequences containing SVs, i.e. deletion, insertion, tandem duplication or inversion. The algorithm does not require the adjustment or modification of the alignment scoring scheme(s) that is usually tuned for a particular alignment purpose, e.g. cross-species, contig or read alignments. Thus, the algorithm can be universally applied in various biological studies relying on alignment.
Infers regions of loss of heterozygosity (LOH) from paired tumor-normal data. APOLLOH is a nonstationary hidden Markov model (HMM) that predicts regions of LOH in genome sequencing data of cancers. The software can complement the arsenal of computational tools designed for cancer focused sequencing studies. It was applied to 23 triple-negative breast cancer genomes sequenced to about 30x coverage on two massively parallel sequencing platforms.
A Perl/C++ package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads. BreakDancer sensitively and accurately detected indels ranging from 10 base pairs to 1 megabase pair that are difficult to detect via a single conventional approach.
Determines the location of breakpoints of structural variants (SVs) from single-end reads produced by next-generation sequencing (NGS). Breakpointer locates SV breakpoints by analyzing both misalignment artifacts and local non-uniform read distribution created by SVs. The software is designed as a supportive breakpoint discovery tool that ideally should be used in combination with other methods for genotyping SVs.
MATE-CLEVER / Mendelian-inheritance-AtTEntive CLique-Enumerating Variant finder
Detects and genotypes insertions and deletions from paired-end reads. CTK is a suite of tools for next-generation sequencing (NGS) data analysis and is based on an internal segment size approach to discover indel variation from paired-end read data. It also contains, among others, a long-indel-aware read mapper (LASER), a converter from BAM to a list of alignment pairs with prior probabilities, and a feature to split data by chromosome.
Identifies structural variations (SVs) with soft-clipping information. ClipCrop remaps soft-clipped sequences and infers which type of SV event exists from the mapping pattern. This software can identify SVs with a higher discovery rate, especially for short duplications and insertions. It doesn’t need a large depth of coverage or long read lengths and can deal with most current next-generation sequencing (NGS) data.
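Soft-clipping information is recorded in the CIGAR string of each alignment, so the first step of any approach of this kind is to pull out the clipped ends of a read. The sketch below shows that step only, using standard SAM CIGAR operators; it is not ClipCrop's code, the function names are invented, and the remapping and SV classification steps are not covered.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar: str):
    """Return (leading, trailing) soft-clipped lengths for a CIGAR string."""
    elements = [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]
    # Hard clips (H) may sit outside soft clips and are not present in the read sequence.
    core = [element for element in elements if element[1] != "H"]
    leading = core[0][0] if core and core[0][1] == "S" else 0
    trailing = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return leading, trailing


def clipped_parts(seq: str, cigar: str):
    """Return the soft-clipped prefix and suffix of a read as sequences."""
    leading, trailing = soft_clips(cigar)
    prefix = seq[:leading]
    suffix = seq[len(seq) - trailing:] if trailing else ""
    return prefix, suffix
```

The returned prefix and suffix are the pieces that would be realigned elsewhere in the genome; where and in which orientation they land is what distinguishes one SV type from another.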
Detects long deletions in a genome as well as RNA splicing events using long Illumina reads. Clippers is a deletion identification program that uses periodic spaced seeds. The software is a sister tool of PerM, a short-read aligner. One goal of its development is to find non-canonical splicing or deletion events with long reads, even when the reference is dense in single nucleotide polymorphisms (SNPs) or the sequencing error rate is high.
CREST / Clipping REveals STructure
An algorithm using NGS reads with partial alignments to a reference genome to directly map structural variations at the nucleotide level. Application of CREST to whole-genome sequencing data from five pediatric T-lineage acute lymphoblastic leukemias (T-ALLs) and a human melanoma cell line, COLO-829, identified 160 somatic structural variations. Experimental validation exceeded 80%, demonstrating that CREST had a high predictive accuracy.
Retrieves balanced and unbalanced forms of structural variation, such as deletions, tandem duplications, inversions and translocations. DELLY is based on a combination of short-range and long-range paired-end mapping and split-read analysis. It is useful for massively parallel sequencing (MPS) data from various sources, including deep whole-genome sequencing data and low-pass mate-pair sequencing data with longer inserts.
Allows joint prediction of rearrangement breakpoints from single or multiple tumor samples. deStruct is software that identifies breakpoints and assigns read alignments to those breakpoints. The method uses a series of realignment and clustering steps to progressively refine breakpoint prediction quality and accuracy.
Identifies tandem duplications in sequencing reads.
An efficient fusion aligner which aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions.
Offers a method for the detection of structural variants (SVs). GASVPro proposes a probabilistic model, able to consider inversions and reciprocal translocations, which is based on a merging of paired-read and read depth signals. It furnishes a method able to handle reads with multiple possible alignments. This program can report: (i) uncertainty in predicted breakpoints and (ii) whether a generic breakend can be classified as a homozygous or a heterozygous variant.
Assists users in handling structural variants (SVs) breakpoints. Hydra-sv confronts various discordant mappings with the aim of enabling the detection, assembly, and interpretation of the mechanics related to these breakpoints. This software can be employed on a genome used for testing to highlight novel DNA junctions or, theoretically, to detect genetic events triggering a breakpoint.
Detects and visualizes structural variation from paired-end mapping data. inGAP-sv clusters abnormally mapped read pairs based on the location of a gap signature. Several important features, including local depth of coverage, mapping quality and associated tandem repeats, are used to evaluate the quality of predicted structural variation. Compared with other approaches, it can detect many more large insertions and complex variants with a lower false discovery rate. Moreover, inGAP-sv is written in Java, provides a user-friendly interface and runs on multiple operating systems.
A computational framework with simulation-based error models for inferring genomic structural variants from massive paired-end sequencing data. The package is composed of three modules: PEMer workflow, SV-Simulation and BreakDB. PEMer workflow is a sensitive tool for detecting SVs from paired-end sequence reads. SV-Simulation randomly introduces SVs into a given genome and generates simulated paired-end reads from the ‘novel’ genome. Subsequent analysis of the simulated reads with PEMer workflow can help parameterize PEMer workflow. BreakDB is a web-accessible database developed to store, annotate and display SV breakpoint events identified by PEMer and from other sources.
Detects breakpoints of large deletions and medium-sized insertions from paired-end short reads. Pindel is a program that uses a pattern growth algorithm to identify the breakpoints of large deletions (1 bp–10 kb) and medium-sized insertions (1–20 bp) from 36 bp paired-end short reads. The software can be useful for characterizing structural variation between individuals from next-generation, high-throughput sequencing data.
Finds structural variant breakpoints in Illumina paired-end next-generation sequencing (NGS) data. SoftSearch is a breakpoint detection tool for paired-end NGS data that uses multiple sequence features to infer breakpoints and to characterize the location and type of structural variants. The software can identify large insertions, large deletions, inversions, tandem duplications, novel sequence insertion locations, and chromosomal translocations.
Identifies small insertions and deletions (indels) shorter than 50 bp as well as large deletions within the coding regions from exome sequencing data. SPLITREAD is a general combinatorial algorithm that detects structural variants (SVs) and indels based on the computational prediction of breakpoints. The software can be applied to a large number of exomes in a computationally efficient manner to generate a database of bona fide exonic indels and SVs.
Identifies genomic structural variations from paired-end and mate-pair sequencing data. SVDetect isolates and predicts intra- and inter-chromosomal rearrangements from paired-end/mate-pair sequencing furnished by the high-throughput sequencing technologies. This software proceeds first by collecting all pairs that are suspected to come from the same structural variant (SV). It then employs a sliding-window strategy to detect all groups of pairs sharing similar genomic location.
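As a rough illustration of the sliding-window grouping step described above, the following Python sketch bins read pairs into overlapping windows and keeps window pairs shared by several read pairs. It is not SVDetect's implementation: the function names, the tuple layout, the window width (twice the step) and the `min_pairs` threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product


def window_indices(pos, step):
    """Indices of the overlapping windows (width 2*step, stride step) covering pos."""
    k = pos // step
    return [k] if k == 0 else [k - 1, k]


def group_pairs(pairs, step=500, min_pairs=2):
    """Group read pairs whose two ends fall into the same pair of windows.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples (hypothetical layout).
    Returns {(chrom1, window1, chrom2, window2): [pairs...]} for groups
    holding at least min_pairs read pairs.
    """
    groups = defaultdict(list)
    for pair in pairs:
        chrom1, pos1, chrom2, pos2 = pair
        # Each end lies in up to two overlapping windows; register every combination.
        for k1, k2 in product(window_indices(pos1, step), window_indices(pos2, step)):
            groups[(chrom1, k1, chrom2, k2)].append(pair)
    return {key: members for key, members in groups.items() if len(members) >= min_pairs}


pairs = [
    ("chr1", 100, "chr1", 5100),
    ("chr1", 600, "chr1", 5400),
    ("chr2", 50, "chr3", 70),
]
groups = group_pairs(pairs)
```

Because the windows overlap, the same cluster of pairs can be reported under neighbouring window keys; a real caller would merge or deduplicate such groups afterwards.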
A pipeline to detect structural variants (SVs) by integrating calls from several existing SV callers, which are then validated and the breakpoints refined using local de novo assembly.
Identifies genomic structural variation via model-based clustering. SVMiner handles separation distances and read-pair orientation. This software automatically characterizes and collects additional features for various types of structural variation candidates. It exploits these features in a probabilistic model to cluster and organize the candidate variants. SVMiner can also predict the heterozygosity of genomic deletions.
Finds deletions with exact breakpoints from low-coverage next-generation sequencing (NGS) data. SVseq performs two steps: it (1) applies an enhanced split reads mapping approach to identify candidate deletion sites from sequence reads, and (2) uses mapped paired-end reads spanning candidate deletions as supports to filter false positives. It was tested using the 1000 genomes project pilot low-coverage data and pilot high-coverage data.
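The second step, filtering candidate deletions by spanning paired-end support, can be sketched in Python as follows. This is a minimal illustration and not SVseq's code: the function name, the coordinate conventions and the `min_support` threshold are assumptions, and the candidates are taken as already produced by the split-read step.

```python
def filter_candidates(candidates, read_pairs, min_support=2):
    """Keep candidate deletions spanned by at least min_support read pairs.

    candidates: list of (start, end) deletion intervals on one chromosome.
    read_pairs: list of (left_end, right_start) coordinates, i.e. where the
                left mate ends and where the right mate begins (assumed layout).
    Returns a list of (start, end, support) for the retained candidates.
    """
    kept = []
    for start, end in candidates:
        # A pair supports the deletion when its mates flank the whole interval.
        support = sum(
            1
            for left_end, right_start in read_pairs
            if left_end <= start and right_start >= end
        )
        if support >= min_support:
            kept.append((start, end, support))
    return kept


candidates = [(1000, 2000), (5000, 5600)]
read_pairs = [(950, 2050), (990, 2100), (4900, 5300)]
kept = filter_candidates(candidates, read_pairs)
```

Here the second candidate is dropped because no read pair flanks it on both sides.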
A tool for discovery of structural variation in one or more individuals simultaneously using high throughput technologies. VariationHunter is now capable of resolving incompatible SV calls through a conflict resolution mechanism that no longer requires post-processing heuristics.
CEQer / Comparative Exome Quantification analyzer
A graphical, event-driven tool for CNA/AI-coupled analysis of exome sequencing reads.
CoNIFER / Copy Number Inference From Exome Reads
Uses exome sequencing data to find copy number variants (CNVs) and genotype the copy-number of duplicated genes.
A tool for copy number variation (CNV) detection for targeted resequencing data such as those from whole-exome capture data.
A read-count (RC) based tool that exploits all the reads produced by whole-exome sequencing (WES) experiments to detect copy number variants (CNVs) with genome-wide resolution. EXCAVATOR2 enhances the identification of genomic CNVs (overlapping or non-overlapping exons) from WES data by integrating the analysis of in-target and off-target reads. It extends the RC approach to the whole genome sequence and exploits the shifting level model (SLM) algorithm to segment the two combined profiles. Thereafter, the FastCall algorithm classifies each segmented region into one of five possible states (two-copy deletion, one-copy deletion, normal, one-copy duplication and multiple-copy amplification).
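To make the five states concrete, the Python sketch below maps a segment's log2 read-count ratio to one of them. It deliberately swaps in plain rounding to the nearest integer copy number in place of the FastCall algorithm, whose internals are not described here; the function name, the diploid baseline and the conversion formula are assumptions made for illustration only.

```python
STATES = {
    0: "two-copy deletion",
    1: "one-copy deletion",
    2: "normal",
    3: "one-copy duplication",
}


def copy_number_state(log2_ratio, ploidy=2):
    """Name the copy-number state for a segment's log2 ratio (test vs. control).

    Assumes a diploid baseline: estimated copies = ploidy * 2**log2_ratio,
    rounded to the nearest integer. Four or more copies are reported as
    a multiple-copy amplification.
    """
    copies = round(ploidy * 2 ** log2_ratio)
    return STATES.get(copies, "multiple-copy amplification")


labels = [copy_number_state(r) for r in (-5.0, -1.0, 0.0, 0.58, 1.0)]
```

A log2 ratio of -1.0 corresponds to one remaining copy under these assumptions, while a ratio of 1.0 corresponds to four copies and is therefore labelled as an amplification.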
A statistical method to detect CNV and LOH using depth-of-coverage and B-allele frequencies from mapped short sequence reads in exome sequencing data. We apply ExomeCNV to a cancer exome resequencing dataset. As expected, accuracy and resolution are dependent on depth-of-coverage and capture probe design.
A software tool developed at McGill University for comprehensive analysis of rare copy number variations in high-throughput exome sequencing data.