Variant simulation software tools | Whole-genome sequencing data analysis
Many methods for the comparison, annotation, visualization and, above all, detection of structural variations (SVs) have been published. Simulating SVs is a powerful, quick and inexpensive way to assess the performance and correctness of these methods.
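To make the idea concrete, here is a minimal Python sketch, not taken from any of the tools listed below, that applies a single deletion, inversion or tandem duplication to a sequence and records the ground-truth interval that an SV caller would later be scored against. The function name `simulate_sv` and the truth-record layout are illustrative assumptions.

```python
import random

# Complement table for the four bases plus N
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def simulate_sv(seq, sv_type, start, length):
    """Apply one SV to `seq` at 0-based `start`; return (new_seq, truth_record)."""
    end = start + length
    if start < 0 or length <= 0 or end > len(seq):
        raise ValueError("SV interval lies outside the sequence")
    segment = seq[start:end]
    if sv_type == "deletion":
        new_seq = seq[:start] + seq[end:]
    elif sv_type == "inversion":
        # Replace the segment by its reverse complement
        new_seq = seq[:start] + segment.translate(COMPLEMENT)[::-1] + seq[end:]
    elif sv_type == "tandem_duplication":
        # Insert a second copy directly after the original segment
        new_seq = seq[:end] + segment + seq[end:]
    else:
        raise ValueError(f"unsupported SV type: {sv_type}")
    # The truth record is what a caller's output would be compared with
    return new_seq, {"type": sv_type, "start": start, "end": end}


rng = random.Random(1)
ref = "".join(rng.choice("ACGT") for _ in range(50))
alt, truth = simulate_sv(ref, "inversion", 10, 8)
print(truth)  # {'type': 'inversion', 'start': 10, 'end': 18}
```

Real simulators add many SVs per genome, handle both haplotypes and then generate reads from the altered sequence; this sketch shows only the sequence edit and its truth record.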
Provides access to many free software tools for sequence analysis. EMBOSS (European Molecular Biology Open Software Suite) aims to serve the molecular biology community and encourages the creation and release of software in an open-source spirit. It integrates a range of sequence analysis tools into a seamless whole and is free of charge and open source.
Adds mutations to .bam files. BAMSurgeon creates synthetic mutations, including somatic structural variants (SVs). It creates SVs through read selection, local sequence assembly, manipulation of the assembled contigs, and simulation of sequence coverage over the altered contigs. It has been used to generate synthetic tumour-normal pairs that carry the same set of target mutations but differ in how reads are divided into tumour and normal sets, in the aligner used, or in the cell line.
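The read-selection step can be illustrated with a toy sketch: among the reads covering a position, pick a fraction matching the desired variant allele fraction (VAF) and change the base. This is not BAMSurgeon's code or interface; `spike_snv` is a hypothetical name, and reads are simplified to gapless `(start, sequence)` tuples instead of real BAM records.

```python
import random


def spike_snv(reads, pos, alt_base, vaf, seed=0):
    """Return a copy of `reads` where ~`vaf` of reads covering `pos` carry `alt_base`.

    reads: list of (start, sequence) with 0-based reference starts; reads are
    assumed gapless (no indels, no clipping) to keep the sketch simple.
    """
    rng = random.Random(seed)
    covering = [i for i, (start, seq) in enumerate(reads)
                if start <= pos < start + len(seq)]
    # Number of covering reads that should show the alternate allele
    n_mut = round(vaf * len(covering))
    chosen = set(rng.sample(covering, n_mut))
    out = []
    for i, (start, seq) in enumerate(reads):
        if i in chosen:
            offset = pos - start
            seq = seq[:offset] + alt_base + seq[offset + 1:]
        out.append((start, seq))
    return out
```

A real tool must also respect base qualities, mate pairs and re-alignment of the edited reads, which this sketch ignores.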
Simulates Illumina reads using empirical profiles. pIRS is a simulator designed to produce reads similar to those generated by the Illumina platform. It can be helpful for developing next-generation sequencing (NGS) software for tasks such as de novo assembly, single-nucleotide polymorphism (SNP) calling and structural variation detection. It is also useful for applications that require heterozygous data.
Simulates deletions, insertions, inversions, tandem duplications and translocations of various sizes in any genome available as a FASTA file or BSgenome data package. SV breakpoints can be placed uniformly across the whole genome, with a bias towards repeat regions and regions of high homology (for hg19), or at user-supplied coordinates.
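Uniform breakpoint placement across a multi-chromosome genome can be sketched as a single uniform draw over all valid start positions, so longer chromosomes receive proportionally more events. This is an illustrative assumption, not the tool's implementation: `place_breakpoints` is hypothetical, overlapping draws are simply rejected, and the repeat/homology bias is not modelled.

```python
import bisect
import random


def place_breakpoints(chrom_lengths, n, sv_len, seed=0, max_attempts=100_000):
    """Draw n non-overlapping intervals of sv_len bp uniformly over the genome."""
    rng = random.Random(seed)
    names = list(chrom_lengths)
    # Valid start positions per chromosome: the whole SV must fit inside it
    valid = [max(chrom_lengths[c] - sv_len + 1, 0) for c in names]
    cumulative, total = [], 0
    for v in valid:
        total += v
        cumulative.append(total)
    if total == 0:
        raise ValueError("sv_len exceeds every chromosome length")
    placed = []
    for _ in range(max_attempts):
        if len(placed) == n:
            break
        # One uniform draw over all valid genome-wide start positions
        x = rng.randrange(total)
        idx = bisect.bisect_right(cumulative, x)
        chrom = names[idx]
        start = x - (cumulative[idx - 1] if idx > 0 else 0)
        end = start + sv_len
        # Reject draws overlapping an SV already placed on the same chromosome
        if any(c == chrom and start < e and s < end for c, s, e in placed):
            continue
        placed.append((chrom, start, end))
    if len(placed) < n:
        raise RuntimeError("could not place all SVs without overlap")
    return sorted(placed)
```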
A framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing using simulated or real data. Rather than simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive computing framework that supports parallel computation and multiple read simulators.
A benchmarking methodology for evaluating germline variant calling algorithms. The authors generate synthetic datasets, organize and interpret a wide range of existing benchmarking data for real genomes, and propose a set of accuracy and computational performance metrics for evaluating variant calling methods on these data.
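Accuracy metrics of this kind are usually built from true positives, false positives and false negatives against a truth set. The sketch below assumes exact matching on `(chrom, pos, ref, alt)` keys, a simplification: production benchmarking tools also reconcile different representations of the same variant. `benchmark_calls` is a hypothetical name, not the authors' code.

```python
def benchmark_calls(truth, calls):
    """Precision, recall and F1 of `calls` against `truth` by exact key matching."""
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```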
Enables the creation of tumour samples with different ploidy, purity and polyclonality. tHapMix scales easily to the simulation of hundreds of somatic genomes, and its re-use of real read data preserves the noise and biases present in sequencing platforms. It is a versatile and scalable workflow for simulating diverse tumour characteristics such as polyclonality, purity and aneuploidy.
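How purity, ploidy and clonality interact can be seen in the expected variant allele fraction (VAF) of a somatic SNV in a tumour-normal admixture: mutated alleles from tumour cells divided by all alleles at the locus. The sketch below is the standard admixture formula, not tHapMix code; it assumes all tumour cells share one copy number and that normal cells are diploid by default.

```python
def expected_vaf(purity, tumour_cn, mutated_copies=1, ccf=1.0, normal_cn=2):
    """Expected VAF of a somatic SNV.

    purity: fraction of tumour cells in the sample
    tumour_cn: total copy number of the locus in tumour cells
    mutated_copies: copies carrying the variant in a mutated cell
    ccf: cancer cell fraction, i.e. share of tumour cells with the variant
    """
    if not (0 <= purity <= 1 and 0 <= ccf <= 1):
        raise ValueError("purity and ccf must lie in [0, 1]")
    mutated_alleles = purity * ccf * mutated_copies
    total_alleles = purity * tumour_cn + (1 - purity) * normal_cn
    return mutated_alleles / total_alleles
```

For example, a clonal heterozygous SNV in a diploid tumour at 60% purity gives 0.6 / (0.6 × 2 + 0.4 × 2) = 0.3.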
Constructs short-read sets for thousands of spike-in variants covering different types, sizes, haplotypes and allelic fractions. SVEngine generates whole-genome or targeted sequencing data simulating germline, somatic and complex clonally structured genomes. It produces read pairs in accordance with the library specification and can also extract the breakpoints of individual variants.
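Producing read pairs "in accordance with the library specification" generally means drawing fragment lengths from the library's insert-size distribution and reading both fragment ends. The following sketch assumes a normal insert-size distribution, forward-reverse pair orientation and error-free reads; `simulate_read_pairs` is a hypothetical name, not SVEngine's interface.

```python
import random

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_read_pairs(genome, n_pairs, read_len, insert_mean, insert_sd,
                        seed=0, max_attempts=1_000_000):
    """Sample error-free forward-reverse read pairs from `genome`."""
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    rng = random.Random(seed)
    pairs = []
    for _ in range(max_attempts):
        if len(pairs) == n_pairs:
            break
        # Fragment (insert) length drawn from a normal distribution
        frag_len = int(round(rng.gauss(insert_mean, insert_sd)))
        if frag_len < read_len or frag_len > len(genome):
            continue  # redraw fragments that cannot yield a full pair
        start = rng.randrange(len(genome) - frag_len + 1)
        fragment = genome[start:start + frag_len]
        read1 = fragment[:read_len]              # forward read from the left end
        read2 = revcomp(fragment[-read_len:])    # reverse read from the right end
        pairs.append((start, frag_len, read1, read2))
    return pairs
```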
Introduces user-defined haplotype-phased allele-specific copy number events into an existing BAM file with a focus on targeted and exome sequencing experiments. Bamgineer can be used to add or delete haplotype-phased and allele-specific copy number events to existing alignments of targeted next-generation sequencing (NGS) data. It aids in development and benchmarking of copy number variation (CNV) calling and other sequence data analysis tools and pipelines.
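The effect of a haplotype-phased, allele-specific copy number event on read counts can be illustrated with a naive sketch: in a diploid region each haplotype starts with one copy, so its expected read count scales with its new copy number. This is not Bamgineer's algorithm, which works on real alignments and phased SNPs; `apply_allele_specific_cn` is hypothetical, and gains are modelled crudely by re-emitting reads under new IDs.

```python
import random


def apply_allele_specific_cn(reads, cn_a, cn_b, seed=0):
    """Rescale haplotype-tagged reads from copy numbers (1, 1) to (cn_a, cn_b).

    reads: list of (read_id, haplotype) where haplotype is "A" or "B".
    """
    rng = random.Random(seed)
    target = {"A": cn_a, "B": cn_b}
    out = []
    for read_id, hap in reads:
        m = target[hap]
        # Integer part always emitted; fractional part emitted with that probability
        copies = int(m) + (1 if rng.random() < m - int(m) else 0)
        for k in range(copies):
            out.append((read_id if k == 0 else f"{read_id}_dup{k}", hap))
    return out
```

Setting `cn_b = 0` with `cn_a = 2`, for instance, models copy-neutral loss of heterozygosity: haplotype B reads disappear and the B-allele fraction at heterozygous SNPs drops to zero.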
Simulates and generates short sequence reads carrying different types of biological variants. SInC uses a realistic error model based on the base quality values of reads. It consists of three modules, one each for single nucleotide variants (SNVs), insertions and deletions (indels) and copy number variations (CNVs), which can be run on their own or in any combination.
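A quality-driven error model rests on the Phred relation: a base with quality Q is wrong with probability 10^(-Q/10). The sketch below uses that relation to inject substitution errors; SInC's own model may be more detailed, and `add_quality_errors` is a hypothetical name.

```python
import random


def phred_to_error_prob(q):
    """Phred quality Q -> probability that the base call is wrong."""
    return 10 ** (-q / 10)


def add_quality_errors(seq, quals, seed=0):
    """Miscall base i with probability 10^(-quals[i]/10), choosing a different base uniformly."""
    rng = random.Random(seed)
    out = []
    for base, q in zip(seq, quals):
        if rng.random() < phred_to_error_prob(q):
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)
```

For instance, Q20 corresponds to a 1% error probability and Q30 to 0.1%.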
Simulates structural variation. Pysim-sv is a package for simulating high-throughput sequencing (HTS) data to evaluate performance of structural variation (SV) detection algorithms. Pysim-sv can introduce a wide spectrum of germline and somatic genomic variations, making simulated genomes more similar to real genomes. The package contains functionalities to simulate aneuploidy as well as heterogeneous tumor data, which is very useful in assessing performance of algorithms in tumor studies.
A tool for simulating somatic CNVs and structural variations (SVs). In addition to multiple types of SV and CNV events, the tool can simulate important features of tumor samples, including aneuploidy, heterogeneity and purity. SCNVSim generates the genomes of a cancer cell population with detailed information on copy number status, loss of heterozygosity (LOH) and event breakpoints, which is essential for developing and evaluating somatic CNV and SV detection methods in cancer genomics studies.
Simulates large populations of gene sequences using a forward-time simulation approach. srv is capable of simulating several multi-locus fitness schemes with arbitrary diploid single-locus selection models with random or locus-specific fitness effects. It uses realistic demographic and natural selection models estimated from empirical sequence data. The tool can be easily modified to reveal details of the evolutionary process or use alternative demographic or genetic models.
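The core of a forward-time simulation can be shown with the classic single-locus Wright-Fisher model: each generation applies deterministic diploid selection and then binomial drift. This is a simplification of what srv does, since srv supports multi-locus fitness schemes and realistic demography; `wright_fisher` and its genotype-fitness parametrization are assumptions of this sketch.

```python
import random


def wright_fisher(n_individuals, p0, s, h, generations, seed=0):
    """Allele-A frequency trajectory at one biallelic diploid locus.

    Genotype fitnesses: AA = 1 + s, Aa = 1 + h*s, aa = 1 (assumes s > -1).
    """
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    p = p0
    trajectory = [p]
    w_aa, w_ab, w_bb = 1 + s, 1 + h * s, 1.0
    for _ in range(generations):
        q = 1 - p
        mean_w = p * p * w_aa + 2 * p * q * w_ab + q * q * w_bb
        # Allele frequency after selection
        p_sel = (p * p * w_aa + p * q * w_ab) / mean_w
        # Drift: binomial sampling of 2N gene copies for the next generation
        count = sum(1 for _ in range(two_n) if rng.random() < p_sel)
        p = count / two_n
        trajectory.append(p)
    return trajectory
```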
Provides a simulation tool of single nucleotide polymorphism (SNP) clusters for next-generation sequencing (NGS) data. ShRangeSim can generate surrogate NGS data for the evaluation of pipelines dedicated to the identification of genomic variations and can also estimate the distribution of sizes of short range SNP clusters (SRSCs). It also provides several scripts for simple evaluation of mapping results.
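One plausible way to obtain a size distribution of short-range SNP clusters is to chain SNPs whose neighbours lie within a maximum distance and count cluster sizes. ShRangeSim's exact cluster definition may differ; the `max_gap` rule and the function names below are assumptions of this sketch.

```python
from collections import Counter


def short_range_clusters(positions, max_gap):
    """Group SNP positions into clusters of >= 2 SNPs whose neighbours are <= max_gap bp apart."""
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] <= max_gap:
            current.append(pos)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [pos]
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def cluster_size_distribution(clusters):
    """Map cluster size -> number of clusters of that size."""
    return dict(sorted(Counter(len(c) for c in clusters).items()))
```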
Constructs an initial DNA sequence at the root of the tree and simulates evolution across the tree using a variety of common models of DNA evolution. MySSP is a program for the simulation of DNA sequence evolution across a phylogenetic tree. It is designed for large-scale studies, including simulation of multiple replicates, and outputs sequences in NEXUS, MEGA or FASTA format. MySSP has a fairly simple graphical user interface (GUI) for basic use, as well as a specialized batch script interpreter for more complex or large-scale simulations.
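Simulating evolution from a root sequence down a tree can be sketched with the simplest common model, Jukes-Cantor (JC69): after a branch of length t (expected substitutions per site), a site shows a different base with probability 3/4 × (1 − e^(−4t/3)). This is not MySSP's code. The edge-list tree format and the node name `"root"` are assumptions of the sketch.

```python
import math
import random


def jc69_evolve(seq, branch_length, seed=0):
    """Evolve `seq` along one branch under JC69."""
    rng = random.Random(seed)
    p_change = 0.75 * (1 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            # Under JC69 the new base is uniform among the other three
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def evolve_tree(root_seq, edges, seed=0):
    """edges: (parent, child, branch_length) in pre-order, starting from "root"."""
    rng = random.Random(seed)
    seqs = {"root": root_seq}
    for parent, child, t in edges:
        seqs[child] = jc69_evolve(seqs[parent], t, seed=rng.randrange(2**32))
    return seqs
```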
Simulates common DNA variants and constructs sequencing reads for mixture genomes. IntSIM can simulate both germline and somatic variants in the same sequence. It uses tumor purity to build reads corresponding to heterogeneous genomes and can also generate matched tumor-normal samples. It can also simulate correlations among single nucleotide polymorphisms (SNPs) and among copy number variations (CNVs)/copy number alterations (CNAs).
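Using purity to build reads for a heterogeneous sample typically means deciding how many reads come from the tumour genome and how many from the normal one. Because tumour cells may not be diploid, the tumour share of DNA is weighted by ploidy. This is a sketch under that assumption, not IntSIM's procedure; both function names are hypothetical.

```python
def tumour_dna_fraction(purity, tumour_ploidy, normal_ploidy=2):
    """Fraction of sequenced DNA that comes from tumour cells."""
    tumour = purity * tumour_ploidy
    normal = (1 - purity) * normal_ploidy
    return tumour / (tumour + normal)


def split_read_budget(total_reads, purity, tumour_ploidy, normal_ploidy=2):
    """Number of reads to draw from the tumour and the normal genome."""
    n_tumour = round(total_reads * tumour_dna_fraction(purity, tumour_ploidy, normal_ploidy))
    return n_tumour, total_reads - n_tumour
```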
Allows users to simulate copy number variants (CNVs) for whole exome sequencing (WES) or whole genome sequencing (WGS) data. SimulateCNVs is a program that offers a single command/step to generate rearranged genomes, short reads and bam files for multiple test samples, with the control being the original genome.
Allows users to sample rows from large text or binary files. sampleSNPs consists of three parts: (1) a library named libsampFiles.a that lets users build their own applications with the same sampling algorithms as the two programs below; (2) a program named sampleLD that calculates linkage disequilibrium (LD); and (3) a program named sampleSNPs that creates and saves ordered random samples of single nucleotide polymorphisms (SNPs) from a variety of formats.
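An ordered random sample of rows can be drawn in one pass with selection sampling (Knuth's Algorithm S): each row is kept with probability (rows still needed) / (rows remaining), which yields a uniform sample already in file order. Whether sampleSNPs uses this exact algorithm is not stated; the sketch assumes the total row count is known in advance, and `ordered_sample` is a hypothetical name.

```python
import random


def ordered_sample(lines, n, total, seed=0):
    """Uniformly choose n of `total` lines, preserving their original order."""
    if not 0 <= n <= total:
        raise ValueError("need 0 <= n <= total")
    rng = random.Random(seed)
    needed = n
    chosen = []
    if needed == 0:
        return chosen
    for remaining, line in zip(range(total, 0, -1), lines):
        # Keep the line with probability needed / remaining
        if rng.random() * remaining < needed:
            chosen.append(line)
            needed -= 1
            if needed == 0:
                break
    return chosen
```

When the total is unknown, reservoir sampling followed by sorting on the original row index is the usual alternative.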
A set of tools for creating diploid FASTA files containing SNPs, indels, duplications, deletions and translocations. These FASTA files can then be used with next-generation sequencing simulators to create artificial sequencing experiments. The tools are intended for assessing the performance and reliability of data analysis in next-generation sequencing pipelines.
Enables the design of regulatory sequences. SequenceShaper allows users to create and delete transcription factor (TF) binding sites. The software (i) analyzes the sequence for existing TF binding sites using MatInspector and (ii) can generate a new site or delete an existing one. It can detect side effects such as the unintended creation or deletion of other TF binding sites.
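The side-effect check can be illustrated by scanning a sequence before and after a point mutation and comparing the site sets. MatInspector uses weight matrices; this sketch swaps in simple forward-strand regular expressions, and both the motif patterns and the function names are illustrative assumptions.

```python
import re

# Simplified consensus patterns (illustrative only, not MatInspector matrices)
MOTIFS = {"TATA": "TATA[AT]A[AT]", "SP1": "GGGCGG"}


def find_sites(seq, motifs):
    """Set of (motif_name, start) for overlapping forward-strand matches."""
    hits = set()
    for name, pattern in motifs.items():
        for m in re.finditer(f"(?={pattern})", seq):
            hits.add((name, m.start()))
    return hits


def point_mutation_effects(seq, motifs, pos, alt_base):
    """Apply one substitution; return (mutated_seq, sites_lost, sites_gained)."""
    mutated = seq[:pos] + alt_base + seq[pos + 1:]
    before, after = find_sites(seq, motifs), find_sites(mutated, motifs)
    return mutated, before - after, after - before
```

A design tool would accept a mutation only if the intended site appears in `sites_lost` (or `sites_gained`) and no other site changes.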