Read simulation software tools | Whole-genome sequencing data analysis
In the past few years, high-throughput next-generation sequencing technologies have effectively replaced earlier data types for genome-wide studies that measure gene expression changes and discover genomic and epigenetic variations, and many tools have been developed for analyzing such datasets. Simulated data is indispensable for guiding tool development and evaluating tool performance. It is therefore essential to develop simulation software that produces next-generation sequencing reads capturing the most essential characteristics of real data.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It permits the creation and release of software in an open source spirit, and it brings tools for sequence analysis together into a seamless whole. It is free of charge and open source.
Facilitates the design, optimization, and tracking of barcoded oligonucleotides. XSTK is useful for projects that require highly multiplexed polymerase chain reaction (PCR) and DNA sequencing. It builds a list of all possible DNA sequences of a specified length and then progressively culls sequences that may interfere with primary PCR amplification and/or sequencing steps.
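The enumerate-then-cull idea described above can be illustrated with a minimal Python sketch. It is not XSTK's implementation: the function names, the specific filters (GC content, homopolymer length, pairwise Hamming distance), and all thresholds are assumptions chosen only for illustration, since the description does not state which criteria XSTK applies.

```python
from itertools import product


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def design_barcodes(length, min_gc=0.4, max_gc=0.6,
                    max_homopolymer=2, min_distance=3):
    """Enumerate all sequences of `length`, then cull step by step.

    The three filters and their default thresholds are hypothetical.
    """
    kept = []
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        # Cull sequences with extreme GC content.
        if not (min_gc <= gc_fraction(seq) <= max_gc):
            continue
        # Cull sequences with long single-base runs.
        if longest_homopolymer(seq) > max_homopolymer:
            continue
        # Greedily keep only sequences far enough from every kept barcode.
        if all(hamming(seq, other) >= min_distance for other in kept):
            kept.append(seq)
    return kept
```

Calling `design_barcodes(6)` returns a list of 6-mers that pass all three filters. Exhaustive enumeration grows as 4^length, so a sketch like this is only practical for short barcodes.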
A small tool for simulating sequence reads from a reference genome. It is able to simulate diploid genomes with SNPs and insertion/deletion (INDEL) polymorphisms, and simulate reads with uniform substitution sequencing errors. It does not generate INDEL sequencing errors, but this can be partly compensated by simulating INDEL polymorphisms. Wgsim outputs the simulated polymorphisms, and writes the true read coordinates as well as the number of polymorphisms and sequencing errors in read names.
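As a rough illustration of reads with uniform substitution errors whose names record the true coordinates and error count, here is a minimal Python sketch. It is not wgsim's code: it is single-end and haploid, it omits polymorphisms, and the function name, the read-name layout, and the 1-based inclusive coordinates are assumptions made for this example.

```python
import random


def simulate_reads(reference, n_reads, read_len, error_rate, seed=0):
    """Sample reads uniformly from `reference` and add substitution errors.

    Returns (name, sequence) pairs. The name layout
    read<i>_<start>_<end>_<n_errors> is hypothetical, with 1-based
    inclusive coordinates.
    """
    rng = random.Random(seed)
    reads = []
    for i in range(n_reads):
        # Uniformly choose a 0-based start so the read fits in the reference.
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        n_errors = 0
        for pos, base in enumerate(bases):
            # Each base is substituted independently with the same probability.
            if rng.random() < error_rate:
                bases[pos] = rng.choice([b for b in "ACGT" if b != base])
                n_errors += 1
        name = f"read{i}_{start + 1}_{start + read_len}_{n_errors}"
        reads.append((name, "".join(bases)))
    return reads
```

Because the true position and the number of introduced errors travel with each read name, a read aligner's output can later be checked against the known truth without a separate answer file.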
An Illumina paired-end and mate-pair short-read simulator. This project attempts to model as many of the quirks that exist in Illumina data as possible. These quirks include the potential for chimeric reads and non-biotinylated fragment pull-down in mate-pair libraries.
The commercial launch of 454 pyrosequencing in 2005 was a milestone in genome sequencing in terms of performance and cost. Flowsim is a simulator that generates realistic pyrosequencing data files of arbitrary size from a given set of input DNA sequences.
Captures the technology-specific features of Oxford Nanopore Technologies (ONT) data and allows adjustments as nanopore sequencing technology improves. NanoSim is a read simulator that works in two steps. It first performs a comprehensive alignment-based analysis and generates a set of read profiles, which serve as the input to the next step, the simulation stage. The simulation tool then uses the model built in the previous step to produce in silico reads for a given reference genome.
A targeted re-sequencing simulator that generates synthetic exome sequencing reads from a given sample genome. Wessim emulates conventional exome capture technologies, including Agilent's SureSelect and NimbleGen's SeqCap, to generate DNA fragments from genomic target regions. The target regions can be either specified by genomic coordinates or inferred from in silico probe hybridization. Coupled with existing next-generation sequencing simulators, Wessim generates realistic artificial exome sequencing data, which is essential for developing and evaluating exome-targeted variant callers.
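The coordinate-based mode, in which fragments are drawn from specified target regions, might look roughly like the Python sketch below. This is an assumption-laden illustration and not Wessim's algorithm: the function name, the 0-based half-open coordinates, the length-weighted choice of region, and the fixed fragment length are all hypothetical, and probe hybridization is not modeled.

```python
import random


def sample_target_fragments(genome, targets, n_fragments, fragment_len, seed=0):
    """Draw fixed-length fragments that each overlap one target region.

    `targets` is a list of (start, end) pairs, 0-based and half-open,
    lying inside `genome`. Returns (start, end, sequence) tuples.
    """
    rng = random.Random(seed)
    # Longer regions are chosen proportionally more often (an assumption).
    weights = [end - start for start, end in targets]
    fragments = []
    for _ in range(n_fragments):
        t_start, t_end = rng.choices(targets, weights=weights, k=1)[0]
        # Any fragment start in [lo, hi] overlaps the target by >= 1 base
        # and keeps the fragment inside the genome.
        lo = max(0, t_start - fragment_len + 1)
        hi = min(len(genome) - fragment_len, t_end - 1)
        start = rng.randint(lo, hi)
        end = start + fragment_len
        fragments.append((start, end, genome[start:end]))
    return fragments
```

The resulting fragments could then be handed to a separate read simulator, which matches the description of Wessim being coupled with existing next-generation sequencing simulators.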