Read simulation software tools | CLIP sequencing data analysis
Simulated data can provide a useful approximation to real datasets when determining the ground truth experimentally on a large scale is infeasible. For a simulation to be useful, however, it must reproduce the characteristics of real datasets.
PARA-suite is a toolkit for processing and aligning short, error-prone sequencing reads. It is implemented in Java using HTSjdk, a Java API for high-throughput sequencing data formats. PARA-suite allows the user to estimate a sequencing run-specific error profile, combine the results of alignments against multiple reference databases, cluster an aligned sequencing read dataset (‘PAR-CLIP read simulator and hierarchical clustering’), run the PAR-CLIP read simulator, benchmark an alignment of simulated PAR-CLIP sequencing reads, and run a full processing pipeline for error-prone short read alignments.
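To make the idea of a run-specific error profile concrete, here is a minimal Python sketch that estimates substitution rates from gap-free read-to-reference alignments. It is only an illustration under simplifying assumptions (no indels, no quality scores, equal-length aligned strings); the function name `substitution_profile` and the toy input are hypothetical, and this is not how PARA-suite itself is implemented.

```python
from collections import Counter


def substitution_profile(aligned_pairs):
    """Estimate P(read base | reference base) from gap-free alignments.

    aligned_pairs: iterable of (reference, read) strings of equal length.
    Returns a dict mapping (ref_base, read_base) to an observed frequency.
    """
    pair_counts = Counter()  # (ref_base, read_base) -> occurrences
    ref_counts = Counter()   # ref_base -> occurrences
    for ref, read in aligned_pairs:
        if len(ref) != len(read):
            raise ValueError("aligned sequences must have equal length")
        for ref_base, read_base in zip(ref.upper(), read.upper()):
            pair_counts[(ref_base, read_base)] += 1
            ref_counts[ref_base] += 1
    # Normalise each count by how often its reference base was observed.
    return {
        pair: count / ref_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Toy input (made up for illustration): two reads, each with one T-to-C mismatch.
pairs = [("ACGT", "ACGC"), ("TTGA", "TCGA")]
profile = substitution_profile(pairs)
print(profile[("T", "C")])  # fraction of reference T positions read as C
```

A profile of this kind could then be used to weight mismatches during alignment or to draw errors when simulating reads.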
Cseq-Simulator allows generation of datasets with known ground truth that exhibit several characteristics of real data. It offers two options for defining binding sites: (1) users can predict the binding sites using a position weight matrix (PWM), or (2) users can provide a list of binding sites, which makes it possible to use sites that were determined experimentally or derived using other models.
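The PWM option can be illustrated with a short Python sketch that slides a log-odds matrix along a sequence and reports windows scoring above a threshold. This is a generic PWM scan under stated assumptions (uniform background, no zero probabilities, sequence restricted to A/C/G/T); the function names, the toy motif, and the threshold are hypothetical and do not reflect Cseq-Simulator's actual interface.

```python
import math


def log_odds_pwm(prob_matrix, background=0.25):
    """Convert per-position base probabilities into log2-odds scores.

    Assumes every probability is > 0 (e.g. pseudocounts already applied).
    """
    return [
        {base: math.log2(p / background) for base, p in column.items()}
        for column in prob_matrix
    ]


def predict_sites(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-position scores; raises KeyError on non-ACGT symbols.
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy 3-column motif (made up for illustration) that favours "TGT".
motif = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
pwm = log_odds_pwm(motif)
hits = predict_sites("ACTGTAC", pwm, threshold=3.0)
print(hits)
```

The second option, a user-supplied list of sites, would simply replace the `predict_sites` output with coordinates read from a file.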