
V-Phaser 2

Infers intra-host diversity within viral populations. V-Phaser 2 adds three major methodologies to the state of the art: (1) a technique that uses paired-end read data to call phased variants; (2) a strategy to represent and infer length polymorphisms; and (3) an in-line filter for erroneous calls arising from systematic sequencing artifacts. Using paired reads in phasing extends the distance between phased sites from a read length to a fragment length.

ShoRAH / Short Reads Assembly into Haplotypes

A computational method for quantifying genetic diversity in a mixed sample and for identifying the individual clones in the population, while accounting for sequencing errors. The approach also provides the user with an estimate of the quality of the reconstruction. Further, ShoRAH can reconstruct the global haplotypes and estimate their frequencies. Its reliability was assessed on simulated data and on real data obtained in wet lab experiments.


RegressHaplo

Reconstructs haplotypes using a penalized regression approach. RegressHaplo is an algorithm developed to (i) split the genomic region of interest into subregions, (ii) construct local haplotypes for each subregion, with one local haplotype for each possible path through the subregion’s read graph, (iii) build a set of candidate global haplotypes by considering all possible combinations of local haplotypes, and (iv) apply a penalized regression to select the global haplotypes that form the haplotype reconstruction.
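Step (iii) can be illustrated with a minimal Python sketch. It assumes each subregion's local haplotypes are already available as plain strings and that subregions are simply concatenated in order; the function name `global_candidates` and the toy data are hypothetical, and the penalized regression of step (iv) is not shown.

```python
from itertools import product


def global_candidates(local_haplotypes):
    """Build every candidate global haplotype by picking one local
    haplotype per subregion and concatenating them in subregion order.

    local_haplotypes: list of lists of strings, one inner list per subregion.
    """
    return ["".join(combo) for combo in product(*local_haplotypes)]


# Toy input (illustrative only): three subregions with 2, 1 and 3 local haplotypes.
local_haplotypes = [["AC", "AT"], ["GG"], ["TA", "TC", "CA"]]
candidates = global_candidates(local_haplotypes)
```

The number of candidates is the product of the per-subregion counts (here 2 x 1 x 3 = 6), which grows quickly as subregions are added; a selection step such as the penalized regression described above is what narrows this set down.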

EVORhA / EVOlutionary Reconstruction of hAplotypes

A haplotype reconstruction method that complements phasing information in the non-empty read overlap with the frequency estimations of inferred local haplotypes. As shown with simulated data, once read lengths and/or mutation rates become restrictive for state-of-the-art methods, this additional frequency information still allows EVORhA to reliably reconstruct genome-wide haplotypes. By enabling bacterial haplotype reconstruction, EVORhA opens up a new area of applications for bacterial population sequencing (or metagenome sequencing).

SAVAGE / Strain Aware VirAl GEnome assembly

Allows for reference-free assembly of viral quasispecies from sequencing data sets of truly deep coverage. SAVAGE makes relevant contributions for (i) the construction of overlap graphs from deep coverage read data and (ii) viral quasispecies assembly using the overlap graph assembly paradigm. SAVAGE makes use of FM-index-based data structures for constructing overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges indicate that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. SAVAGE performs very favorably in comparison with state-of-the-art reference genome guided tools.
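The overlap graph idea can be shown with a deliberately naive Python sketch. It uses exact suffix-prefix matching over all read pairs, which is an assumption made for illustration only: it stands in for neither the FM-index-based construction nor the statistical edge criteria mentioned above, and the names `overlap_length` and `build_overlap_graph` are hypothetical.

```python
def overlap_length(a, b, min_overlap):
    """Length of the longest suffix of `a` that exactly equals a prefix of `b`,
    or 0 if no such overlap of at least `min_overlap` bases exists."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_overlap):
    """Return {(i, j): overlap} for every ordered pair of distinct reads whose
    suffix-prefix overlap reaches `min_overlap`. Nodes are read indices."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_length(a, b, min_overlap)
                if k:
                    edges[(i, j)] = k
    return edges


# Toy reads (illustrative only).
reads = ["ACGTAC", "GTACGG", "ACGGTT"]
graph = build_overlap_graph(reads, min_overlap=3)
```

The all-pairs comparison is quadratic in the number of reads, which is why indexed data structures matter at deep coverage.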


aBayesQR

Employs a maximum-likelihood framework to infer individual sequences in a mixture from high-throughput sequencing data. aBayesQR is an algorithm that reconstructs a quasispecies from next-generation sequencing (NGS) data in two stages. In the first stage, conflict-free short reads are hierarchically merged and assembled into longer sequences (contigs), referred to as super-reads. In the second stage, likelihoods of the probable quasispecies are computed using the assembled super-reads (rather than the original set of short reads), and the most likely set of viral strains is selected.

ViQuaS / Viral Quasispecies Spectrum Reconstruction using short NGS reads

Provides a method for spectrum reconstruction. ViQuaS is an analysis pipeline for viral Quasispecies Spectrum Reconstruction (QSR) using short next-generation sequencing (NGS) reads. The method integrates a reference-assisted de novo assembly algorithm consisting of three consecutive steps: (i) de novo assembly, (ii) mutation calling and (iii) chimeric error correction. A version of an existing global strain reconstruction algorithm is also used.


A cost-effective and reliable protocol for sequencing viral samples. It combines NGS using barcoding and combinatorial pooling with a computational framework that includes algorithms for optimal virus-specific pool design and for deconvolution of individual samples from sequenced pools. Evaluation of the framework on experimental and simulated data for hepatitis C virus showed that it substantially reduces sequencing costs and allows deconvolution of viral populations with high accuracy.
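The general principle of combinatorial pooling can be sketched in Python as follows. This is a generic textbook-style design, not the optimal virus-specific pool design described above: each sample is placed in a unique combination of pools, so the set of pools in which a variant is observed points back to one sample. The function names, the parameters and the assumption that a variant belongs to a single sample are all hypothetical.

```python
from itertools import combinations


def assign_pools(samples, n_pools, pools_per_sample):
    """Give each sample a unique set of `pools_per_sample` pool indices."""
    signatures = combinations(range(n_pools), pools_per_sample)
    design = {}
    for sample in samples:
        try:
            design[sample] = frozenset(next(signatures))
        except StopIteration:
            raise ValueError("not enough pool combinations for all samples") from None
    return design


def deconvolve(design, observed_pools):
    """Return the samples whose pool signature equals the set of pools in
    which a variant was observed (empty list if no signature matches)."""
    observed = frozenset(observed_pools)
    return [sample for sample, pools in design.items() if pools == observed]


# Six samples sequenced in four pools instead of six separate runs.
design = assign_pools(["S1", "S2", "S3", "S4", "S5", "S6"], n_pools=4, pools_per_sample=2)
hits = deconvolve(design, [3, 0])
```

With 4 pools and 2 pools per sample there are 6 distinct signatures, so 6 samples fit; a variant shared by several samples would produce a union of signatures, which this simple exact-match lookup does not resolve.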

ANDES / ANalyses of DEep Sequencing

A software library and a suite of applications for the statistical analysis of deep sequencing data. The fundamental data structure underlying ANDES is the position profile, which contains the nucleotide distributions for each genomic position resulting from a multiple sequence alignment (MSA). Tools include the root mean square deviation (RMSD) plot, which allows for the visual comparison of multiple samples on a position-by-position basis, as well as the computation of base conversion frequencies (transition/transversion rates), variation (Shannon entropy), inter-sample clustering and visualization (dendrogram and multidimensional scaling (MDS) plot), threshold-driven consensus sequence generation and polymorphism detection, and the estimation of empirically determined sequencing quality values.
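A position profile and its per-position Shannon entropy can be sketched in a few lines of Python. This is an illustration of the concept only and does not use or reproduce the ANDES library; the function names are hypothetical, and the sketch assumes equal-length aligned sequences in which `-` marks a gap that is excluded from the distribution.

```python
import math
from collections import Counter


def position_profile(alignment):
    """Return one {nucleotide: frequency} dict per alignment column."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(base for base in column if base != "-")
        total = sum(counts.values())
        # A column made only of gaps yields an empty distribution.
        profile.append({b: c / total for b, c in counts.items()} if total else {})
    return profile


def shannon_entropy(distribution):
    """Shannon entropy in bits of a {symbol: probability} distribution."""
    return sum(p * math.log2(1 / p) for p in distribution.values() if p > 0)


# Toy alignment (illustrative only).
alignment = ["ACGT", "ACGA", "ATGA", "ACGT"]
profile = position_profile(alignment)
entropies = [shannon_entropy(dist) for dist in profile]
```

Fully conserved columns have entropy 0, while a column split evenly between two nucleotides has entropy 1 bit, so the entropy vector gives a quick position-by-position view of variation.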