
SGA / String Graph Assembler

Assembles large genomes from high-coverage short-read data. SGA is designed as a modular set of programs that are combined to form an assembly pipeline. It implements a set of assembly algorithms based on the FM-index. Because the FM-index is a compressed data structure, the algorithms are very memory efficient. An SGA assembly has three distinct phases. The first phase corrects base-calling errors in the reads. The second phase assembles contigs from the corrected reads. The third phase uses paired-end and/or mate-pair data to build scaffolds from the contigs. The output of this software is a PDF report that allows the properties of the genome and the data quality to be explored visually. By providing more information to the user at the start of an assembly project, this software will help increase awareness of the factors that make a given assembly easy or difficult, assist in the selection of software and parameters, and help to troubleshoot an assembly if it runs into problems.
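To make the FM-index idea concrete, the following is a minimal Python sketch of counting exact pattern occurrences with backward search. It is not SGA's code: the function names `build_fm_index` and `count_occurrences` are hypothetical, the suffix array is built naively, and the occurrence table is stored uncompressed, so it shows only the query logic and none of the memory savings.

```python
from collections import Counter


def build_fm_index(text):
    """Build a naive, uncompressed FM-index for a DNA string (demo only)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII order
    # Naive suffix array: acceptable for a toy example, far too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    counts = Counter(text)
    # c[ch] = number of characters in the text that are strictly smaller than ch.
    c, total = {}, 0
    for ch in sorted(counts):
        c[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c, occ, len(bwt)


def count_occurrences(pattern, index):
    """Count exact matches of pattern using backward search."""
    c, occ, n = index
    lo, hi = 0, n  # half-open suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in c:
            return 0
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("ACGTACGA")
print(count_occurrences("ACG", index))
```

Each pattern character narrows the suffix-array interval with two table lookups, so a query costs time proportional to the pattern length rather than to the size of the read set.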


nQuire

Aims to distinguish the distribution of base frequencies at variable sites for diploids, triploids and tetraploids directly from read mappings to a reference genome. nQuire is a statistical approach that models base frequencies as a Gaussian Mixture Model (GMM) and uses maximum likelihood to assess how well the empirical data fit the assumptions of each ploidy level. This method could be useful for assessing intraspecific variation in ploidy in both historic and modern samples, as well as in experimental evolution studies.
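The mixture-model comparison can be illustrated with a short Python sketch. It is not nQuire's implementation: the component means are the allele frequencies one would expect at biallelic sites for each ploidy, while the equal mixture weights, the fixed standard deviation `sigma`, and the function names are simplifying assumptions made for illustration.

```python
import math

# Expected allele frequencies at biallelic variable sites for each ploidy level.
PLOIDY_MEANS = {
    "diploid": [0.5],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [0.25, 0.5, 0.75],
}


def log_likelihood(freqs, means, sigma=0.05):
    """Log-likelihood of base frequencies under an equal-weight Gaussian mixture.

    sigma is a fixed, assumed standard deviation shared by all components.
    """
    norm = sigma * math.sqrt(2 * math.pi)
    total = 0.0
    for f in freqs:
        density = sum(
            math.exp(-0.5 * ((f - m) / sigma) ** 2) / norm for m in means
        ) / len(means)
        # Floor the density so that log() never receives zero.
        total += math.log(max(density, 1e-300))
    return total


def best_ploidy(freqs, sigma=0.05):
    """Return the ploidy label with the highest log-likelihood, plus all scores."""
    scores = {
        name: log_likelihood(freqs, means, sigma)
        for name, means in PLOIDY_MEANS.items()
    }
    return max(scores, key=scores.get), scores


label, scores = best_ploidy([0.32, 0.35, 0.66, 0.69, 0.33, 0.68])
print(label)
```

A fuller treatment would estimate the mixture weights and variances from the data and compare the fitted models statistically rather than simply picking the largest likelihood.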

ConPADE / Contig Ploidy and Allele Dosage Estimation

A probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. The method can be used for whole genome shotgun (WGS) sequencing data. It may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.