Genome profiling software tools | De novo genome sequencing data analysis
High-throughput sequencing enables the sequencing of novel genomes on a daily basis. Nevertheless, even their most basic characteristics, such as their size or heterozygosity rate, may be initially unknown, making it difficult to select appropriate analysis methods, e.g. a read mapper, de novo assembler, or SNP caller. Determining these characteristics in advance can reveal whether an analysis is failing to capture the full complexity of the genome, for example by underreporting the number of variants or failing to assemble a significant fraction of the genome.
Assembles large genomes from high-coverage short-read data. SGA implements a set of assembly algorithms based on the FM-index. It corrects base-calling errors in the reads, assembles contigs from the corrected reads, and uses paired-end and/or mate-pair data to build scaffolds from the contigs. The tool returns a visual report that displays the properties of the genome and the quality of the data.
Estimates the overall genome characteristics (total and haploid genome length, percentage of repetitive content, and heterozygosity rate) as well as overall read characteristics (read coverage, read duplication, and error rate) from raw short-read sequencing data. Using the web application, users can upload their k-mer profile, and seconds later GenomeScope reports the genomic properties and generates high-quality figures and tables.
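For readers unfamiliar with the input, a k-mer profile is a histogram recording how many distinct k-mers occur once, twice, three times, and so on in the reads. The following is a minimal Python sketch of how such a profile could be built, together with a naive genome size estimate taken from the main coverage peak. The function names, the `min_multiplicity` cutoff and the peak-based estimate are illustrative assumptions and are not GenomeScope's method; the sketch also ignores reverse complements, and real data sets would normally be profiled with a dedicated k-mer counter.

```python
from collections import Counter

def kmer_profile(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())

def naive_genome_size(profile, min_multiplicity=2):
    """Divide the total number of k-mers by the peak multiplicity.

    k-mers seen fewer than min_multiplicity times are treated as likely
    sequencing errors and dropped (hypothetical cutoff, for illustration only).
    """
    kept = {m: n for m, n in profile.items() if m >= min_multiplicity}
    if not kept:
        raise ValueError("no k-mers at or above min_multiplicity")
    peak = max(kept, key=kept.get)                      # most common multiplicity
    total_kmers = sum(m * n for m, n in kept.items())   # k-mer occurrences kept
    return total_kmers / peak
```

Calling `kmer_profile(reads, 21)` on a list of read strings gives the histogram; passing it to `naive_genome_size` gives a rough size in k-mers, which only approximates the genome length when coverage is even and repeats are few.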
Determines species identity, hybrid status and chromosomal copy-number variants (CCNVs). sppIDer provides visual insight into genome-wide ancestry and aims to discover and characterize interspecies hybrids. This pipeline has been tested through simulations and is able to identify hybrids of closely or distantly related species, whether of recent or ancient origin. It also reports the percentage of reads that map to each reference genome.
Offers a solution to explore and display ploidy levels in a sequenced genome. ploidyNGS works from short-read data and proceeds in four steps: (1) storing the number of reads supporting each nucleotide at every position; (2) traversing the data structure, ignoring positions where a single nucleotide was observed and positions where the frequency of the most frequent nucleotide exceeds a cutoff; (3) ordering the putative allele percentages at each position; and (4) building a histogram.
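The four steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ploidyNGS code: the per-position counts are passed in directly instead of being read from an alignment, and the function name, the `max_major_percent` cutoff (set to 95 here purely as a placeholder) and the `bin_width` parameter are hypothetical.

```python
from collections import Counter

def allele_percentage_histogram(pileup, max_major_percent=95.0, bin_width=1.0):
    """Build per-rank histograms of allele percentages.

    pileup: iterable of dicts mapping nucleotide -> supporting read count,
            one dict per genomic position (step 1, assumed precomputed).
    Returns {rank: Counter({bin_start_percent: number_of_positions})},
    where rank 1 is the most frequent nucleotide at a position.
    """
    histogram = {}
    for base_counts in pileup:
        observed = [n for n in base_counts.values() if n > 0]
        if len(observed) < 2:
            continue                        # step 2: only one nucleotide seen
        depth = sum(observed)
        # step 3: order allele percentages from most to least frequent
        percents = sorted((100 * n / depth for n in observed), reverse=True)
        if percents[0] > max_major_percent:
            continue                        # step 2: position is nearly fixed
        # step 4: add each ranked percentage to its histogram bin
        for rank, percent in enumerate(percents, start=1):
            bin_start = int(percent / bin_width) * bin_width
            histogram.setdefault(rank, Counter())[bin_start] += 1
    return histogram
```

In such a histogram, a diploid sample would be expected to show both ranks piling up near 50%, whereas a triploid would show peaks near 67% and 33%.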
Aims to distinguish the distribution of base frequencies at variable sites for diploids, triploids and tetraploids directly from read mappings to a reference genome. nQuire is a statistical approach that models base frequencies as a Gaussian Mixture Model (GMM) and uses maximum likelihood to assess empirical data under the assumptions of each ploidy level. This method could be useful for assessing intraspecific variation in ploidy in both historic and modern samples, as well as in evolution experiments.
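The idea of comparing ploidy models by likelihood can be illustrated with a small Python sketch. It assumes that each ploidy level corresponds to a mixture with fixed means at the expected allele fractions (0.5 for diploids; 1/3 and 2/3 for triploids; 0.25, 0.5 and 0.75 for tetraploids), equal component weights, and one shared standard deviation `sigma`. These simplifications, along with all names below, are assumptions made for illustration and do not reproduce nQuire's implementation.

```python
import math

# Expected base frequencies at variable sites for each ploidy level.
PLOIDY_MEANS = {
    "diploid": [0.5],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [0.25, 0.5, 0.75],
}

def log_likelihood(freqs, means, sigma=0.05):
    """Log-likelihood of base frequencies under an equal-weight Gaussian mixture."""
    norm = sigma * math.sqrt(2 * math.pi)
    total = 0.0
    for x in freqs:
        density = sum(
            math.exp(-0.5 * ((x - mu) / sigma) ** 2) / norm for mu in means
        ) / len(means)
        total += math.log(max(density, 1e-300))  # floor guards against log(0)
    return total

def best_ploidy(freqs, sigma=0.05):
    """Return the ploidy label with the highest log-likelihood, plus all scores."""
    scores = {
        name: log_likelihood(freqs, means, sigma)
        for name, means in PLOIDY_MEANS.items()
    }
    return max(scores, key=scores.get), scores
```

Here `freqs` would be the observed base frequencies at variable sites, each between 0 and 1; the label returned is simply the fixed model under which those observations are most probable.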
Investigates diploid-triploid mixoploidy in whole genome sequencing (WGS) trio datasets. mixoviz is a standalone application that highlights variant calls clustered around unexpected B-allele frequencies and can be applied as part of a WGS test to speed up the analysis. This software can be applied to mixtures of multiple cell types and can also be used to assess the degree of mixture.
A probabilistic method that estimates the ploidy of any given contig/scaffold based on its allele proportions. ConPADE performs well as long as enough sequencing coverage is available, or the true contig ploidy is low. The method can be used for whole genome shotgun (WGS) sequencing data. It may also be used for related applications, such as the identification of duplicated genes in fragmented assemblies, although refinements are needed.