Ancient DNA data analysis software tools | Whole-genome sequencing
Research involving ancient DNA (aDNA) has experienced a true technological revolution in recent years through advances in the recovery of aDNA and, particularly, through applications of high-throughput sequencing.
Models genotyping and SNP calling from the raw read sequences in a fully probabilistic framework. A probabilistic model has several advantages: the sampling and sequencing process is modeled explicitly, which makes the approach flexible; all results get an intuitive confidence measure directly from the method; it can use all available information; and it is easily extended to take other sources of error or prior knowledge into account.
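To illustrate what a probabilistic genotype call with a built-in confidence measure can look like, here is a minimal Python sketch. It is not the method of the tool described above: the function name genotype_posteriors, the single fixed error rate and the uniform prior are all assumptions made for the example.

```python
import math
from itertools import combinations_with_replacement

def genotype_posteriors(bases, error=0.01, prior=None):
    """Posterior probability of each diploid genotype at one site.

    bases: string of read bases covering the site.
    error: assumed per-base sequencing error rate (hypothetical value).
    prior: optional dict genotype -> prior probability (uniform if omitted).
    """
    genotypes = list(combinations_with_replacement("ACGT", 2))
    if prior is None:
        prior = {g: 1.0 / len(genotypes) for g in genotypes}

    def p_base(base, allele):
        # Probability of observing `base` when the true allele is `allele`.
        return 1.0 - error if base == allele else error / 3.0

    log_post = {}
    for g in genotypes:
        # Each read samples one of the two alleles with probability 1/2.
        log_lik = sum(
            math.log(0.5 * p_base(b, g[0]) + 0.5 * p_base(b, g[1]))
            for b in bases
        )
        log_post[g] = log_lik + math.log(prior[g])

    # Normalise in log space to avoid underflow at high depth.
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

post = genotype_posteriors("AAAATTTT")
best = max(post, key=post.get)
print(best, round(post[best], 4))
```

The posterior of the best genotype serves directly as the confidence measure, and a damage-aware error model could replace p_base without changing the rest of the code.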
The basic idea of this program is to align DNA sequencing fragments (shotgun or targeted resequencing) to a reference, then call a consensus. The consensus is then used as the new reference and the process is repeated until convergence. Since it was originally designed for ancient DNA, it supports a position-specific substitution matrix, which improves both alignment and consensus calling on chemically damaged aDNA. MIA has been used to assemble a number of Neandertal and early modern human mitochondria.
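The align, call consensus, repeat loop can be sketched in a few lines of Python. This is a toy illustration only, not MIA's implementation: it assumes ungapped placement by fewest mismatches, a simple majority-vote consensus and no position-specific substitution matrix, and the names place, call_consensus and iterate_assembly are hypothetical.

```python
from collections import Counter

def place(read, ref):
    """Return the ungapped offset in `ref` with the fewest mismatches."""
    return min(
        range(len(ref) - len(read) + 1),
        key=lambda s: sum(a != b for a, b in zip(read, ref[s:])),
    )

def call_consensus(reads, ref):
    """Place every read on `ref` and take a majority vote per column."""
    columns = [Counter() for _ in ref]
    for read in reads:
        start = place(read, ref)
        for i, base in enumerate(read):
            columns[start + i][base] += 1
    # Uncovered columns keep the current reference base.
    return "".join(
        col.most_common(1)[0][0] if col else ref[i]
        for i, col in enumerate(columns)
    )

def iterate_assembly(reads, ref, max_rounds=10):
    """Repeat placement and consensus calling until the sequence is stable."""
    for _ in range(max_rounds):
        new_ref = call_consensus(reads, ref)
        if new_ref == ref:
            break
        ref = new_ref
    return ref

reads = ["ACGTTG", "GTTGCA", "TGCAAG", "CAAGCT"]
start_ref = "ACGTAGCAAGCT"  # differs from the sample at one position
print(iterate_assembly(reads, start_ref))
```

The loop stops as soon as a round leaves the consensus unchanged, which is the convergence criterion described above.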
Computes nucleotide misincorporation and fragmentation patterns using next-generation sequencing reads mapped against a reference genome. mapDamage 2.0 extends the original features of mapDamage by incorporating a statistical model of DNA damage.
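A misincorporation pattern is essentially a per-position mismatch frequency. The Python sketch below tabulates C-to-T substitutions by distance from the 5' end of the read. It is not mapDamage code: it assumes the reads are already given as gap-free (reference, read) string pairs, and the function name ct_frequency_by_position is hypothetical.

```python
def ct_frequency_by_position(pairs, max_pos=5):
    """C->T frequency at each of the first `max_pos` positions from the 5' end.

    pairs: iterable of (reference, read) strings, aligned without gaps.
    Returns a list; an entry is None when no reference C was seen there.
    """
    ref_c = [0] * max_pos    # reference C observed at this position
    c_to_t = [0] * max_pos   # ... and the read shows T
    for ref, read in pairs:
        for i in range(min(max_pos, len(ref), len(read))):
            if ref[i] == "C":
                ref_c[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None for i in range(max_pos)]

pairs = [("CCAG", "TCAG"), ("CAGT", "CAGT"), ("CCGT", "TTGT")]
print(ct_frequency_by_position(pairs, max_pos=3))
```

In damaged libraries this frequency is typically highest at the first position and decays inward; the mirrored G-to-A count at the 3' end could be tabulated the same way.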
An iterative approach to jointly estimate present-day human contamination in ancient human DNA datasets and reconstruct the endogenous mitochondrial genome. By using sequence deamination patterns and fragment length distributions, schmutzi accurately reconstructs the endogenous mitochondrial genome sequence even when contamination exceeds 50 %. Given sufficient coverage, schmutzi also produces reliable estimates of contamination across a range of contamination rates.
Permits next-generation sequencing (NGS) analysis to reconstruct ancient genomes. EAGER is able to perform several raw read pre-processing steps, including the initial analysis of raw sequencing reads using FastQC to assess the basic quality of the generated NGS data. It can be used to generate summary reports with the most important statistics including mapping and genotyping of all processed samples.
Removes the adaptors and reconstructs the original DNA sequences. leeHom is based on a Bayesian maximum a posteriori probability approach and performs reconstruction on both simulated and ancient DNA data sets. It treats adaptor trimming and merging in a single probabilistic model. This software can handle common sequencing problems like missing cycles and it tends to avoid false positives.
A flexible and user-friendly pipeline applicable to both modern and ancient genomes, which largely automates the in silico analyses behind whole-genome resequencing. PALEOMIX is compatible with a full range of sequence data and performs a series of user-defined analyses, including read trimming, collapsing of overlapping mate-pairs, read mapping, PCR duplicate removal, SNP calling, and metagenomic profiling.
A framework for evaluating the likelihood of a sequence originating from a model with postmortem degradation, summarized in a postmortem degradation (PMD) score, which allows the identification of DNA fragments that are unlikely to originate from present-day sources. PMDtools opens up the potential for genomic analysis of contaminated fossil material.
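A score of this kind can be expressed as a log-likelihood ratio between a damage model and a no-damage model. The Python sketch below is a simplified illustration, not PMDtools itself: it only looks at reference C positions counted from the 5' end, ignores polymorphism, and the geometric damage curve with parameters p and c, the error rate eps and the function names are assumptions chosen for the example.

```python
import math

def damage_prob(z, p=0.3, c=0.01):
    """Assumed probability of C->T damage at 1-based distance z from the 5' end."""
    return c + p * (1 - p) ** (z - 1)

def pmd_score(ref, read, eps=0.001):
    """Log-likelihood ratio: damage model versus no-damage model.

    ref, read: gap-free aligned strings; eps: assumed sequencing error rate.
    Positive scores favour the damage model.
    """
    score = 0.0
    for z, (r, b) in enumerate(zip(ref, read), start=1):
        if r != "C":
            continue  # this sketch only scores reference C positions
        d = damage_prob(z)
        if b == "T":
            p_damage = d + (1 - d) * eps / 3   # damaged, or undamaged + error
            p_null = eps / 3
        elif b == "C":
            p_damage = (1 - d) * (1 - eps)     # undamaged and read correctly
            p_null = 1 - eps
        else:
            p_damage = (1 - d) * eps / 3       # undamaged, other error
            p_null = eps / 3
        score += math.log(p_damage / p_null)
    return score

print(pmd_score("CAGT", "TAGT") > 0)   # terminal C->T supports damage
print(pmd_score("CAGT", "CAGT") < 0)   # intact terminal C argues against it
```

Reads whose score exceeds a chosen threshold would be kept as likely ancient; the threshold itself is a trade-off between retained data and residual contamination.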
A set of programs aimed at simulating ancient DNA fragments. Gargammel can simulate the most common features of aDNA sequences, including post-mortem DNA damage and base misincorporations. It simulates base compositional bias due to the molecular tools used in library preparation, sequencing bias against GC-rich fragments and errors introduced by the sequencing platform. Gargammel lets researchers evaluate the robustness of various analyses to aDNA properties.
Measures genotype frequencies and error rates along sequences from ancient DNA data. snpAD is based on an expectation-maximization (EM) algorithm that estimates, by maximum likelihood, the frequency of genotypes given equal error rates for all base exchanges. This software can call genotypes by jointly estimating all necessary parameters from the data and can compute the posterior probabilities for each genotype.
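The EM idea can be shown on a reduced problem. The Python sketch below estimates genotype frequencies from per-site genotype likelihoods that are assumed to be given; unlike snpAD it does not estimate error rates, and the function name em_genotype_frequencies and the toy likelihood values are hypothetical.

```python
def em_genotype_frequencies(site_likelihoods, n_iter=50):
    """Maximum-likelihood genotype frequencies via EM.

    site_likelihoods: list of per-site lists, each holding P(data | genotype)
    for the same ordered set of genotypes.
    """
    k = len(site_likelihoods[0])
    freqs = [1.0 / k] * k
    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: posterior genotype probabilities under current frequencies.
        for lik in site_likelihoods:
            weights = [f * l for f, l in zip(freqs, lik)]
            total = sum(weights)
            for g in range(k):
                expected[g] += weights[g] / total
        # M-step: frequencies are the average posterior across sites.
        freqs = [e / len(site_likelihoods) for e in expected]
    return freqs

# Toy input: eight sites favour genotype 0, two favour genotype 1.
sites = [[0.9, 0.05, 0.05]] * 8 + [[0.05, 0.9, 0.05]] * 2
print([round(f, 3) for f in em_genotype_frequencies(sites)])
```

The per-site posteriors computed in the E-step are the same quantities one would use to call a genotype at each site once the frequencies have converged.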
Permits users to cluster and visualize samples based on DNA mismatch patterns. aRchaic is based on a “grade-of-membership” method that generalizes the concept of clustering by allowing samples to have membership in multiple clusters. Although this application does not provide explicit estimates of contamination levels, it can be useful for flagging potentially contaminated samples and for detecting batch effects.
Maps reads to genomes. MaxSSmap computes the maximum scoring subsequence score between the read and disjoint fragments of the genome in parallel, then selects the highest-scoring fragment for exact alignment: it finds a local region of the genome and aligns the read to it with Needleman-Wunsch. This tool was tested on real data by mapping ancient horse DNA reads to modern genomes and unmapped paired reads from NA12878 in 1000 Genomes.
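The fragment-selection step can be sketched in Python as follows. This is a serial toy version, not MaxSSmap's parallel implementation: the +1/-1 scoring, the choice of read-length fragments and the names max_scoring_subsequence and best_fragment are assumptions, and the final Needleman-Wunsch alignment is left out.

```python
def max_scoring_subsequence(read, fragment, match=1, mismatch=-1):
    """Best sum of per-position scores over any contiguous stretch.

    Positions are compared one-to-one (no gaps); the running sum is reset
    to zero whenever it would go negative.
    """
    best = current = 0
    for a, b in zip(read, fragment):
        current = max(0, current + (match if a == b else mismatch))
        best = max(best, current)
    return best

def best_fragment(read, genome):
    """Score the read against disjoint read-length fragments of the genome.

    Returns (score, start) of the highest-scoring fragment, which would then
    be passed on to an exact aligner. Assumes len(genome) >= len(read).
    """
    n = len(read)
    scores = [
        (max_scoring_subsequence(read, genome[s:s + n]), s)
        for s in range(0, len(genome) - n + 1, n)
    ]
    return max(scores, key=lambda t: t[0])

print(best_fragment("ACGT", "TTTTACGTGGGG"))  # (4, 4)
```

Because each fragment is scored independently, the list comprehension in best_fragment is the part that lends itself to parallel execution.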
Identifies epigenetic signatures in archaeological material from high-throughput DNA sequencing data. epiPALEOMIX leverages natural degradation processes that affect DNA after death and thus does not require prior treatment of ancient DNA extracts with gold-standard epigenetic methods, such as bisulfite sequencing or ChIP-seq. It can reveal genome-wide patterns of CpG methylation, and can generate nucleosome maps and phasogram analyses. epiPALEOMIX can accommodate any type of molecular tool used to prepare ancient DNA (aDNA), including USER treatment of DNA extracts and amplification of DNA libraries with uracil-intolerant DNA polymerases, such as the Phusion DNA polymerase.
A probabilistic short-read aligner based on the use of position-specific scoring matrices (PSSM). Like many existing aligners, it is fast and sensitive. Unlike most other aligners, however, it is also adaptable, in the sense that one can direct the alignment based on known biases within the data set. BWA-PSSM is coded as a modification of the original BWA alignment program and shares its genome index structure as well as many of its command-line options.
A method for identifying insertions of organellar origin from modern and ancient high-throughput sequencing data based on haplotype phasing. Odintifier represents the first integration of phasing algorithms into a reference-based organellar genome sequence assembly method, and it also allows for the simultaneous identification and reconstruction of organellar-derived inserted sequences.
Analyzes ancient samples while properly accounting for post-mortem damage (PMD). ATLAS works directly from raw BAM files and contains all necessary methods to infer patterns of PMD, recalibrate base quality scores and genotype ancient DNA, along with many other tools. The software enables the building of complete and customized pipelines for the analysis of ancient and low-depth samples.
Provides a mapper application. ANFO is an application specifically designed to semi-globally align sequencing reads. It employs an index of the reference genome to find short exact matches, and then uses these as seeds for an exact alignment. Its alignment algorithm incorporates knowledge of aDNA damage patterns, which is reflected in the alignment scores.