Identifies repetitive sequences and reconstructs separated sections to provide full-length repeats and, for long terminal repeat (LTR) retrotransposons, calculates age since insertion based on LTR divergence. TEnest is a repeat identification and display tool made specifically for highly repetitive genomes. It provides a chronological insertion display that gives an accurate visual representation of transposable element (TE) integration history, showing the timeline, location, and family of each TE identified, thus creating a framework from which evolutionary comparisons can be made among various regions of the genome.
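The idea of dating an insertion from LTR divergence can be sketched as follows. The two LTRs of an element are identical at insertion and then accumulate substitutions independently, so the age is roughly T = K / (2r), where K is the corrected distance between the LTRs and r is the substitution rate per site per year. This is a minimal sketch, not TEnest's own code: the function names are hypothetical, the Jukes-Cantor correction is an assumed choice of distance model, and the inputs are assumed to be already aligned and of equal length.

```python
import math

def p_distance(ltr5, ltr3):
    """Proportion of differing sites between two aligned LTRs (gaps and N skipped)."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def jukes_cantor_distance(p):
    """Jukes-Cantor (JC69) corrected distance; an assumed model choice."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def ltr_insertion_age(ltr5, ltr3, rate):
    """Estimated years since insertion: T = K / (2 * rate)."""
    k = jukes_cantor_distance(p_distance(ltr5, ltr3))
    return k / (2.0 * rate)
```

A call such as `ltr_insertion_age(aligned_5prime_ltr, aligned_3prime_ltr, 1.3e-8)` returns an age in years; the rate shown is only an illustrative assumption and should be replaced by a value appropriate to the species studied.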
A collection of software tools for characterization of repetitive elements, accessible via a web interface. A key component of the server is a computational pipeline that uses a graph-based sequence clustering algorithm to facilitate de novo repeat identification without the need for reference databases of known elements. Because the algorithm uses short sequences randomly sampled from the genome as input, it is ideal for analyzing next-generation sequence reads.
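Graph-based clustering of reads can be illustrated with a toy example: reads are nodes, an edge joins two reads that look similar, and groups of connected reads are candidate repeat families. The sketch below is a simplification under stated assumptions and not the server's pipeline. Similarity is approximated by counting shared k-mers rather than by pairwise alignment, clusters are plain connected components found with union-find, and `cluster_reads`, `k`, and `min_shared` are hypothetical names and parameters.

```python
from collections import defaultdict
from itertools import combinations

def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def cluster_reads(reads, k=8, min_shared=3):
    """Group read indices into connected components of a similarity graph."""
    parent = list(range(len(reads)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kmer_sets = [kmers(read, k) for read in reads]
    # All-to-all comparison: fine for a toy example, quadratic in read count.
    for i, j in combinations(range(len(reads)), 2):
        if len(kmer_sets[i] & kmer_sets[j]) >= min_shared:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[find(i)].append(i)
    # Largest clusters first: in a random sample, abundant repeats form big clusters.
    return sorted(clusters.values(), key=len, reverse=True)
```

Because the input is a random sample of the genome, the share of reads that falls into a cluster gives a rough estimate of how much of the genome that repeat family occupies.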
A de novo repeat assembly method that avoids potential biases by using abundant k-mers of NGS WGS reads without requiring a reference genome. For validation, repeat consensuses derived from simulated and real Drosophila melanogaster NGS WGS reads were compared to repeat libraries generated by four established methods. RepARK is orders of magnitude faster than the other methods and generates libraries that are: (i) composed almost entirely of repetitive motifs, (ii) more comprehensive and (iii) almost completely annotated by TEclass. Additionally, we show that the RepARK method is applicable to complex genomes like human and can even serve as a diagnostic tool to identify repetitive sequences contaminating NGS datasets.
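The first step of such an approach, selecting abundant k-mers from reads, can be sketched as below. This is an illustration only and does not reproduce RepARK's implementation: the function names and the fixed `min_count` cutoff are assumptions, and a real analysis would use a dedicated k-mer counter and a threshold derived from the coverage of the dataset. K-mers are counted in canonical form so that both strands contribute to the same count.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def abundant_kmers(reads, k, min_count):
    """Count canonical k-mers across reads and keep those seen at least min_count times."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}
```

K-mers from single-copy sequence appear at roughly the sequencing coverage, so k-mers well above that level are likely to come from repeats and can be passed to an assembler to build repeat consensus sequences.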
Assembles a set of transposable elements directly from the reads. Tedna uses Illumina paired-end reads, the most widely used sequencing technology for de novo assembly, and forms full-length transposable elements. Tedna can be used on standard computers with limited RAM resources, although it may also use large memory for better results. Most of the code is parallelized and thus ready for large infrastructures.
A command line application to annotate transposable elements from paired-end whole genome shotgun data. There are many tools to estimate the mathematical nature of repeats from short sequence reads. There are also a number of tools for analyzing repeats directly from a genome assembly. This tool allows you to infer the abundance of repeat types in the genome without a reference genome sequence. The output files make it easy to quickly summarize genomic abundance by transposable element class, superfamily, family, or any other level of the repeat taxonomy.
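Summarizing abundance at a chosen level of the repeat taxonomy amounts to grouping annotated clusters by that level and summing their read counts. The sketch below assumes a hypothetical record layout (dictionaries with `class`, `superfamily`, `family`, and `reads` keys); it does not reflect the tool's actual output format, which should be checked against its documentation.

```python
from collections import defaultdict

def summarize_abundance(records, level, total_reads):
    """Percent of sampled reads assigned to each group at the given taxonomy level."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    totals = defaultdict(int)
    for record in records:
        totals[record[level]] += record["reads"]
    return {name: 100.0 * n / total_reads for name, n in totals.items()}

# Hypothetical annotated clusters from a sample of 1000 reads.
example_records = [
    {"class": "LTR", "superfamily": "Gypsy", "family": "famA", "reads": 200},
    {"class": "LTR", "superfamily": "Copia", "family": "famB", "reads": 100},
    {"class": "DNA", "superfamily": "hAT", "family": "famC", "reads": 50},
]
by_class = summarize_abundance(example_records, "class", 1000)
```

Passing `"superfamily"` or `"family"` instead of `"class"` gives the same summary at a finer level, with no change to the records.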
Assembles repeat sequences directly from raw shotgun sequencing data. REPdenovo can construct various types of repeats that are highly repetitive and have low sequence divergence within copies. We show that REPdenovo is substantially better than existing methods both in terms of the number and the completeness of the repeat sequences that it recovers. The key advantage of REPdenovo is that it can reconstruct long repeats from sequence reads. REPdenovo is a powerful computational tool for annotating genomes and for addressing questions regarding the evolution of repeat families.
Provides a fully automated, rationally derived method for complete analysis of significant Common Insertion Sites (CISs) using next-generation sequencing. TAPDANCE furnishes a pipeline that performs raw sequence processing and trimming, mapping to the reference genome, CIS identification, and nearest-gene annotation. The software can be applied across a range of sequence sampling depths and can handle large datasets.
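The nearest-gene annotation step can be illustrated with a small sketch. It is not TAPDANCE's code: the `(name, start, end)` gene tuples and the function name are hypothetical, all genes are assumed to lie on the same chromosome as the insertion site, and a linear scan is used for clarity where a real pipeline would use an indexed lookup.

```python
def nearest_gene(position, genes):
    """Return (gene_name, distance) for the gene closest to an insertion position.

    genes is a non-empty list of (name, start, end) tuples on one chromosome.
    The distance is 0 when the position lies inside the gene.
    """
    def distance(gene):
        _, start, end = gene
        if start <= position <= end:
            return 0
        return min(abs(position - start), abs(position - end))

    best = min(genes, key=distance)
    return best[0], distance(best)

# Hypothetical gene models.
example_genes = [("geneA", 100, 200), ("geneB", 500, 900)]
```

With these example genes, a site at position 150 falls inside `geneA`, while a site at position 400 is closer to `geneB`.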
Facilitates curation of transposable element consensus sequences. ReannTE provides scripts that simplify merging two consensus libraries and parse the masking outputs to evaluate overlaps. It was used to eliminate non-TE repeats (simple repeats or gene families).
Assists in the identification of protein or nucleic acid sequence homology to proteins encoded by diverse families of transposable elements. TransposonPSI can be used to detect potential transposon open reading frames (ORFs) within a protein set, or regions of transposon homology within a larger genome sequence. This method can be applied for identifying mobile elements across eukaryotes including protozoa, plants, fungi, and animals.
A bioinformatic pipeline for finding, annotating, and quantifying transposable elements, as well as other repeated DNA, in low-coverage NGS genomic datasets. dnaPipeTE produces a global estimate of the proportions of the main repeat classes in the genome. It estimates the relative abundance of each assembled repeat and draws the TE age distribution for the most recent elements.
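A TE age distribution of this kind is commonly drawn as a histogram of sequence divergence between individual copies (or reads) and their consensus, with low divergence standing in for recent insertions. The sketch below shows only the binning step under stated assumptions: the input is a list of percent-identity values from some alignment step, and `divergence_histogram` and `bin_width` are hypothetical names rather than part of dnaPipeTE.

```python
from collections import Counter

def divergence_histogram(identities, bin_width=1.0):
    """Bin percent divergence (100 - percent identity) into fixed-width bins.

    Returns a dict mapping the lower edge of each bin to the number of hits in it.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    histogram = Counter()
    for identity in identities:
        divergence = 100.0 - identity
        bin_start = int(divergence // bin_width) * bin_width
        histogram[bin_start] += 1
    return dict(sorted(histogram.items()))
```

Divergence is only a proxy for age: converting it to years requires a substitution rate, and very old copies tend to be missed because they no longer align well to the consensus.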