A collection of software tools for the characterization of repetitive elements, accessible via a web interface. A key component of the server is a computational pipeline that uses a graph-based sequence clustering algorithm for de novo repeat identification, without the need for reference databases of known elements. Because the algorithm takes as input short sequences randomly sampled from the genome, it is well suited to analyzing next-generation sequencing reads.
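The idea of graph-based read clustering can be illustrated with a minimal sketch. It assumes a hypothetical similarity rule: two reads are linked when they share at least `min_shared` distinct k-mers. Clusters are then the connected components of that graph. The server's own similarity search and clustering method are not reproduced here; connected components are a deliberately simple stand-in.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=5, min_shared=2):
    """Group reads into connected components of a read-similarity graph.

    Edge rule (hypothetical, for illustration only): two reads are linked
    when they share at least `min_shared` distinct k-mers.
    """
    parent = list(range(len(reads)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index reads by k-mer so that only reads sharing a k-mer are compared.
    index = defaultdict(set)
    for i, read in enumerate(reads):
        for km in kmers(read, k):
            index[km].add(i)

    # Count shared distinct k-mers for every pair of reads that co-occur.
    shared = defaultdict(int)
    for members in index.values():
        for a, b in combinations(sorted(members), 2):
            shared[(a, b)] += 1

    for (a, b), n in shared.items():
        if n >= min_shared:
            union(a, b)

    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[find(i)].append(i)
    # Largest clusters first; in repeat analysis these tend to be the most abundant repeats.
    return sorted(clusters.values(), key=len, reverse=True)
```

For example, `cluster_reads(["ACGTACGTGG", "TACGTGGCAT", "GGGCCCAAAT", "CGTGGCATTA"])` groups the three overlapping reads together and leaves the unrelated read alone, returning `[[0, 1, 3], [2]]`.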
Assists in identifying protein or nucleic acid sequence homology to proteins encoded by diverse families of transposable elements. TransposonPSI can be used to detect potential transposon open reading frames (ORFs) within a protein set, or regions of transposon homology within a larger genome sequence. The method can be applied to identify mobile elements across eukaryotes, including protozoa, plants, fungi, and animals.
A de novo repeat assembly method that avoids potential biases by using abundant k-mers of NGS WGS reads, without requiring a reference genome. For validation, repeat consensuses derived from simulated and real Drosophila melanogaster NGS WGS reads were compared to repeat libraries generated by four established methods. RepARK is orders of magnitude faster than the other methods and generates libraries that are (i) composed almost entirely of repetitive motifs, (ii) more comprehensive and (iii) almost completely annotated by TEclass. The authors also show that the RepARK method is applicable to complex genomes such as human, and can even serve as a diagnostic tool to identify repetitive sequences contaminating NGS datasets.
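The k-mer selection step behind this approach can be sketched as follows. In reads sampled at roughly uniform coverage, k-mers from single-copy regions occur about `coverage` times, while k-mers from repeats occur roughly `copy_number * coverage` times, so keeping only high-count k-mers enriches for repeats. The threshold `min_count` is a hypothetical parameter chosen by the user here, not RepARK's own threshold rule, and the subsequent assembly of the retained k-mers into consensuses is not shown.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so that a k-mer and its reverse complement are counted together."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def abundant_kmers(reads, k, min_count):
    """Count canonical k-mers across all reads and keep those seen >= min_count times.

    `min_count` is a hypothetical, user-chosen abundance threshold.
    K-mers containing an ambiguous base (N) are skipped.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return {km: c for km, c in counts.items() if c >= min_count}
```

For instance, with five copies of `"ACGTACGT"` and one `"TTGCAAGG"`, `abundant_kmers(reads, 4, 5)` keeps only the k-mers from the repeated read: `{"ACGT": 10, "CGTA": 10, "GTAC": 5}`.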
Assembles a set of transposable elements directly from sequencing reads. Tedna takes Illumina paired-end reads, the most widely used sequencing technology for de novo assembly, and assembles full-length transposable elements. Tedna runs on standard computers with limited RAM, although it can use more memory to obtain better results. Most of the code is parallelized and therefore suited to large computing infrastructures.
A bioinformatic pipeline dedicated to finding, annotating, and quantifying transposable elements, as well as other repeated DNA, in low-coverage NGS genomic datasets. dnaPipeTE produces a global estimate of the proportions of the main repeat classes in the genome. It estimates the relative abundance of each assembled repeat and plots the TE age distribution of the most recent elements.
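The two quantities mentioned above, relative abundance and age distribution, can be approximated from read-to-repeat alignments. This minimal sketch assumes that each sampled read has already been aligned to at most one assembled repeat, and that `hits` is a hypothetical dictionary mapping read IDs to `(repeat_name, percent_identity)`. Abundance is estimated from read counts, and age is proxied by divergence from the consensus (100 minus percent identity). It does not reproduce dnaPipeTE's own computations.

```python
from collections import Counter


def repeat_proportions(hits, n_sampled_reads):
    """Estimate the genome fraction attributable to each assembled repeat.

    hits: {read_id: (repeat_name, percent_identity)} for reads that aligned
    to a repeat (hypothetical input format). Reads without a hit are treated
    as non-repetitive. Assumes reads were sampled uniformly from the genome,
    so the fraction of reads hitting a repeat approximates its genome share.
    """
    counts = Counter(repeat for repeat, _ in hits.values())
    return {repeat: n / n_sampled_reads for repeat, n in counts.items()}


def divergence_histogram(hits, bin_width=1.0):
    """Bin read-to-consensus divergence (100 - percent identity).

    Low divergence suggests copies still close to their consensus,
    i.e. relatively recent insertions.
    """
    bins = Counter()
    for _, identity in hits.values():
        divergence = 100.0 - identity
        bins[int(divergence // bin_width)] += 1
    return dict(sorted(bins.items()))
```

With hypothetical hits `{"r1": ("LINE-1", 99.0), "r2": ("LINE-1", 95.5), "r3": ("Gypsy-2", 88.0)}` out of 10 sampled reads, the estimated proportions are 0.2 and 0.1, and the divergence histogram has one read each in the 1%, 4% and 12% bins.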
Provides a fully automated, rationally derived method for the complete analysis of significant common insertion sites (CISs) using next-generation sequencing. TAPDANCE furnishes a pipeline that performs raw sequence processing and trimming, mapping to the reference genome, CIS identification, and nearest-gene annotation. The software can be applied to data at a range of sequence sampling depths and can handle large datasets.
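The CIS identification step can be illustrated with a simplified window-based test. This is one common approach, assumed here, and is not necessarily TAPDANCE's exact statistic. Under a null model of uniformly random insertion, the count in each fixed window is Poisson-distributed. Windows whose counts are improbably high, after Bonferroni correction across all windows, are flagged. Insertions are pooled across samples, and coordinates are assumed to be 0-based and lie in `[0, genome_length)`.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def find_cis(insertions, genome_length, window, alpha=0.05):
    """Flag fixed, non-overlapping windows with more insertions than expected.

    insertions: 0-based insertion coordinates pooled across samples.
    Null model (an assumption): insertions land uniformly at random, so each
    window count is Poisson with mean len(insertions) * window / genome_length.
    Returns (start, end, count, p_value) for windows passing a Bonferroni cutoff.
    """
    n_windows = math.ceil(genome_length / window)
    lam = len(insertions) * window / genome_length
    counts = [0] * n_windows
    for pos in insertions:
        counts[pos // window] += 1
    cutoff = alpha / n_windows  # Bonferroni correction over all windows
    hits = []
    for w, c in enumerate(counts):
        p = poisson_sf(c, lam)
        if p < cutoff:
            hits.append((w * window, min((w + 1) * window, genome_length), c, p))
    return hits
```

For example, with a 100 kb sequence, 1 kb windows, ten insertions clustered at positions 5000 to 5009, and ten insertions spread one per 10 kb, only the window starting at 5000 is reported. Nearest-gene annotation would then be a lookup of each reported window against a gene table.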
Facilitates the curation of transposable element consensus sequences. ReannTE provides scripts that simplify merging two consensus libraries and parse the masking outputs to evaluate overlaps. It was used to eliminate non-TE repeats (simple repeats or gene families).
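Overlap evaluation between two masking runs can be sketched as an interval comparison. This minimal example assumes that the masked regions produced with each consensus library have already been parsed into hypothetical lists of half-open `(start, end)` intervals on a single sequence (for instance from RepeatMasker output, an assumption here). It reports base pairs masked by both libraries and by each library alone. It does not reproduce ReannTE's own scripts.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered(intervals):
    """Total base pairs covered by a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def overlap_bp(a, b):
    """Base pairs covered by both sorted, merged interval lists (two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def compare_maskings(hits_a, hits_b):
    """Summarize how two masking runs (library A vs library B) overlap, in bp."""
    a, b = merge_intervals(hits_a), merge_intervals(hits_b)
    shared = overlap_bp(a, b)
    return {"shared": shared, "only_a": covered(a) - shared, "only_b": covered(b) - shared}
```

For example, `compare_maskings([(0, 100), (50, 150), (300, 400)], [(120, 200), (350, 500)])` returns `{"shared": 80, "only_a": 170, "only_b": 150}`. Merging each run first ensures that nested or overlapping hits from the same library are not double-counted.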