

Performs a variety of trimming tasks for Illumina paired-end and single-end data. Trimmomatic is a flexible, pair-aware preprocessing tool optimized for Illumina next-generation sequencing (NGS) data. The software includes several processing steps for read trimming and filtering. It uses a pipeline-based architecture, allowing individual ‘steps’ (adapter removal, quality filtering, etc.) to be applied to each read or read pair in the order specified by the user.
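The pipeline architecture can be pictured as a chain of read transformers. A minimal Python sketch (illustrative only; `crop` and `min_len` are hypothetical stand-ins for Trimmomatic's CROP and MINLEN steps, not its real implementation):

```python
# Illustrative sketch of a Trimmomatic-style step pipeline: each "step"
# transforms a read or drops it, and steps run in user-specified order.

def crop(length):
    """Keep only the first `length` bases (akin to Trimmomatic's CROP)."""
    def step(read):
        seq, qual = read
        return (seq[:length], qual[:length])
    return step

def min_len(threshold):
    """Drop reads shorter than `threshold` (akin to MINLEN)."""
    def step(read):
        seq, qual = read
        return read if len(seq) >= threshold else None
    return step

def run_pipeline(reads, steps):
    """Apply each step in order; a step returning None discards the read."""
    out = []
    for read in reads:
        for step in steps:
            read = step(read)
            if read is None:
                break
        if read is not None:
            out.append(read)
    return out

reads = [("ACGTACGTAC", "IIIIIIIIII"), ("ACG", "III")]
kept = run_pipeline(reads, [crop(8), min_len(5)])
# Only the first read survives, cropped to 8 bases.
```

Because each step is an independent function, reordering the pipeline (e.g. cropping before or after length filtering) changes the result, which is why the order of steps matters to the user.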


Implements a dynamic programming algorithm dedicated to the task of adapter trimming. Skewer is specially designed for processing Illumina paired-end sequences. Experiments on simulated data and on real data from small RNA sequencing, paired-end RNA sequencing, and Nextera LMP sequencing showed that Skewer outperforms other tools with the same utility. Skewer is also considerably faster than tools of comparable accuracy: faster for single-end sequencing, more than 12 times faster for paired-end sequencing, and 49% faster for LMP sequencing.


Detects and removes multiple alien sequences at both ends of sequencing reads. Based on the decomposition of user-specified alien nucleotide sequences into k-mers, AlienTrimmer determines whether such alien k-mers occur in one or both read ends using a simple polynomial algorithm. AlienTrimmer can therefore process typical HTS single- or paired-end files containing millions of reads in a few minutes with very modest computing resources.
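The k-mer decomposition idea can be sketched as follows (an illustration of the principle only, not the published polynomial algorithm; the adapter fragment used in the example is the start of a common Illumina TruSeq adapter):

```python
# Illustrative sketch of AlienTrimmer's k-mer idea: decompose the alien
# sequence into k-mers and count how many of them occur in each read end.

def kmer_set(seq, k):
    """All k-mers of `seq`, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def end_contamination(read, alien, k=5, window=10):
    """Return (5'-end hits, 3'-end hits) of alien k-mers in the read."""
    alien_kmers = kmer_set(alien, k)
    def hits(region):
        return sum(1 for i in range(len(region) - k + 1)
                   if region[i:i + k] in alien_kmers)
    return hits(read[:window]), hits(read[-window:])

# A read whose 3' end carries the start of an Illumina TruSeq adapter:
read = "ACGTACGTAC" + "AGATCGGAAG"
counts = end_contamination(read, "AGATCGGAAGAGC")  # -> (0, 6)
```

A high hit count in one end flags that end for trimming; a clean end yields zero or near-zero hits, which is how the tool decides whether one or both ends are contaminated.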

CLC Genomics Workbench

Allows users to analyze, compare, and visualize next-generation sequencing (NGS) data. CLC Genomics Workbench offers a complete and customizable solution for genomics, transcriptomics, epigenomics, and metagenomics. The software enables users to generate custom workflows, which can combine quality control steps, adapter trimming, read mapping, variant detection, and multiple filtering and annotation steps into a single pipeline.


Finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Cutadapt helps with these trimming tasks by finding the adapter or primer sequences in an error-tolerant way. It can also modify and filter reads in various ways. Adapter sequences can contain IUPAC wildcard characters, and paired-end reads and even colorspace data are supported. If desired, Cutadapt can also simply demultiplex input data without removing any adapter sequences.
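Error-tolerant 3' adapter finding can be sketched as a scan over candidate start positions that accepts a match whose mismatch rate stays under a threshold (a simplified illustration; cutadapt's actual implementation uses semi-global alignment and also handles indels and IUPAC wildcards):

```python
# Sketch of error-tolerant 3' adapter search: accept the leftmost
# occurrence whose mismatch rate is within budget. The adapter may run
# off the read end, modeling a partial 3' adapter match.

def find_adapter(read, adapter, max_error_rate=0.1, min_overlap=3):
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break
        mismatches = sum(1 for i in range(overlap)
                         if read[start + i] != adapter[i])
        if mismatches <= max_error_rate * overlap:
            return start  # trim here: keep read[:start]
    return -1

read = "ACGTACGT" + "AGATCGGAAGAGC"       # insert followed by adapter
pos = find_adapter(read, "AGATCGGAAGAGC") # -> 8
trimmed = read[:pos] if pos >= 0 else read
```

The error budget scales with the overlap length, so a long adapter match tolerates a few sequencing errors while a very short match must be exact, which keeps spurious trimming low.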


Examines epigenomic and transcriptomic next generation sequencing (NGS) data. Octopus-toolkit can be used for antibody- or enzyme-mediated experiments and studies for the quantification of gene expression. It can accelerate the data mining of public epigenomic and transcriptomic NGS data for basic biomedical research. This tool provides a private and a public mode: one to process the user’s own data, and the other to analyze public NGS data by retrieving raw files from the GEO database.


A package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead extends Bioconductor with tools useful in the initial stages of short-read DNA sequence analysis. Main functionalities include data input, quality assessment, data transformation and access to downstream analysis opportunities. It is an important gateway to use of Bioconductor for processing high-throughput DNA sequence data. ShortRead data structures allow convenient manipulation of data, such as filtering reads based on sequence characteristics.


Facilitates analysis of microarray and miRNA/RNA-seq data on laptops. oneChannelGUI can be used for quality control, normalization, filtering, statistical validation and data mining for single-channel microarrays. It offers comprehensive microarray analysis for Affymetrix 3′ (IVT) expression arrays as well as for the new generation of whole-transcript arrays: human/mouse/rat exon 1.0 ST and human gene 1.0 ST arrays. oneChannelGUI inherits the core affylmGUI functionalities and permits a wider range of analyses, allowing biologists to choose among different criteria and algorithms to analyze their data. It is also a didactic tool that can be used to introduce young life scientists to the use and interpretation of microarray data; for this purpose, various data sets and exercises are available at the oneChannelGUI web site.

NGS-Trex / NGS TRanscriptome profile EXplorer

Allows users to upload raw sequences and obtain an accurate characterization of the transcriptome profile. NGS-Trex can assess differential expression at both the gene and transcript level and compares the expression profiles of different samples. All comparisons are performed using a custom database, mainly populated from several NCBI sources. The tool allows users to discard ambiguously assigned reads or to assign such reads to all competing genes in case of ambiguity.


A user-friendly software package designed to generate detailed statistics and at-a-glance graphics of sequence data quality both quickly and in an automated fashion. SolexaQA contains associated software to trim sequences dynamically using the quality scores of bases within individual reads. It produces standardized outputs within minutes, thus facilitating ready comparison between flow cell lanes and machine runs, as well as providing immediate diagnostic information to guide the manipulation of sequence data for downstream analyses.
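The dynamic trimming idea can be sketched as keeping the longest contiguous run of bases whose quality meets a cutoff (an illustration in the spirit of SolexaQA's DynamicTrim, not its exact algorithm):

```python
# Sketch of dynamic quality trimming: retain the longest contiguous
# stretch of bases whose Phred quality score meets the cutoff.

def dynamic_trim(seq, quals, cutoff=20):
    best = (0, 0)          # half-open (start, end) of best run so far
    start = None
    for i, q in enumerate(list(quals) + [-1]):  # sentinel closes last run
        if q >= cutoff and start is None:
            start = i
        elif q < cutoff and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return seq[best[0]:best[1]]

trimmed = dynamic_trim("ACGTACG", [30, 30, 10, 30, 30, 30, 10])
# longest high-quality run spans positions 3-5 -> "TAC"
```

Trimming per read rather than at a fixed length is what makes the approach "dynamic": each read keeps as much high-quality sequence as its own quality profile allows.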

TRAPR / Total RNA-Seq Analysis Package for R

Facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization. It allows users to build customized analysis pipelines. The tool can be easily applied to other technologies like Serial Analysis of Gene Expression and microarray thanks to its implementation in R.

ST Pipeline

Processes and analyzes the raw files generated with the Spatial Transcriptomics (ST) method. ST Pipeline enables demultiplexing of spatially resolved RNA-seq data along with robust quality filtering and identification of unique molecules. It is highly customizable, with numerous parameter settings, and it is robust, efficient, and scales well to arrays with higher density. The pipeline filters the data, aligns it to a genome, annotates it against a reference, demultiplexes it by array coordinates, and then aggregates non-duplicate counts using the Unique Molecular Identifiers.


Allows extraction and labelling of the sequences to be mapped in downstream pipelines, from next-generation sequencing (NGS) data. TagDust performs all steps required to go from raw to mappable sequences and therefore simplifies processing pipelines. The software, using hidden Markov models (HMMs), can work on datasets with a broad range of sequencing error rates. It enables users to define several read architectures and to use the same pipeline for the preprocessing of diverse data types.

TRAPLINE / Transparent Reproducible and Automated PipeLINE

Serves for RNA-seq data processing, evaluation and prediction. TRAPLINE guides researchers through the NGS data analysis process in a transparent and automated state-of-the-art pipeline. It can detect protein-protein interactions (PPIs), miRNA targets and alternative splicing variants, as well as promoter-enriched sites. The tool includes modules for several functions: (1) it scans the list of differentially expressed genes; (2) it predicts miRNA targets; and (3) it identifies verified interactions between proteins of significantly upregulated and downregulated mRNAs.


A method for content-dependent read trimming for next-generation sequencing data using the quality scores of each individual base. The main focus of the method is to remove sequencing errors from reads so that sequencing reads can be standardized. Another aspect is to incorporate read trimming into next-generation sequencing data processing and analysis pipelines. ConDeTri can process single-end and paired-end sequence data of arbitrary length, independently of sequencing coverage and without user interaction. It trims and removes reads with low quality scores to save computational time and memory usage during de novo assemblies; low-coverage or large genome sequencing projects will especially gain from trimming reads. The method can easily be incorporated into preprocessing and analysis pipelines for Illumina data.


A fast and lightweight tool to trim adapters and low-quality regions in reads from ultra-high-throughput next-generation sequencing machines. It can also reliably identify barcodes and assign reads to their original samples. Based on a modified Myers bit-vector dynamic programming algorithm, Btrim can handle indels in adapters and barcodes. It removes low-quality regions and trims off adapters at both or either end of the reads. A typical trimming of 30M reads with two sets of adapter pairs can be done in about a minute with a small memory footprint. Btrim is a versatile stand-alone tool that can serve as the first step in virtually all next-generation sequence analysis pipelines.
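Barcode assignment can be sketched as matching the read's 5' end against each known barcode within a mismatch budget (a deliberately simplified illustration: Btrim itself uses Myers' bit-vector algorithm and also tolerates indels, whereas this sketch counts substitutions only; the sample names and barcodes are made up):

```python
# Simplified sketch of barcode demultiplexing: compare the read's 5' end
# against each barcode and return the sample whose barcode fits within
# the mismatch budget, or None if no barcode is close enough.

def assign_barcode(read, barcodes, max_mismatches=1):
    best_sample, best_mm = None, max_mismatches + 1
    for sample, bc in barcodes.items():
        mm = sum(1 for a, b in zip(read, bc) if a != b)
        if mm < best_mm:
            best_sample, best_mm = sample, mm
    return best_sample

barcodes = {"sampleA": "ACGT", "sampleB": "TTAG"}
hit = assign_barcode("ACGAGGGGTT", barcodes)   # one mismatch vs sampleA
miss = assign_barcode("GGGGGGGGGG", barcodes)  # no barcode within budget
```

Reads that match no barcode within the budget are left unassigned rather than forced into the closest sample, which avoids cross-sample contamination during demultiplexing.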

ERNE / Extended Randomized Numerical alignEr

A short-string alignment package whose goal is to provide an all-inclusive set of tools to handle short (NGS-like) reads. ERNE 2 (a.k.a. bw-erne) uses the Burrows-Wheeler Transform (BWT) to reduce memory requirements while preserving speed and accuracy. ERNE 2 comprises ERNE-MAP (the core alignment tool/algorithm), ERNE-BS5 (a bisulfite-treated read aligner), ERNE-FILTER (quality trimming and contamination filtering), and parallel versions of the aligners (ERNE-PMAP and ERNE-PBS5). The alignment core supports indels and one long gap.


A comprehensive tool for analyzing next-generation sequencing data. AdapterRemoval can pre-process both single- and paired-end data. The program locates and removes adapter residues from reads, can merge paired reads when they overlap, and can optionally trim low-quality nucleotides. Furthermore, it looks for adapter sequence at both the 5' and 3' ends of reads. It is a flexible tool that can be tuned to accommodate different experimental settings and sequencing platforms producing FASTQ files, and it has been shown to trim adapters effectively from both single-end and paired-end data.

GBS-SNP-CROP / GBS SNP Calling Reference Optional Pipeline

Discovers SNPs and characterizes plant germplasm. GBS-SNP-CROP adopts a clustering strategy to build a population-tailored “Mock Reference” from the same GBS data used for downstream SNP calling and genotyping. It may be used to augment the results of alternative analyses, whether or not a reference is available, and may complement other reference-based pipelines by extracting more information per sequencing dollar spent. Even in that case, GBS-SNP-CROP can detect large numbers of additional high-quality SNPs missed by the tag-based, read-length-restricted approach of TASSEL-GBS.


Allows trimming of next-generation sequencing (NGS) reads. Atropos is a read-trimming tool offering Methyl-Seq-specific trimming options, automated adapter detection, estimation of sequencing error, computation of quality-control (QC) metrics before and/or after trimming, and support for data generated by many sequencing methods. The software includes a command that provides an estimate of the error rate in each input file. It was evaluated using both simulated and real-world data.

DNApi / De Novo Adapter prediction iterative algorithm

Predicts the 3´ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for downstream analysis. Tested on 539 publicly available small RNA libraries whose metadata included the 3´ adapter sequence, DNApi showed near-perfect accuracy (98.5%) with fast runtime (around 2.85 seconds per library) and efficient memory usage (around 43 MB on average). In addition to 3´ adapter prediction, it is also important to classify whether input small RNA libraries have already been processed, i.e. whether the 3´ adapters were removed; on another batch of 192 publicly available processed libraries, DNApi correctly judged all of them to be “ready-to-map” small RNA sequences. The 731 small RNA libraries used for DNApi evaluation were from human tissues and were carefully and manually collected.
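The core intuition behind de novo adapter inference can be sketched with a k-mer frequency count (a toy illustration only, not DNApi's iterative algorithm, which scans multiple k-mer lengths and refines candidates): in an unprocessed small RNA library, k-mers from the shared 3´ adapter dominate the frequency spectrum because every read carries the same adapter after its short, variable insert.

```python
# Toy sketch of de novo 3' adapter inference: count all k-mers across
# reads and report the most frequent one, which in adapter-contaminated
# libraries comes from the shared adapter.

from collections import Counter

def predict_adapter_kmer(reads, k=8):
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    kmer, _ = counts.most_common(1)[0]
    return kmer

# Three toy reads: distinct inserts followed by the same adapter fragment.
reads = [insert + "TGGAATTC" for insert in ("ACGTA", "CCCTG", "GATTA")]
adapter = predict_adapter_kmer(reads)  # -> "TGGAATTC"
```

In a processed ("ready-to-map") library no single k-mer dominates, which is the same signal a classifier can use to decide whether adapters were already removed.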


Processes raw reads into count tables for RNA-seq data using Unique Molecular Identifiers (UMIs). zUMIs is a pipeline applicable to most experimental designs of RNA-seq data, including single-nuclei sequencing techniques. The method allows downsampling of reads before summarizing UMIs per feature, which is recommended when read numbers differ greatly between samples. zUMIs is flexible with respect to the length and sequences of the barcodes (BCs) and UMIs, making it compatible with a large number of protocols.
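The UMI counting step can be sketched as collapsing reads that share the same cell barcode, UMI, and gene into a single molecule (a minimal illustration of the principle, not zUMIs' implementation; the barcode and gene names are made up):

```python
# Minimal sketch of UMI-based counting: reads mapping to the same gene
# with the same cell barcode and UMI are PCR duplicates and count once.

from collections import defaultdict

def umi_count(alignments):
    """alignments: iterable of (cell_barcode, umi, gene) tuples.
    Returns {(cell_barcode, gene): number of distinct UMIs}."""
    umis = defaultdict(set)
    for cell, umi, gene in alignments:
        umis[(cell, gene)].add(umi)
    return {key: len(s) for key, s in umis.items()}

alignments = [
    ("AAAC", "TTG", "GeneX"),
    ("AAAC", "TTG", "GeneX"),  # PCR duplicate: same cell, UMI, gene
    ("AAAC", "GGA", "GeneX"),
]
counts = umi_count(alignments)  # -> {("AAAC", "GeneX"): 2}
```

Counting distinct UMIs rather than raw reads is what removes PCR amplification bias from the final expression table.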


Identifies and removes the vector from raw DNA sequence data without prior knowledge of the vector sequence. Figaro is able to determine which DNA words are most likely associated with vector sequence by statistically modeling short oligonucleotide frequencies within a set of reads. This algorithm can be used to correctly identify the vector clipping points for sequences obtained from public databases. The code was implemented as a single streamlined module which can be easily integrated into a high-throughput computational pipeline. The code is distributed through the AMOS package.


A free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations, and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB, and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides, and a user interface designed to enable both novice and experienced users of RNA-Seq data.


A Genotyping-by-sequencing (GBS) bioinformatics pipeline designed to provide highly accurate genotyping. Fast-GBS is capable of handling data from different sequencing platforms and can detect different kinds of variants (Single Nucleotide Polymorphisms (SNPs), Multiple Nucleotide Polymorphisms (MNPs), and Indels). This pipeline was benchmarked based upon a large-scale, species-wide analysis of soybean, barley and potato. It is easy to use with various species, in different contexts, and provides an analysis platform that can be run with different types of sequencing data and modest computational resources.


A highly sensitive adapter trimmer that uses a probabilistic approach to detect the overlap between forward and reverse reads of Illumina sequencing data. SeqPurge can detect very short adapter sequences, even those only one base long. Compared to other adapter trimmers specifically designed for paired-end data, SeqPurge achieves higher sensitivity: the number of adapter bases remaining after trimming is reduced by up to 90%, depending on the tool compared. In simulations with different error rates, SeqPurge is also the most error-tolerant adapter trimmer in the comparison.
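The overlap idea can be sketched as follows (an illustration only; SeqPurge scores overlaps probabilistically using base qualities and tolerates mismatches, while this sketch requires a perfect overlap; the adapter fragments in the example are made up): if the DNA insert is shorter than the read length, the forward read and the reverse complement of the reverse read overlap across the whole insert, and anything past the overlap is adapter.

```python
# Sketch of overlap-based adapter detection for paired-end reads: the
# insert length is implied by the longest overlap between r1 and
# revcomp(r2); both reads can then be trimmed to that length.

def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def detect_insert_length(r1, r2, min_overlap=5):
    """Insert length implied by the longest perfect overlap between r1
    and revcomp(r2), or None if no sufficient overlap is found."""
    rc2 = revcomp(r2)
    for length in range(min(len(r1), len(r2)), min_overlap - 1, -1):
        if r1[:length] == rc2[len(rc2) - length:]:
            return length  # trim both reads to this length
    return None

insert = "ACGTACGTCC"                   # 10 bp insert, 15 bp reads
r1 = insert + "AGATC"                   # read-through into adapter
r2 = revcomp(insert) + "CTGTC"          # same on the reverse read
length = detect_insert_length(r1, r2)   # -> 10
```

Because the signal comes from the read pair itself rather than from a known adapter sequence, this approach can remove adapter tails as short as a single base, which is the sensitivity advantage the entry describes.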

ADEPT / A Dynamic Error-detection Program with Trimming

Dynamically assesses errors within reads based on position-specific and local quality scores. ADEPT is, to the authors' knowledge, the first tool that dynamically processes data and relies on within-dataset information to identify errors. The method used to devise the error model for Illumina data can readily be applied to assessing and detecting errors in other technologies. The key to ADEPT is the analysis of quality scores not only of the base being analyzed, but also of its neighboring bases, and how these relate to the entire dataset in a position-specific fashion. ADEPT outperforms other tools at identifying true errors without increasing the total number of errors called. This is particularly true within the middle of reads: other tools rely almost exclusively on the quality score of the base being considered, and because these scores are typically poor only at the ends of reads, such tools are unable to distinguish errors in the higher-quality middle portion of reads.