Duplicate read removal software tools | High-throughput sequencing data analysis
The presence of duplicates introduced by PCR amplification is a major issue in paired short reads from next-generation sequencing platforms. These duplicates might have a serious impact on research applications, such as scaffolding in whole-genome sequencing and discovering large-scale genome variations, and are usually removed.
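To make the idea concrete, here is a minimal sketch of sequence-based duplicate removal for paired reads. It assumes a duplicate is a pair whose two mate sequences are both identical to those of an earlier pair. The `ReadPair` type and the function name are hypothetical and are not taken from any tool listed on this page. Real tools often also tolerate sequencing errors or use alignment coordinates.

```python
from typing import Iterable, Iterator, NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical container for one paired-end read."""
    name: str
    seq1: str  # sequence of the first mate
    seq2: str  # sequence of the second mate


def drop_exact_duplicate_pairs(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield each pair whose (seq1, seq2) combination has not been seen before.

    Assumption: only exact, case-insensitive sequence identity of BOTH mates
    counts as a duplicate. The first occurrence is kept.
    """
    seen = set()
    for pair in pairs:
        key = (pair.seq1.upper(), pair.seq2.upper())
        if key in seen:
            continue  # an identical pair was already emitted
        seen.add(key)
        yield pair


if __name__ == "__main__":
    example = [
        ReadPair("r1", "ACGT", "TTGA"),
        ReadPair("r2", "ACGT", "TTGA"),  # exact duplicate of r1
        ReadPair("r3", "ACGT", "TTGC"),  # second mate differs, so it is kept
    ]
    for kept in drop_exact_duplicate_pairs(example):
        print(kept.name)
```

Keying on both mates rather than one avoids discarding pairs that merely share a start sequence by chance, at the cost of missing duplicates that contain sequencing errors.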
Integrates workflow technology with built-in access to bioinformatics resources, including remote data warehouses and tools. Galaxy lets users without programming skills run computational analyses through the Web. It chains tools together to perform multistep studies and preserves the complete provenance of each analysis step. The platform offers drag-and-drop functionality to ease the construction of workflows.
Allows users to interact with high-throughput sequencing data. SAMtools supports manipulating alignments in the SAM/BAM/CRAM formats: reading, writing, editing, indexing, viewing and converting between them. It caps the mapping quality of reads with excessive mismatches and applies base alignment quality to fix alignment errors. This tool can sort and merge alignments, remove polymerase chain reaction (PCR) duplicates or generate per-position information.
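Alignment-based duplicate removal can be sketched as follows. This is an illustration under stated assumptions, not the SAMtools implementation: pairs that share chromosome, both mate start positions and strand are treated as duplicates, and the pair with the highest summed base quality is kept. The `AlignedPair` type and its fields are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AlignedPair(NamedTuple):
    """Hypothetical summary of one aligned read pair."""
    name: str
    chrom: str
    start: int             # leftmost aligned position of the first mate
    mate_start: int        # leftmost aligned position of the second mate
    reverse: bool          # True if the first mate maps to the reverse strand
    base_quality_sum: int  # sum of base qualities over both mates


def remove_positional_duplicates(pairs: Iterable[AlignedPair]) -> List[AlignedPair]:
    """Keep one pair per (chrom, start, mate_start, strand) group.

    Assumption: within a group, the pair with the highest base quality sum
    is the representative; ties keep the pair seen first.
    """
    best: Dict[Tuple[str, int, int, bool], AlignedPair] = {}
    for pair in pairs:
        key = (pair.chrom, pair.start, pair.mate_start, pair.reverse)
        current = best.get(key)
        if current is None or pair.base_quality_sum > current.base_quality_sum:
            best[key] = pair
    # Groups are returned in the order in which they were first encountered.
    return list(best.values())


if __name__ == "__main__":
    example = [
        AlignedPair("x", "chr1", 100, 300, False, 50),
        AlignedPair("y", "chr1", 100, 300, False, 80),  # same position as x
        AlignedPair("z", "chr1", 100, 350, False, 10),  # different mate start
    ]
    for kept in remove_positional_duplicates(example):
        print(kept.name, kept.base_quality_sum)
```

Grouping by coordinates rather than by sequence lets duplicates be detected even when copies differ by a few sequencing errors, provided they align to the same location.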
Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files, and a Java API for creating new programs that read and write SAM files. Both the SAM text format and the SAM binary (BAM) format are supported. It also works with next-generation sequencing (NGS) data.
Enables users to check and prepare their datasets prior to downstream analysis. PRINSEQ provides graphical guidance and performs filtering, reformatting and trimming on FASTA (and QUAL) or FASTQ files. The web interface provides predefined option sets to perform different types of preprocessing, and the standalone version allows offline analysis and integration into existing data processing pipelines.
A package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead extends Bioconductor with tools useful in the initial stages of short-read DNA sequence analysis. Main functionalities include data input, quality assessment, data transformation and access to downstream analysis opportunities. It is an important gateway to using Bioconductor for processing high-throughput DNA sequence data. ShortRead data structures allow convenient manipulation of data, such as filtering reads based on sequence characteristics.
A software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first publicly available C++ API for BAM file support and a command-line toolkit. The BamTools C++ API/library has been successfully integrated into a variety of applications. It provides the BAM file support for several utilities in the BEDtools suite.
A system to provide a flexible and usable Web environment for defining and running bioinformatics analyses. It embeds simple yet powerful data management features that allow the user to reproduce analyses and to combine tools using a hierarchical typing system. Mobyle offers invocation of services distributed over remote Mobyle servers, thus enabling a federated network of curated bioinformatics portals without the user having to learn complex concepts or to install sophisticated software.