Libraries/Frameworks | High-throughput sequencing data analysis
High-throughput sequencing (HTS) has become one of the primary experimental tools used to extract genomic information from biological samples. Bioinformatics tools are continuously being developed for the analysis of HTS data. Beyond some well-defined core analyses, such as quality control or genomic alignment, the consistent development of custom tools and the representation of sequencing data in organized computational structures and entities remain challenging for bioinformaticians.
Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community and supports the creation and release of software in an open-source spirit. It integrates tools for sequence analysis into a seamless whole. It is free of charge and open source.
Provides a set of tools for biological computation written in Python. Biopython contains modules for reading and writing different sequence file formats and multiple sequence alignments, interacting with common tools (such as BLAST, ClustalW and EMBOSS), accessing key online databases, handling 3D macromolecular structures and providing numerical methods for statistical learning. The main goal of this platform is to facilitate the use of Python for bioinformatics by providing reusable modules and classes.
Manages and manipulates life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. It is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists. It provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project.
Enables users to collect, manage and share various types of bioinformatics resources. BioInstaller is a program for performing interactive and reproducible data analyses. It aims to make it easier for R users to build interactive and reproducible biological data analysis applications.
Provides a set of flexible genomic pipelines for processing and reporting Next Generation Sequencing (NGS) analysis. Sequana also contains some standalone applications: (1) sequana_coverage eases the extraction of genomic regions of interest and genome coverage information, (2) sequana_taxonomy performs a quick taxonomic classification of FastQ files, which requires dedicated databases to be downloaded, and (3) Sequanix, a graphical interface for Snakemake workflows.
Enables users to work with high-throughput sequencing data. HTSeq is a program that simplifies development of scripts for processing and analyzing high-throughput sequencing (HTS) data. It contains parsers for common file formats for a variety of types of input data and is suitable as a general platform for a diverse range of tasks.
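To make the idea of a format parser concrete, here is a minimal sketch of a reader for four-line FASTQ records. It uses only the Python standard library and is not HTSeq's API: the names `FastqRecord` and `parse_fastq` are hypothetical, and a Phred+33 quality encoding is assumed.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: identifier, bases, and per-base Phred scores (hypothetical type)."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream (Phred+33 assumed)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # an empty header line is treated as end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Two made-up reads, used only to exercise the parser.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
records = list(parse_fastq(example))
mean_quality = {r.name: sum(r.qualities) / len(r.qualities) for r in records}
```

A generator keeps memory use constant regardless of file size, which matters for HTS files that often hold millions of reads.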
Generates specific biological hypotheses by directing the supervised analyses of global microarray expression collections. PILGRM is a platform for interactive learning by genomics results mining. It brings sophisticated machine learning methods applied to enormous gene expression compendia into the lab of any researcher, enabling data-driven experiment direction that complements the traditional knowledge-based discovery provided by existing databases.
Integrates pipelines for RNA-Seq analysis that use the additional fields produced by CRAC (a mapping tool). CracTools is a complete toolbox designed to build pipelines on top of CRAC. It is based on “CracTools-core”, whose key modules make it possible to: (i) extract CRAC information using specific data structures, (ii) extract BAM lines, BED lines and the like using dedicated interval-tree structures, and (iii) provide simple tools to extract CRAC features, combine different files, and count reads inside a region.
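Counting reads inside a region can be sketched as follows. This is not CracTools code: instead of an interval tree it uses two sorted coordinate lists and binary search, a simpler technique that is sufficient for counting. The class name `ReadCounter` and the sample coordinates are hypothetical, and intervals are assumed to be half-open (`start` inclusive, `end` exclusive).

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


class ReadCounter:
    """Count reads overlapping a region using sorted starts and ends (hypothetical helper)."""

    def __init__(self, reads: Iterable[Tuple[str, int, int]]) -> None:
        starts: Dict[str, List[int]] = defaultdict(list)
        ends: Dict[str, List[int]] = defaultdict(list)
        for chrom, start, end in reads:
            if start >= end:
                raise ValueError(f"empty or inverted read: {chrom}:{start}-{end}")
            starts[chrom].append(start)
            ends[chrom].append(end)
        # The two lists are sorted independently; only counts are needed.
        self._starts = {chrom: sorted(values) for chrom, values in starts.items()}
        self._ends = {chrom: sorted(values) for chrom, values in ends.items()}

    def count(self, chrom: str, start: int, end: int) -> int:
        """Return the number of reads overlapping the half-open region [start, end)."""
        if start >= end:
            raise ValueError("region must be non-empty")
        begun_before_region_end = bisect_left(self._starts.get(chrom, []), end)
        finished_before_region_start = bisect_right(self._ends.get(chrom, []), start)
        return begun_before_region_end - finished_before_region_start


# Made-up alignments: (chromosome, start, end).
counter = ReadCounter([
    ("chr1", 100, 150),
    ("chr1", 140, 190),
    ("chr1", 300, 350),
    ("chr2", 100, 150),
])
overlapping = counter.count("chr1", 145, 160)
```

Each query costs two binary searches. An interval tree becomes the better choice when the overlapping reads themselves, not just their number, must be retrieved.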
Comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. SeqAn is a library of data types and algorithms for sequence analysis in computational biology. Moreover, this tool applies a generic design that guarantees generality, extensibility, and integration with other libraries.
Contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser.
Aims to ease high-throughput sequencing (HTS) data analysis through distributed computation. Eoulsan is a framework able to perform its tasks on distributed computers. The application includes batch analyses and a fully automated process that manages external file locations and the distributed file system. It can be run in three modes: standalone, local cluster or cloud computing on Amazon Elastic MapReduce.
Provides a platform dedicated to storage and analysis of large binary numerical datasets using Hadoop and Spark. Biospark supplies abstractions for parallel analysis of standard data types such as multidimensional arrays and images. The application also includes file conversion modules to ease the parallel analysis of specific datasets, including molecular dynamics simulations or time-lapse microscopy. In addition, it contains reference implementations of several commonly used analyses.
Allows definition and execution of bioinformatics pipelines. Bpipe was created in response to a need to frequently run many variations of a pipeline with stages deleted, inserted, reordered or adjusted. The software is implemented in Groovy, a language that supports creation of Domain-Specific Languages for the Java Virtual Machine, and it does not require knowledge of either Groovy or Java to implement pipelines. It includes features such as automatic connection of stages, an audit trail and transactional management of tasks.
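The idea of running many variations of a pipeline, with stages removed or reordered, can be illustrated with a small sketch. Bpipe defines pipelines in its own domain-specific language; the Python below is only an analogy and does not reproduce Bpipe syntax. All stage names and the `run_pipeline` helper are hypothetical.

```python
from typing import Callable, List, Tuple

Stage = Callable[[str], str]


def strip_whitespace(sequence: str) -> str:
    return sequence.strip()


def to_upper(sequence: str) -> str:
    return sequence.upper()


def drop_ambiguous(sequence: str) -> str:
    # Removes only upper-case N, so its position in the pipeline matters.
    return sequence.replace("N", "")


def run_pipeline(stages: List[Stage], data: str) -> Tuple[str, List[str]]:
    """Feed each stage's output into the next and record which stages ran."""
    audit_trail = []
    for stage in stages:
        data = stage(data)
        audit_trail.append(stage.__name__)
    return data, audit_trail


base = [strip_whitespace, to_upper, drop_ambiguous]
shortened = [stage for stage in base if stage is not drop_ambiguous]  # stage deleted
reordered = [strip_whitespace, drop_ambiguous, to_upper]              # stages reordered

raw = " acgtnN "
base_result, base_trail = run_pipeline(base, raw)
shortened_result, _ = run_pipeline(shortened, raw)
reordered_result, _ = run_pipeline(reordered, raw)
```

Because a pipeline here is just an ordered list of stages, each variation is a one-line change, and the recorded trail shows exactly which stages produced a given result.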
Allows users to manipulate and mine very large biological data collections for computational functional genomics. Sleipnir can perform common tasks such as microarray processing and Bayesian and support vector machine (SVM) learning. The tool enables computational biologists to efficiently integrate thousands of genomic datasets and to rapidly mine them for biological knowledge. It can be useful for large integration tasks involving hundreds of diverse biological datasets.
Aims to facilitate the analysis of genome scale data from several standard file formats. CGAT permits users to filter, compare, convert, summarize and annotate genomic intervals, gene sets and sequences. The software comprises more than 50 tagged tools, each with documentation and examples. The tags associate tools with broad themes (genomic intervals, gene sets, sequences), standard genomic file formats and the type of computation performed by the tool, such as statistical summary, format conversion, annotation, comparison or filtering.
Automates multi-omics data analysis pipelines on high-performance compute clusters and in the cloud. Omics Pipe supports best-practice published pipelines for RNA-seq, miRNA-seq, exome-seq, whole-genome sequencing and ChIP-seq analyses, as well as automatic processing of data from The Cancer Genome Atlas (TCGA). It provides researchers with a tool for reproducible, open-source and extensible next-generation sequencing (NGS) analysis. Its goal is to democratize NGS analysis by increasing the accessibility and reproducibility of best-practice computational pipelines, enabling researchers to generate biologically meaningful and interpretable results.
Allows users to analyze genomic data using Apache Pig and Hadoop. BioPig is built on MapReduce and Hadoop and thus has both the scalability and robustness offered by Apache Hadoop, and the programmability and parallel data flow control offered by Pig. It can also be embedded into other languages to achieve control flows, such as loops and branches, that are not currently available in the Pig language.
The web portal allows end users to (i) execute and manage otherwise complex command-line programs, (ii) launch multiple exploratory analyses of parameter-rich and computationally intensive methods and (iii) track the sequence of steps and parameters that were used to perform a specific analysis.
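Launching several exploratory runs of a parameter-rich method while tracking the parameters used can be sketched as follows. This does not describe the portal's implementation: `parameter_grid`, `launch_runs`, `toy_method` and the parameter names are hypothetical, and the "method" is a stand-in that only returns a number.

```python
import itertools
from typing import Any, Callable, Dict, Iterator, List


def parameter_grid(options: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of the listed parameter values."""
    names = sorted(options)  # fixed order makes runs reproducible
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def launch_runs(method: Callable[..., Any],
                options: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Run the method once per combination and log what was used."""
    log = []
    for run_id, params in enumerate(parameter_grid(options), start=1):
        log.append({"run": run_id, "params": params, "result": method(**params)})
    return log


def toy_method(kmer: int, min_quality: int) -> int:
    # Placeholder for a computationally intensive analysis.
    return kmer * min_quality


runs = launch_runs(toy_method, {"kmer": [21, 31], "min_quality": [20, 30]})
```

Keeping the parameters next to each result is what lets a specific analysis be traced and repeated later.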
Builds pipelines for next-generation sequencing data processing. NGSANE offers a framework for analyzing data from different experimental protocols. It lets end users and developers assemble pipelines from call statements that can be tested directly on the command line without syntax alterations or wrapper scripts. This application can also be used through the Amazon Elastic Compute Cloud.