Gives access to many free software tools for sequence analysis. EMBOSS aims to serve the molecular biology community. It permits the creation and release of software in an open-source spirit and integrates sequence analysis tools into a seamless whole. It is free of charge and open source.
A software suite for the comparison, manipulation and annotation of genomic features in browser extensible data (BED) and general feature format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets.
Reads bigWig files and allows users to extract and summarize data, globally or at specific regions. Bwtool consists of a collection of utilities designed to collect summary statistics and do common wiggle manipulations. The software provides several features such as aggregating data by averaging it over a series of given intervals with respect to central bases, finding regions in the bigWig exhibiting local minima/maxima or above/below specified thresholds, or removing data based on thresholds and specific regions if desired.
A package for collating and searching across thousands of next-generation sequence (NGS) samples. Vancouver Short Read Analysis provides a database that can be installed easily to rapidly access and store genetic variation information, compare data from any sequencing platform and perform aggregate analyses. The schema of the database makes rapid and insightful queries simple and enables easy annotation of novel or known genetic variations. Filtering can be done by utilizing annotations, matched pair datasets or datasets marked as non-cancer for separating polymorphisms from putative variants.
Allows users to manipulate, organize, summarize and visualize MinION nanopore sequencing data. poRe enables users to organize MinION FAST5 files into run folders, extract FASTQ, gather statistics on each run and plot a number of key graphs, such as read-length histograms and yield-over-time. Two graphical user interfaces (GUIs) for MinION data processing, organization and extraction are available through the package.
Handles multiple sequences and alignments in batch mode. FasParser provides a platform able to perform several common tasks such as: (i) building alignments in batch; (ii) concatenating, merging, extracting and filtering sequences; (iii) converting alignment formats; (iv) designing polymerase chain reaction (PCR) primers, and more. Additionally, the application supplies an editor dedicated to the visualization and editing of the analyzed sequences.
Allows users to read, manipulate, and write CAF (Common Assembly Format) files. CAFTOOLS is composed of: (1) Perl 5 libraries that allow users to create wrappers for software; and (2) ANSI C libraries that allow management of large data sets. This tool can be applied to sequence data from a mixture of cosmids and a whole-genome library. It can be useful for quality measures and allows users to extract information that they need without reading all data into a file.
Improves the design and use of polymerase chain reaction (PCR)-based methylation assays. methPrimer was developed to store and retrieve validated methylation assays and is intended to be a search portal for them. It also aims to establish a certain level of standardization and uniformity in the use of PCR-based methylation assays. Each primer set has a unique identifier that can be used to access it directly or to refer to it in a publication.
Handles variants according to the Human Genome Variation Society (HGVS) specification. HGVS is an open-source library consisting of five main modules dedicated to: (i) parsing; (ii) validation of descriptions; (iii) projecting variants between genomic and transcript level; (iv) querying external databases; and (v) formatting. It aims to assist users in standardizing data as well as interpreting variants of their patients.
Enables reading of sequencing files from the SRA database and writing files into the same format. The NCBI SRA Toolkit is provided in the form of the SRA SDK, and can be compiled with GCC. It allows users to programmatically access data housed within SRA and convert it from the SRA format to the following formats: ABI SOLiD native, FASTA, FASTQ, SFF, SAM and Illumina native. This toolkit is available for all common platforms.
A statistical framework for calling SNPs, discovering somatic mutations, inferring population genetic parameters and performing association tests directly based on sequencing data. BCFtools can manipulate variant calls in the variant call format (VCF) and its binary counterpart BCF. It can also discover somatic and germline mutations with appropriate input data, efficiently estimate site allele frequency, allele frequency spectrum and linkage disequilibrium, and test Hardy–Weinberg equilibrium and association.
Assists users in manipulating high-throughput sequencing (HTS) data and formats. Picard is a Java toolkit that provides a set of command-line tools. It comprises Java-based utilities that manipulate SAM files, and a Java API for creating new programs that read and write SAM files. Both SAM text format and SAM binary (BAM) format are supported. It also works with next-generation sequencing (NGS) data.
Allows users to manage FASTA and FASTQ files. FASTdoop is based on a wide range of experiments. It supports FASTA files containing one or more short sequences or a single very large sequence of arbitrary length. It can parse the content of FASTQ files containing short sequences. The tool can efficiently handle FASTA files that are not adequately supported by other libraries. The routines in FASTdoop represent an advancement of the state of the art both in terms of versatility and efficiency.
Helps users to manipulate and visualize VCF data. vcfr provides ready access to the multitude of statistical and graphical tools provided by the R environment. It contains functions to parse VCF data once loaded into memory. The tool can extract numeric values such as read depth or genotype qualities from this data. It generates VCF data and allows users to query it for quality metrics to determine quality filtering thresholds.
Provides an R interface to the “SAMtools” C-Library. rbamtools consists of three layers: the samtools C library, C based containers for alignments and alignment gaps as well as an S4 class library in R. The software offers functions for creation and modification of BAM file header or alignment section contents. It thus enables users to investigate alignment results by reading the header section or retrieve and view alignments from regions of interest using basic R structures.
Permits quality control of next-generation sequencing (NGS) tumor-normal experiments. NGS-Bits is separated into four steps: (1) gather information from raw reads, (2) map reads, (3) extract variant lists, and (4) combine results from the preceding steps and add quality control (QC) metrics for tumor-normal experiments. This tool includes all stages of single-sample NGS data analysis and adds special QC metrics for DNA sequencing of tumor-normal pairs.
Offers an assortment of tools suited for sequence analysis. Japsa is an open-source package that gathers more than 20 tools, including a Java library and an API. The application provides a wide range of functionalities that allow users to split multi-sequence files, to perform real-time identification of antibiotic resistance genes with Oxford Nanopore sequencing, and to normalize the branch lengths of a phylogeny.
Allows users to create, parse, edit and write graphical fragment assembly (GFA) files. GfaPy provides a complete interface to parse files in GFA, a format allowing for the description of generic sequence graphs. It can be modified to incorporate future changes in the GFA specification. It offers syntactic and semantic validation to guarantee valid GFA data. It can be used for GFA 1 and GFA 2 files.
Facilitates the design, optimization, and tracking of barcoded oligonucleotides. XSTK is useful for projects that require highly multiplexed polymerase chain reaction (PCR) and DNA sequencing. It builds a list of all possible DNA sequences of a specified length and then progressively culls sequences that may interfere with primary PCR amplification and/or sequencing steps.
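The enumerate-then-cull approach described for XSTK can be sketched in Python. This is only an illustration under stated assumptions, not XSTK's own code: the function names and the two culling criteria (a maximum homopolymer run and a GC-fraction window) are hypothetical, since the description does not say which filters XSTK applies.

```python
from itertools import product


def longest_run(seq):
    """Return the length of the longest run of identical bases in seq."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def candidate_barcodes(length, max_run=2, gc_min=0.4, gc_max=0.6):
    """Enumerate all DNA sequences of `length`, then cull them step by step."""
    # Step 1: every possible sequence of the requested length (4**length of them).
    pool = ["".join(p) for p in product("ACGT", repeat=length)]
    # Step 2 (assumed filter): drop sequences with long homopolymer runs.
    pool = [s for s in pool if longest_run(s) <= max_run]
    # Step 3 (assumed filter): keep sequences whose GC fraction is balanced.
    pool = [
        s for s in pool
        if gc_min <= (s.count("G") + s.count("C")) / length <= gc_max
    ]
    return pool
```

Because the pool grows as 4 to the power of the barcode length, exhaustive enumeration like this is only practical for short barcodes.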
Improves efficiency and interactivity of tools. BAMQL is a domain-specific language for matching BAM reads. It provides a large collection of predicates. These predicates can be joined together with a wide set of logical connectives, allowing for expressive queries. Queries can be compiled to native machine code, making it trivial to integrate into a larger C or C++ program.
Allows users to parse, edit and write Graphical Fragment Assembly (GFA) files, complying with the proposed standard. RGFA permits simple graph manipulation, limited to operations which do not make any assumption on the graph content and do not define any custom fields. The graphs output by the tool can be converted into an RGL (Ruby Graph Library) graph object. It provides a way to create manipulation pipelines, which can then be applied to several graphs or their connected components in a unified way without manual intervention.
Returns a reformatted file if the input file violates the user-defined format requirements. Fasta-O-Matic can be employed as a general pre-processing tool in bioinformatics workflows. It consists of a quality control script useful for a variety of downstream bioinformatics tools. This tool represents a sanity check for bioinformatics core facilities that tend to repeat common analysis steps on FASTA files received from disparate sources.
Simplifies downstream utilization of high-throughput sequencing (HTS) data. TBtools is designed to work with next-generation sequencing (NGS) data for wet-lab biology. This software is suited to wet-lab biologists who are inexperienced in programming or command-line environments and seek to save time on daily sequence analysis work.
Uses gene models for performing the calculation or the extraction of gene modes. GTFtools is a standalone software including a set of features for bioinformatics processes such as: (i) calculation of merged exons or independent introns; (ii) coordinates of isoforms or genes; and (iii) determination of gene length, including mean, median and max lengths of isoforms as well as the length of merged exons of isoforms.
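The merged-exon length mentioned above comes down to merging overlapping intervals. The sketch below is a hedged illustration, not GTFtools' implementation: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive as in GTF, and treating directly adjacent exons as one block is an assumption of this example.

```python
def merge_exons(exons):
    """Merge overlapping or adjacent exons given as (start, end) pairs.

    Coordinates are assumed to be 1-based and inclusive, as in GTF.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous block: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def merged_length(exons):
    """Total number of bases covered by the merged exons."""
    return sum(end - start + 1 for start, end in merge_exons(exons))
```

Passing the exons of all isoforms of a gene to `merged_length` gives the number of bases covered by at least one isoform.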
Intends to make FASTA and GFF data more accessible and useful for users. GiraFFe Browse allows users to import sequence and annotation data and in the case of GFF3 files, adds extra fields like the corresponding nucleotide sequence extracted from the coordinates found in the file. It also offers reverse complement and protein coding translations. This software is available through a standalone version and as a web-application demo for data of Escherichia coli.
Provides several programs allowing users to perform both common and uncommon tasks with FASTQ files. fastq-tools is a toolkit that provides tools for (1) finding reads matching a regular expression, (2) counting k-mer occurrences, (3) performing local alignment against every FASTQ sequence, (4) sampling reads with or without replacement, (5) sorting FASTQ files and (6) filtering reads with identical sequences.
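As an illustration of one of these tasks, k-mer counting over FASTQ reads can be sketched in a few lines of Python. This is not how fastq-tools itself is implemented; the function names are hypothetical, and the parser assumes strict four-line FASTQ records with unwrapped sequences.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each four-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def count_kmers(sequences, k):
    """Count every overlapping k-mer across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```

An open file handle can be passed directly as `lines`, since it iterates line by line.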
Removes a section from a sequence. cutseq is an EMBOSS tool and a simple editing program that cuts out a region from a sequence, given the begin and end positions of the region to remove. It removes the sequence from the specified start to the end positions (inclusive) and writes the remaining sequence to the output file. Users can delete a complete sequence and write out an empty sequence file.
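The inclusive cut described above can be expressed as a short Python sketch. The function name is hypothetical and positions are assumed to be 1-based, which matches the usual convention for sequence coordinates but is not stated in the description.

```python
def cut_region(sequence, start, end):
    """Remove positions start..end (1-based, inclusive) from sequence."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("require 1 <= start <= end <= len(sequence)")
    # Keep everything before `start` and everything after `end`.
    return sequence[:start - 1] + sequence[end:]
```

Cutting from position 1 to the full length returns an empty string, which corresponds to the empty-sequence case mentioned above.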
Gathers functionalities to parse and emit data, especially for Python. PyYAML is a parser for YAML, a data serialization format designed for human readability and interaction with scripting languages. It includes a high-level API for serializing and deserializing native Python objects and produces relatively sensible error messages. It was developed to provide a reference for YAML parser implementors.
Provides an application for managing BAM files and exploiting the information they contain. BamM is a standalone software providing three main features: (i) generating linking information or coverage profiles; (ii) isolating reads mapping to specific collections of contigs; or (iii) mapping multiple sets against a common reference sequence with the possibility to set multiple optional arguments.
Allows users to grab arbitrary lines from a BGZIP-compressed file. Grabix provides random access into text files that have been compressed with bgzip. This tool creates its own index of the file; users can then extract arbitrary lines from the file with the grab command or choose random lines with the random command.
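The index-then-seek idea behind this kind of random access can be sketched in Python. The example is a simplification under stated assumptions: it indexes an uncompressed file by byte offset, whereas Grabix works on bgzip-compressed data, and the function names are hypothetical.

```python
import io


def build_line_index(handle):
    """Return the byte offset at which each line of a seekable binary file starts."""
    offsets = []
    handle.seek(0)
    while True:
        pos = handle.tell()
        if not handle.readline():
            break  # end of file reached
        offsets.append(pos)
    return offsets


def grab(handle, offsets, first, last):
    """Return lines first..last (1-based, inclusive) by seeking via the index."""
    handle.seek(offsets[first - 1])
    return [handle.readline().rstrip(b"\n") for _ in range(last - first + 1)]


# Usage on an in-memory file standing in for a file on disk.
data = io.BytesIO(b"one\ntwo\nthree\nfour\n")
index = build_line_index(data)
print(grab(data, index, 2, 3))
```

The index is built in a single pass; after that, extracting a block of lines costs one seek plus the lines read, regardless of where the block sits in the file.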
Provides a collection of tools for extracting and manipulating data from HDF5 files produced by Pacific Biosciences. pbh5tools contains two different executables (cmph5tools and bash5tools) that offer users mechanisms to extract basecall information from bas.h5 files. The program can also obtain read sequences and quality values for raw and circular consensus sequencing (CCS) read types.
Writes to file a subset of an input stream of sequences. notseq is an EMBOSS tool. The list of sequence names or accession numbers to exclude from output is provided as a string. Optionally, the excluded sequences are also written to file. Major sequence database sources defined as standard in EMBOSS installations include srs:embl, srs:uniprot and ensembl. Data can also be read from sequence output in any supported format written by an EMBOSS or third-party application.
Provides a fast and well-tested set of functions that more specialized bioinformatics programs can use. Needletail is a minimal-copying FASTA/FASTQ parser and k-mer processing library for Rust. The goal of this package is to be as fast as the readfq C library at parsing FASTX files and much faster than equivalent Python implementations at k-mer counting. Needletail requires Rust and Cargo to be installed.
Allows users to work with nanopore data. fast5_scripts is composed of fast5Watcher and nanoStats. The first permits users to extract data from MinKNOW base-called FAST5 files, and the second helps users obtain read length and quality statistics from FASTQ files. Moreover, nanoStats can export a histogram of these statistics.
Provides a utility dedicated to the manipulation of tar files. tarbomb Disposal allows users to defuse tarbombs and to mine a tar archive. It aims to assist researchers in extracting files into a defined working directory and offers four options for setting the user’s requirements about modification time, owner, file permissions and verbose output.
Provides a simple General Feature Format (GFF) and General Transfer Format (GTF) parsing resource. LibGFF is a library that provides GTF/GFF parsers exposing a C++ interface without the necessity of drawing in a heavy-weight dependency like SeqAn.
Enables the extraction of specific genome annotation features around reference genomic anchor points. ChIP-Extract collects ChIP-seq tags that are distributed within a given distance from reference anchor points. The software output can be used for generating heatmaps.
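Collecting tags within a fixed distance of anchor points can be sketched as a windowed lookup over sorted positions. This is a hedged illustration rather than ChIP-Extract's own algorithm: the function name is hypothetical, all positions are assumed to lie on a single chromosome, and strand is ignored.

```python
from bisect import bisect_left, bisect_right


def tags_near_anchors(tag_positions, anchors, distance):
    """For each anchor, return tag offsets (tag - anchor) within +/- distance."""
    tags = sorted(tag_positions)
    result = []
    for anchor in anchors:
        # Binary search for the slice of tags inside the window.
        lo = bisect_left(tags, anchor - distance)
        hi = bisect_right(tags, anchor + distance)
        result.append([t - anchor for t in tags[lo:hi]])
    return result
```

Each inner list holds offsets relative to one anchor, so binning those offsets yields one row of a heatmap.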
Provides read-only sequence utilities. Screed is a Python library for working with short-read sequences. It can parse FASTA and FASTQ files, generate databases, and let users query these databases. Values such as sequence name, sequence description, sequence quality, and the sequence itself can be retrieved from these databases. It can also convert FASTA files to FASTQ and back again.
An extension to Brian Kernighan's awk, adding the support of several common biological data formats, including optionally gzip'ed BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk.
Concatenates different nucleotide, amino acid and structure sequence fragments of the same taxa into one supermatrix file in a format that can be used for phylogenetic purposes. FASconCAT extracts taxon-specific gene or structure sequences from the given input files and links them into one string. Missing taxon sequences in single files are replaced by 'N', 'X' or dots, depending on their data level (nucleotide, amino acid or "dot-bracket" structures).
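The concatenation with padding for missing taxa can be sketched as follows. This is a minimal illustration, not FASconCAT's code: the function name and the dictionary-based input are hypothetical, and every alignment is assumed to contain sequences of equal length.

```python
def concatenate(alignments, fill="N"):
    """Concatenate per-gene alignments into one supermatrix.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    Taxa missing from a gene are padded with `fill` to that gene's length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    for aln in alignments:
        # All sequences within one alignment are assumed to share a length.
        length = len(next(iter(aln.values()))) if aln else 0
        for taxon in taxa:
            supermatrix[taxon] += aln.get(taxon, fill * length)
    return supermatrix
```

The `fill` argument stands in for the data-level choice described above: 'N' for nucleotides, 'X' for amino acids, or '.' for dot-bracket structures.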
Offers a utility for handling genomic datasets. Pysam is a package that provides facilities to read, manage and modify more than eight formats, with a focus on SAM and BAM files because of their capacity to store large numbers of alignments. It allows users to create files from scratch, offers options for working with VCF files and tabix-indexed files, and supports random access through indexing.
Allows users to parse the annotation information of a eukaryotic genome and then generate a statistical description of six indices for visualization. GFFview was developed in order to investigate the quality of, and differences between, de novo assembled transcriptomes in RNA-seq studies. It provides an overall view of the annotation information of a genome. The tool accepts compressed input with the suffix “.tar.gz” to facilitate data uploading.
Parses BLAST and FASTA reports. Bioparser is a program that employs the BioPerl toolkit and allows filtering and storage of the reports. The software can also feed a local database with the parsed information. It can be used for large-scale similarity analyses and for the extraction of information of large sets of sequence alignments.