Assembly scaffolding software tools | De novo genome sequencing data analysis

The de novo assembly of short-read sequencing data usually leads to a fragmented set of genomic sequences (contigs). Ordering and orientating such contigs (scaffolding) represents the first, nontrivial step towards genome finishing and usually requires extensive processing and manual editing of large blocks of sequence (Barton and Barton, 2012). The preferred approach to genome scaffolding is currently based on assembling the sequenced reads into contigs and then using paired-end information to join them into scaffolds. Source text: Bosi et al., 2015.
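The paired-end idea described above can be illustrated with a minimal sketch. It assumes the read pairs have already been mapped to contigs and reduced to a list of contig names per pair. The function name `count_links`, the `min_links` threshold and the input format are hypothetical simplifications, not taken from any tool listed here.

```python
from collections import Counter


def count_links(pair_mappings, min_links=2):
    """Count read pairs whose two mates map to different contigs.

    pair_mappings: iterable of (contig_of_mate1, contig_of_mate2) tuples.
    This is a hypothetical, simplified input; real pipelines parse alignments.
    Returns {(contig_a, contig_b): n_pairs} for contig pairs with enough support.
    """
    links = Counter()
    for c1, c2 in pair_mappings:
        if c1 != c2:  # mates on the same contig carry no joining signal
            links[tuple(sorted((c1, c2)))] += 1
    # Keep only candidate joins supported by at least min_links read pairs.
    return {pair: n for pair, n in links.items() if n >= min_links}


pairs = [("ctg1", "ctg2"), ("ctg2", "ctg1"), ("ctg1", "ctg1"),
         ("ctg2", "ctg3"), ("ctg3", "ctg2"), ("ctg1", "ctg3")]
supported = count_links(pairs)
print(supported)
```

A real scaffolder would also use mate orientation and insert-size estimates to order and orient the joined contigs; this sketch only shows the link-counting step.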

1 - 50 of 51 results
Orione
Allows integrative investigation of next generation sequencing (NGS) microbiology data. Orione supports the whole life cycle of microbiology research data from production and annotation to publication and sharing. It can be used for a variety of microbiological projects including bacteria resequencing, de novo assembling and microbiome investigations. This tool is implemented on the Galaxy web platform.
fragScaff
Produces midrange scaffolding comparable to 40-kbp fosmid-based mate-pair libraries. fragScaff leverages coincidences between the content of different pools as a source of contiguity information. Specifically, contiguity preserving transposase sequencing (CPT-seq) data is mapped to a de novo genome assembly, followed by the identification of pairs of contigs or scaffolds whose ends disproportionately co-occur in the same indexed pools, consistent with true adjacency in the genome. Such candidate “joins” are used to construct a graph, which is then resolved by a minimum spanning tree.
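The final step described above, resolving a graph of candidate joins with a minimum spanning tree, can be sketched with Kruskal's algorithm. This is an illustrative sketch, not fragScaff's own code. It treats whole contigs as nodes (the tool works on contig ends) and assumes each join already carries a cost where lower means stronger evidence. The function name and the example costs are hypothetical.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm over candidate joins.

    edges: list of (cost, u, v); a lower cost is assumed to mean a stronger join.
    Returns the selected joins as (u, v, cost), in the order they were accepted.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the component root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # accepting this join does not create a cycle
            parent[ru] = rv
            tree.append((u, v, cost))
    return tree


contigs = ["A", "B", "C", "D"]
joins = [(0.1, "A", "B"), (0.2, "B", "C"), (0.9, "A", "C"), (0.3, "C", "D")]
mst = minimum_spanning_tree(contigs, joins)
print(mst)
```

The weak A-C join is discarded because A and C are already connected through B at lower cost.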
Opera / Optimal Paired-End Read Assembler
A scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, with consistent improvement over state-of-the-art programs for scaffold correctness and contiguity. OPERA provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation (Illumina, Ion Torrent) and third-generation (PacBio, ONT) sequencing technologies. OPERA efficiently scaffolds large genomes with provable scaffold properties, providing an avenue for systematic augmentation and improvement of 1000s of existing draft eukaryotic genome assemblies.
SSPACE-LongRead
A stand-alone program for scaffolding pre-assembled contigs using long reads (e.g. PacBio RS reads). Using the long read information, contigs (or scaffolds) are placed in the right order and orientation in so-called super-scaffolds. The SSPACE-LongRead hybrid assembly approach has been tested on a number of bacterial genomes and in most cases results in fewer than 10 super-scaffolds (numbers based on draft assemblies constructed with one Illumina MiSeq paired-end and one PacBio RS C2 SMRT library, both at 100X coverage).
inGAP-sf
Combines direct-link and paired-link graphs to address scaffolding obstacles. inGAP-sf uses direct links to provide extra routes and to reduce the complexity of regions enriched in repetitive contigs. Its main advantages are a direct-link graph, used to cluster and link Killer Ig-Like Receptor (KIR) contigs, and a statistics-based estimation model that separates correct routes from the many noise routes in repetitive regions.
MeDuSa / Multi-Draft based Scaffolder
An algorithm for genome scaffolding. MeDuSa exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MeDuSa formalises the scaffolding problem by means of a combinatorial optimisation formulation on graphs and implements an efficient constant factor approximation algorithm to solve it. In contrast to currently used scaffolders, it requires neither prior knowledge of the microorganism dataset under analysis (e.g. their phylogenetic relationships) nor the availability of paired-end read libraries.
TGNet / Transcript to Genome Consistency Network
Generates tables for visualization in Cytoscape from FASTA sequence and scaffolding information files. TGNet contains scripts that evaluate genome scaffolding by aligning independently obtained transcriptome sequences to the genome and visually summarizing the alignments using the Cytoscape software. This approach allows systematic evaluation of genome assemblies’ quality, and thus could be useful for large-scale sequencing projects such as the i5K initiative.
Rampart
A pipeline for creating multiple assemblies and a framework for analysing and comparing them. Rampart supports a variety of third-party tools for assembling, scaffolding and read error correction. After assembling contigs using different tools and parameters, it produces statistics and plots enabling the user to interpret, compare and visualise results. Rampart uses metrics based on this information to assign scores to each assembly, highlighting which set of assembled contigs should be considered for further scaffolding and enhancement. This methodology enables Rampart to evaluate its final assembly in the absence of a reference.
GPM / Genome Puzzle Master
Enables the integration of additional genomic signposts to edit and build ‘new-gen-assemblies’ that result in high-quality ‘annotation-ready’ pseudomolecules. GPM has the flexibility to edit and visualize assembled data, not only by showing the order and orientation of contig sequences and sequence redundancies but also by displaying the necessary guide information for easy user confirmation. One thing GPM cannot do is to assemble a genome de novo from raw reads. The final quality of a GPM assembly relies heavily on the quality and richness of the guide information used.
Architect
A de novo scaffolder aimed at synthetic long read (SLR) technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR’s underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads.
A greedy heuristic algorithm to solve the weighted MAX-DIR problem. A heavier-weighted edge generally reflects greater confidence in the scaffolding link it represents, so such an edge is likely to be valid and therefore included in the optimal solution. A greedy algorithm that maximally favors heavier-weighted edges thus approximates an ideal solution for the contig orientation problem. This method, which we have implemented as a module of ScaffoldScaffolder, has the potential to assist in the automated discovery of biologically significant features in de novo genome assembly.
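The greedy idea can be sketched as follows: visit edges from heaviest to lightest and accept each one unless it contradicts orientations already fixed by heavier edges. This is a sketch under stated assumptions, not the ScaffoldScaffolder module itself. The edge format `(weight, u, v, same)`, the function name and the union-find bookkeeping are all hypothetical choices made for illustration.

```python
def greedy_orient(contigs, edges):
    """Greedy orientation of contigs from weighted pairwise evidence.

    edges: list of (weight, u, v, same); same=True means the evidence says
    u and v share an orientation, same=False that they are opposite.
    Returns ({contig: +1 or -1}, total weight of satisfied edges).
    """
    parent = {c: c for c in contigs}
    flip = {c: 0 for c in contigs}  # parity of a contig relative to its parent

    def find(x):
        # Return (root, parity of x relative to root), compressing the path.
        if parent[x] == x:
            return x, 0
        root, p = find(parent[x])
        flip[x] ^= p
        parent[x] = root
        return root, flip[x]

    satisfied = 0
    for weight, u, v, same in sorted(edges, key=lambda e: -e[0]):
        ru, pu = find(u)
        rv, pv = find(v)
        want = 0 if same else 1  # required parity between u and v
        if ru != rv:
            # Different components: merge them so that this edge holds.
            parent[ru] = rv
            flip[ru] = pu ^ pv ^ want
            satisfied += weight
        elif (pu ^ pv) == want:
            satisfied += weight  # already consistent with heavier edges
        # Otherwise the edge conflicts with heavier evidence and is skipped.

    orientation = {}
    for c in contigs:
        _, p = find(c)
        orientation[c] = -1 if p else 1
    return orientation, satisfied


edges = [(10, "a", "b", True), (8, "b", "c", False), (3, "a", "c", True)]
orientation, satisfied = greedy_orient(["a", "b", "c"], edges)
print(orientation, satisfied)
```

In this example the lightest edge (weight 3) contradicts the two heavier ones and is the only edge left unsatisfied.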
L_RNA_scaffolder
Scaffolds genome fragments using long transcriptome reads. L_RNA_scaffolder (1) seeks guide transcript exons that are anchored to different genomic fragments, (2) finds an optimal connected fragment for every anchored fragment based on the number of transcripts aligned to them, and (3) builds scaffolding paths by walking the optimal connections. The software can handle transcript reads generated by 454/Sanger/Ion Torrent sequencing, or assembled de novo from paired-end Illumina sequencing.
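Steps (2) and (3) above can be sketched in a few lines. This is a simplified illustration, not the tool's implementation. It assumes the connections are directed (upstream fragment to downstream fragment along the transcript) and keeps a connection only when both fragments pick each other as best partner. That reciprocity rule, the function name and the input format are assumptions made for the example.

```python
def build_paths(support):
    """Chain fragments along their best-supported connections.

    support: {(upstream_fragment, downstream_fragment): n_transcripts}
    Returns a list of scaffolding paths, each a list of fragment names.
    """
    best_next, best_prev = {}, {}
    for (up, down), n in support.items():
        # Best successor of 'up' and best predecessor of 'down' by transcript count.
        if n > best_next.get(up, (None, 0))[1]:
            best_next[up] = (down, n)
        if n > best_prev.get(down, (None, 0))[1]:
            best_prev[down] = (up, n)

    # Keep a connection only when both fragments agree on it (assumed rule).
    nxt = {up: down for up, (down, _) in best_next.items()
           if best_prev[down][0] == up}

    # A path starts at a fragment that has a successor but no predecessor.
    starts = set(nxt) - set(nxt.values())
    paths = []
    for start in sorted(starts):
        path = [start]
        while path[-1] in nxt:
            path.append(nxt[path[-1]])
        paths.append(path)
    return paths


support = {("f1", "f2"): 5, ("f1", "f3"): 1, ("f2", "f3"): 4, ("f4", "f3"): 2}
paths = build_paths(support)
print(paths)
```

Fragment `f4` stays unscaffolded here because `f3` is better supported as the successor of `f2`.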
ARKS / Assembly Roundup by linked-read Kmer mapping Scaffolder
Registers and associates the barcode data required to order and orient draft assembly sequences. ARKS can find all probable contig pairs within the draft genome assembly, along with their relative head/tail orientations. It (1) maps Chromium linked barcodes to contigs, (2) scores candidate contig pairs, and (3) constructs the output scaffold graph with estimated distances. This tool employs barcode data to determine distances between neighboring contigs.
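The pair-scoring step can be illustrated with a minimal sketch that counts barcodes shared between contig ends. It assumes the barcodes have already been assigned to the head (`H`) or tail (`T`) of each contig. The scoring rule (raw shared-barcode count), the `min_shared` threshold and all names are hypothetical and much simpler than what ARKS does.

```python
from itertools import combinations


def score_pairs(barcodes_at_end, min_shared=2):
    """Score candidate contig pairs by barcodes shared between their ends.

    barcodes_at_end: {(contig, "H" or "T"): set of barcode strings}
    Returns {(contig_a, contig_b): (n_shared, ends)}, where ends such as "TH"
    means the tail of contig_a is linked to the head of contig_b.
    """
    best = {}
    for a, b in combinations(sorted(barcodes_at_end), 2):
        if a[0] == b[0]:
            continue  # the two ends of the same contig are not a candidate join
        shared = len(barcodes_at_end[a] & barcodes_at_end[b])
        pair = (a[0], b[0])
        # Keep the best-supported end combination for each contig pair.
        if shared >= min_shared and shared > best.get(pair, (0, None))[0]:
            best[pair] = (shared, a[1] + b[1])
    return best


ends = {
    ("c1", "H"): {"b1", "b2"},
    ("c1", "T"): {"b3", "b4", "b5"},
    ("c2", "H"): {"b3", "b4", "b5"},
    ("c2", "T"): {"b6"},
}
scored = score_pairs(ends)
print(scored)
```

The result suggests that the tail of `c1` lies next to the head of `c2`, which fixes both the pairing and the relative orientation.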
SLIQ
Predicts relative positions and orientations of the contigs, yielding a directed contig graph. SLIQ provides a set of simple linear inequalities derived from the geometry of contigs on the line. It produces a reduced subset of reliable mate pairs and thus a sparser graph which results in a simpler optimization problem for the scaffolding algorithm. The output of this scaffolder can either be used as draft scaffolds or as a reasonable starting point for refinement with more complex optimization procedures used in other scaffolders.
AGOUTI / Annotated Genome Optimization Using Transcriptome Information
Uses RNA-seq data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. AGOUTI is able to scaffold thousands of contigs while simultaneously reducing the number of gene models by several thousand. It therefore makes it easy to improve both genome assemblies and genome annotations. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output.
SWALO / Scaffolding With Assembly Likelihood Optimization
A package for scaffolding based on likelihood of genome assemblies computed using generative models for sequencing. SWALO learns parameters automatically from the data. Through probabilistic disambiguation, it is also able to make use of multi-mapped read pairs, which most other scaffolding tools ignore. The method is grounded in rigorous probabilistic models yet proper approximations make the implementation efficient and applicable to practical datasets. SWALO outperforms all other scaffolders on real and simulated datasets: for 100kb contigs, it is able to make 100% of the correct joins using either library and all aligners.
BATISCAF / BAd conTIg removal SCAFfolding
Aims to filter out repeated and short contigs. BATISCAF consists of three steps: (1) filtering out potential repeats via ILP; (2) constructing backbone scaffolding for potentially unique contigs; and (3) inserting multiple copies of potential repeats into backbone scaffolds. This tool offers a method for solving the problem of minimal length repeat and short contig removal, after which the problem of scaffolding becomes trivial.
GAT / Genscale Assembly Tool
Performs the scaffolding and gap-filling phases for circular genomes. GAT is an algorithm, based on a version of the longest path problem solved by mixed integer linear programming (MILP), that works with both mate-pair and paired-end distances. It is a global optimization approach in which the scaffolding, gap-filling, and scaffold extension steps are solved simultaneously under a common objective function. The algorithm was tested on a set of 33 chloroplast genome datasets.
LINKMFEX / LINKage analysis for outcrossed families with Male or Female EXchange of the mapping parent
Allows users to evaluate either male or female specific linkage arrangements without having to use special coding manipulations. LINKMFEX can be used with another program (MAPORD) to produce map orders from nearest neighbour linkage relationships or terminal marker orders. It can also be run to analyse only a single marker, or a subset of markers, against all other markers. Another program included in LINKMFEX (called MAPDIS) is able to build sex-specific linkage maps for the species.